<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Conference and Labs of the Evaluation Forum, September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>UMUTeam at eRisk@CLEF 2024: Fine-Tuning Transformer Models with Sentiment Features for Early Detection and Severity Measurement of Eating Disorders</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ronghao Pan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José Antonio García-Díaz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tomás Bernal-Beltrán</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafael Valencia-García</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Facultad de Informática, Universidad de Murcia, Campus de Espinardo</institution>
          ,
          <addr-line>30100</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>0</volume>
      <fpage>9</fpage>
      <lpage>12</lpage>
      <abstract>
        <p>This paper describes the participation of the UMUTeam in the eRisk shared task organized at CLEF 2024. We addressed Tasks 2 and 3, which concern the early detection of signs of anorexia and measuring the severity of eating disorder signs. For this purpose, several approaches were used, including the fine-tuning of a sentence transformer model for measuring the severity of eating disorder signs and the fine-tuning of pre-trained Transformer-based language models with sentiment features for detecting signs of anorexia. For Task 2, we reached 5th position in both the decision-based and the ranking-based evaluations. For Task 3, we obtained 5th place out of 5 participants; however, our model has a more balanced overall accuracy and performance across most metrics.</p>
      </abstract>
      <kwd-group>
        <kwd>Mental disorders</kwd>
        <kwd>Deep learning</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Fine-tuning</kwd>
        <kwd>Transformers</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Mental health is the state of a person’s psychological and emotional well-being. It includes the ability to
manage emotions, cope with stress, maintain satisfying relationships, work productively, and contribute
to the community. It can be influenced by many factors, including genetics, life experiences, social
environment, stress, and brain chemistry [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In recent years, there has been an increase in mental
illness, an alarming phenomenon that has captured the attention of public health officials, experts,
researchers, and governments around the world. According to a recent report by the World Health
Organization (WHO), one in eight people in the world suffers from a mental illness. Therefore, there is
an urgent need to address the factors contributing to the increase in these diseases and to implement
effective strategies to improve the mental and physical health of the world’s population.
      </p>
      <p>
        Several studies have shown that excessive use of social networking sites can have negative effects
on mental health, especially in adolescents and young adults, making it a topic of growing interest
and concern in research and public health [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This relationship highlights the importance of early
detection of mental health symptoms in order to intervene effectively and prevent these problems from
worsening.
      </p>
      <p>
        For this reason, the interest in the detection and identification of mental disorders in social network
streams has grown in recent years, driven by the use of advanced Natural Language Processing (NLP)
technologies, due to the increasing prevalence of mental health problems and their relationship with
digital platforms [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In addition, a number of mental health-related tasks have emerged in important
evaluation campaigns, such as MentalRiskES [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] at the Iberian Languages Evaluation Forum (IberLEF) and
eRisk [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] at the Conference and Labs of the Evaluation Forum (CLEF).
      </p>
      <p>The eRisk Lab focuses on the development of assessment methodologies and metrics for the early
detection of risks on the Internet, especially those related to health and safety issues. The initiative
started at CLEF in Dublin in 2017 and has hosted eight editions through 2024. Throughout
these editions, the Lab has presented numerous collections and models that address different application
domains. Previous editions have explored topics such as depression, eating disorders, gambling, and
self-harm detection. Lab tasks include early warning and severity assessment challenges, which involve
the automated analysis of temporal text streams to predict specific risks and compute detailed symptom
estimates from users’ writings.</p>
      <p>
        The eRisk@CLEF 2024 [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ] focuses on the early detection of signs of anorexia, the search for
symptoms of depression, and measuring the severity of signs of eating disorders. For each task, the
organizers provided a test collection and proposed the evaluation metrics.
      </p>
      <p>This paper presents the participation of the UMUTeam in the tasks related to the early detection of signs
of anorexia and measuring the severity of signs of eating disorders. For this purpose, several approaches
have been employed, including fine-tuning a sentence transformer model to measure the severity of
the signs of eating disorders and fine-tuning pre-trained Transformer-based language models
with sentiment features to detect signs of anorexia. The rest of the paper is organized as
follows. Section 2 presents the task and the provided dataset. Section 3 describes the methodology of our
proposed system for each task. Section 4 shows and discusses the results obtained.
Finally, Section 5 concludes the paper and outlines
perspectives for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Task description</title>
      <p>
        This edition of eRisk focuses on detecting symptoms of depression, signs of anorexia, and the severity
of symptoms associated with eating disorders through various datasets and challenges involving the
automated analysis of temporal text streams to predict specific problems and compute detailed symptom
estimates based on user writings. The shared task is divided into three tasks:
      </p>
      <list list-type="bullet">
        <list-item>
          <p>Task 1: Search for symptoms of depression. This task, a continuation of eRisk 2023’s Task 1,
involves ranking sentences from user writings based on their relevance to the symptoms of depression
outlined in the BDI questionnaire.</p>
        </list-item>
        <list-item>
          <p>
            Task 2: Early detection of signs of anorexia. This task, a continuation of eRisk 2018’s T2 and
2019’s T1, consists of sequentially processing pieces of evidence to detect signs of anorexia as early as possible,
primarily using Text Mining solutions on social network texts. The test collection follows the
format of the collection described in [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ] and comprises writings of social media users, categorized
into individuals with anorexia and control users.
          </p>
        </list-item>
        <list-item>
          <p>Task 3: Measuring the severity of the signs of eating disorders. This task involves estimating
the level of features associated with an eating disorder diagnosis from a history of user posts.
Participants are given a history of each user’s posts and are asked to complete a
standard eating disorder questionnaire based on the clues found in the posts. The questionnaire
is derived from the Eating Disorder Examination Questionnaire (EDE-Q), a 28-item
self-report questionnaire adapted from the Eating Disorder Examination (EDE) semi-structured
interview; only questions 1-12 and 19-28 are used.</p>
        </list-item>
      </list>
      <p>In this edition, we participated in Task 2 and Task 3. Table 1 shows the distribution of the training
dataset for Task 2. The table reports several statistics of the dataset: the number of topics,
the number of submissions (posts and comments), the average number of submissions per topic, the
average number of days from first to last submission, and the average number of words per submission.</p>
      <p>For Task 3, which is a continuation of Task 3 of eRisk 2022 and 2023, we used only the 2023 dataset,
which yields a total of 404,404 text-question relations.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>This section details the processes, techniques, and tools used for Task 2 and Task 3.</p>
      <p>3.1. Task 2</p>
      <p>Figure 1 shows the general architecture of our approach for Task 2. First, we preprocessed the data
by selecting the user messages that are most relevant for anorexia identification. Second,
we divided the dataset into two subsets with an 80-20 ratio: training, the subset used to train
the model, and validation, a subset held out from the training set and used to evaluate the
model’s performance during training. Third, the last hidden state of the pre-trained language model is
used to obtain the text representation, and a sentiment analysis model is used to obtain sentiment
features from the texts. Finally, the last hidden state and the logits from the sentiment analysis model
are concatenated and passed to a neural network that acts as the classification head. This network
includes a normalization layer (LayerNorm), a dropout layer, linear layers with Tanh as the activation
function, and a final linear layer that produces the anorexia prediction.</p>
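      <p>The classification head described above can be sketched without any deep learning framework. The following is a minimal illustration only: the toy dimensions, the random weights, the single hidden layer, and the names layer_norm, linear, classification_head, and random_params are assumptions made for the example, not the configuration used in our system.</p>
      <preformat>
```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def classification_head(hidden_state, sentiment_logits, params):
    """Concatenate both feature vectors, then LayerNorm, Linear + Tanh, Linear."""
    features = list(hidden_state) + list(sentiment_logits)
    features = layer_norm(features)
    # Dropout is the identity at inference time, so it is omitted here.
    hidden = [math.tanh(v) for v in linear(features, params["w1"], params["b1"])]
    return linear(hidden, params["w2"], params["b2"])


def random_params(n_in, n_hidden, n_out, seed=0):
    """Hypothetical random initialisation, only to make the sketch runnable."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {"w1": matrix(n_hidden, n_in), "b1": [0.0] * n_hidden,
            "w2": matrix(n_out, n_hidden), "b2": [0.0] * n_out}


# Toy sizes: a 6-dimensional text representation plus 3 sentiment logits.
hidden_state = [0.2, -0.1, 0.4, 0.0, 0.3, -0.5]
sentiment_logits = [1.5, -0.3, -1.2]  # e.g. negative, neutral, positive
params = random_params(n_in=9, n_hidden=4, n_out=2)
logits = classification_head(hidden_state, sentiment_logits, params)
```
      </preformat>
      <p>The two output values play the role of the control and anorexia logits. In a real setting the text representation would have the hidden size of the language model (for example 768 for a base-size model), and the weights would be learned during fine-tuning.</p>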
      <p>For this task, we used only the 2019 dataset. Table 1 shows that, at the post and comment
level, there are a total of 253,752 posts, of which 24,874 come from users with anorexia and 228,878 from control users,
indicating a significant imbalance. Therefore, we preprocessed the dataset to prevent the model from
always learning to predict the majority class and to reduce the noise in the dataset.</p>
      <p>
        Sentiment analysis involves the use of NLP techniques to identify and categorize opinions expressed
in a text, specifically to determine whether the sentiment is positive, negative, or neutral. For example,
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] shows the relationship between emotions and mental illness, as well as the importance of automatic
recognition in the health field. In the context of anorexia, this analysis can help identify patterns in
language that may indicate the presence of this disease. In this case, we used only negative texts from
users with anorexia and positive and neutral texts from control users.
      </p>
      <p>To address this task, we followed a supervised learning approach. To train our model, we used
the two datasets obtained after the selection process. It is worth mentioning that the organizers only
provided training data, so we created a custom validation split. This split
is created using stratified sampling in order to preserve the balance between labels. Table 2 shows the
distribution of the processed dataset in the training and validation sets. The training set contains
a total of 4,656 texts from users suffering from anorexia and 11,309 from users not
suffering from anorexia. The validation set contains 1,164 anorexia
texts and 2,828 texts that are not related to anorexia. Moreover, we also deleted all mentions, references to
URLs, and hashtags from the texts, and identified and removed sequences such as “amp;format=png” and
“amp;s=7b66887b445eb00d7d842b15e15e15e15f4759f3deb03d”, among others.</p>
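      <p>A stratified 80-20 split of this kind can be sketched in a few lines. The example below is an illustrative sketch, not our exact procedure: the function name stratified_split, the seed, the rounding rule, and the placeholder texts are assumptions; only the class totals are taken from the counts reported above.</p>
      <preformat>
```python
import random
from collections import defaultdict


def stratified_split(texts, labels, val_ratio=0.2, seed=42):
    """Split (text, label) pairs so each label keeps roughly the same share in both subsets."""
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    rng = random.Random(seed)
    train, val = [], []
    for label, items in by_label.items():
        rng.shuffle(items)
        n_val = round(len(items) * val_ratio)  # per-class validation size
        val.extend((t, label) for t in items[:n_val])
        train.extend((t, label) for t in items[n_val:])
    return train, val


# Placeholder texts with the class totals of the processed dataset
# (4,656 + 1,164 anorexia texts and 11,309 + 2,828 control texts).
texts = [f"post {i}" for i in range(5820 + 14137)]
labels = ["anorexia"] * 5820 + ["control"] * 14137
train, val = stratified_split(texts, labels)
```
      </preformat>
      <p>Because the validation size is computed per class, the proportion of anorexia and control texts is approximately the same in both subsets; the exact counts can differ by one text from those in Table 2 depending on the rounding rule.</p>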
      <p>
        For this task, we evaluated the BERT [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], RoBERTa [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and RoBERTa-large [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] models for text
representation and the Cardiff NLP TweetEval model for sentiment analysis of text.
      </p>
      <p>
        BERT [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is a language model developed by Google in 2018 based on the Transformer architecture, a
neural network designed to process data streams such as text or audio. BERT was pre-trained on large
amounts of text, allowing it to capture general linguistic knowledge. This pre-trained model can then
be tuned for specific natural language processing tasks such as sentiment analysis, machine translation,
or question answering.
      </p>
      <p>
        RoBERTa [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] is an extension of the BERT language model developed by Facebook AI. It focuses on large-scale
training, removing the next-sentence prediction objective and using more robust training dynamics. These improvements
make RoBERTa more effective and accurate than BERT on a variety of natural language processing tasks.
      </p>
      <p>
        RoBERTa-large [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] is a larger and more powerful version of the RoBERTa language model. Like
RoBERTa, it is based on the BERT architecture, but it has more layers and parameters
(24 layers and roughly 355 million parameters, versus 12 layers and roughly 125 million for the base model), allowing it to capture more complex
and general linguistic patterns.
      </p>
      <p>
        The Cardiff NLP TweetEval model [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] is a RoBERTa-based model specifically trained to perform
sentiment analysis on tweets. It has been trained on approximately 58 million tweets and
fine-tuned for sentiment analysis using the TweetEval benchmark dataset.
      </p>
      <p>Finally, for early detection, we
evaluated a strategy that makes a decision when the number of messages showing signs of anorexia in a user’s
history exceeds a certain threshold.</p>
      <p>3.2. Task 3</p>
      <p>This task involves estimating traits associated with an eating disorder diagnosis from a set of user
posts. The organizers provided each user’s posting history along with a standardized eating disorder
questionnaire. Thus, the primary goal of this task is to predict the user’s responses to the questionnaire
based on the user’s posting history.</p>
      <p>The questionnaire is the Eating Disorder Examination Questionnaire (EDE-Q), a 28-item
self-report questionnaire derived from the semi-structured interview known as the Eating Disorder
Examination (EDE). In this case, our goal is to predict responses to questions 1-12 and 19-28. The
dataset consists of 28 instances of users’ posting history along with their corresponding responses to
the EDE-Q questionnaire.</p>
      <p>For this task, we adopted a fine-tuning approach using a sentence transformer model that uses textual
similarity to measure the similarity between potential responses (user thread text) and each question in
the EDE-Q. To achieve this, we processed the user text, mapped it to the 22 questions, and assigned
a score based on the user’s responses to the questionnaire. To derive a scale-based score, we defined
specific intervals for each possible answer within the questionnaire.</p>
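      <list list-type="simple">
        <list-item><p>0. NO DAYS / not at all (0 to 0.1)</p></list-item>
        <list-item><p>1. 1-5 DAYS / slightly (0.1 to 0.2)</p></list-item>
        <list-item><p>2. 6-12 DAYS / slightly (0.2 to 0.3)</p></list-item>
        <list-item><p>3. 13-15 DAYS / moderately (0.3 to 0.4)</p></list-item>
        <list-item><p>4. 16-22 DAYS / moderately (0.4 to 0.5)</p></list-item>
        <list-item><p>5. 23-27 DAYS / markedly (0.5 to 0.7)</p></list-item>
        <list-item><p>6. EVERY DAY / markedly (0.7 to 1.0)</p></list-item>
      </list>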
      <p>Thus, within the training set, each text is associated with specific questions and assigned a score,
which is a randomly generated value that falls within the appropriate interval based on the user’s
response. We also chose a custom 80-20 split for validation. The training set contains 323,523
text-question relations along with their respective scores, while the validation set contains 80,881 such
relations.</p>
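      <p>The construction of scored text-question relations can be illustrated as follows. This is a minimal sketch under stated assumptions: the score is drawn uniformly inside the interval (the text above only says that it is randomly generated), and the names ANSWER_INTERVALS, sample_score, and build_pairs, as well as the toy posts and questions, are hypothetical.</p>
      <preformat>
```python
import random

# Score intervals for EDE-Q answers 0 to 6, as listed above.
ANSWER_INTERVALS = {
    0: (0.0, 0.1),
    1: (0.1, 0.2),
    2: (0.2, 0.3),
    3: (0.3, 0.4),
    4: (0.4, 0.5),
    5: (0.5, 0.7),
    6: (0.7, 1.0),
}


def sample_score(answer, rng):
    """Draw a training score uniformly from the interval of the given answer (assumed)."""
    low, high = ANSWER_INTERVALS[answer]
    return rng.uniform(low, high)


def build_pairs(user_texts, user_answers, questions, seed=0):
    """Pair every user text with every question and attach a sampled score."""
    rng = random.Random(seed)
    pairs = []
    for text in user_texts:
        for q_id, question in questions.items():
            pairs.append((text, question, sample_score(user_answers[q_id], rng)))
    return pairs


# Hypothetical toy inputs: two posts and two questionnaire items.
questions = {1: "question one text", 2: "question two text"}
answers = {1: 0, 2: 5}
pairs = build_pairs(["post a", "post b"], answers, questions)
```
      </preformat>
      <p>Each resulting triple (text, question, score) corresponds to one text-question relation of the kind counted above.</p>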
      <p>For this task, the dataset was first processed by removing contractions, mentions, hashtags, URLs,
and AMP expressions, and by extracting emoji features using the emoji Python library. Second, we fine-tuned
the multi-qa-mpnet-base-dot-v1 (https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-dot-v1) and
sentence-transformers/all-MiniLM-L6-v2 (https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) models
with cosine similarity as the loss function, 10 epochs, and 1000
warm-up steps. multi-qa-mpnet-base-dot-v1 is based on the MPNet (Masked and Permuted Pre-training for Language Understanding)
architecture, which builds on the BERT (Bidirectional Encoder Representations from Transformers)
model. sentence-transformers/all-MiniLM-L6-v2 is a general-purpose model tuned for many use cases
and trained on a large and diverse dataset of over 1 billion training pairs.</p>
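      <p>Part of the cleaning step can be approximated with regular expressions. The sketch below is illustrative only: the patterns and the function name clean_text are assumptions rather than the exact rules we applied, and contraction handling and emoji feature extraction are left out.</p>
      <preformat>
```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
AMP_RE = re.compile(r"amp;\S*")  # leftovers such as "amp;format=png"


def clean_text(text):
    """Strip URLs, AMP residue, mentions, and hashtags, then collapse whitespace."""
    for pattern in (URL_RE, AMP_RE, MENTION_RE, HASHTAG_RE):
        text = pattern.sub(" ", text)
    return " ".join(text.split())


cleaned = clean_text("@user I skipped lunch again #diet https://example.com/a amp;format=png")
```
      </preformat>
      <p>For the example post, cleaned is "I skipped lunch again". URLs are removed first so that fragments inside a link are not matched by the other patterns.</p>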
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>This section describes the systems submitted by our team in each run and shows the results obtained in
each task.</p>
      <p>4.1. Task 2</p>
      <list list-type="bullet">
        <list-item>
          <p>Run 0: This run uses a classification model obtained by fine-tuning
RoBERTa-base with sentiment features on the preprocessed set of posts.
The threshold used in the early detection strategy is 10, i.e., the decision is made when more than 10 of the user’s posts are
classified as showing signs of anorexia.</p>
        </list-item>
        <list-item>
          <p>Run 1: This run uses the same classification model as Run 0, but uses 15 as the threshold of the early
detection strategy.</p>
        </list-item>
        <list-item>
          <p>Run 2: In this run, we used the fine-tuned DeBERTa as the classification model and a threshold of
10 for the early detection strategy.</p>
        </list-item>
        <list-item>
          <p>Run 3: This run uses the same classification model as Run 2, but uses a threshold of 15 for the early
detection strategy.</p>
        </list-item>
        <list-item>
          <p>Run 4: This run has the same structure as Run 2, but changes the early detection strategy threshold to 20.</p>
        </list-item>
      </list>
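      <p>The threshold-based early detection strategy used in these runs can be sketched as a simple counter over the stream of per-post predictions. This is an illustrative sketch: the function name early_decision and the toy prediction stream are hypothetical, and the per-post labels stand in for the outputs of the fine-tuned classifier.</p>
      <preformat>
```python
def early_decision(post_predictions, threshold):
    """Return (decision, posts_read); decision is 1 once positives exceed the threshold."""
    positives = 0
    for posts_read, prediction in enumerate(post_predictions, start=1):
        positives += prediction
        if positives > threshold:
            return 1, posts_read
    return 0, len(post_predictions)


# Hypothetical per-post outputs of the classifier (1 = signs of anorexia).
stream = [1, 0, 1, 1] * 5  # 20 posts, 15 of them positive
decision_10 = early_decision(stream, threshold=10)
decision_15 = early_decision(stream, threshold=15)
```
      </preformat>
      <p>With this toy stream, a threshold of 10 raises the alert after 15 posts, giving (1, 15), while a threshold of 15 never fires and gives (0, 20). This is the trade-off explored across the runs: a higher threshold delays or suppresses decisions, and a lower one fires earlier.</p>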
      <p>Table 4 shows the results of the decision-based evaluation of Task 2, specifically the precision, recall,
and F1 score over the five runs. Precision ranges from 0.14 to 0.16, which is low and varies little across
runs: most of the users flagged as positive are in fact control users. Recall is very high across all runs, between 0.98
and 0.99, demonstrating the model’s effectiveness in capturing almost all relevant instances. The F1
score, which balances precision and recall, shows a slight improvement from 0.25 in Run 0 to 0.27 in Run
4. The ERDE5 and ERDE50 metrics, which measure early risk detection errors, remain relatively stable,
indicating consistent early detection performance across all runs. Latency, which reflects how many user
writings are processed before a correct positive decision is made, increases from 18.0 in Run 0 to 35.5 in Run 4. Speed, which
penalizes late decisions, decreases slightly to a low of 0.87 in Run 4.</p>
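      <p>The relationship between precision, recall, and F1 can be made concrete with a short computation. The counts below are hypothetical and are not our confusion matrix; they are chosen only to show how near-perfect recall combined with low precision leads to an F1 score in the range reported above.</p>
      <preformat>
```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical counts: nearly every at-risk user is flagged, along with many control users.
precision, recall, f1 = precision_recall_f1(tp=90, fp=510, fn=1)
```
      </preformat>
      <p>Here precision is 0.15 and recall is about 0.99, which yields an F1 score of about 0.26. Because F1 is the harmonic mean of the two values, it stays close to the lower one.</p>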
      <p>Overall, Run 4 achieves the highest precision and F1 score at the cost of higher latency, while Runs 0
and 2 offer lower latency at slightly lower precision and F1 score. With these results, we ranked fifth in the
decision-based evaluation.</p>
      <p>4.2. Task 3</p>
      <p>For this task, we submitted two runs based on fine-tuning a pre-trained sentence transformer model
that uses textual similarity to measure the similarity between potential responses (user thread text) and
each question in the EDE-Q.</p>
      <list list-type="bullet">
        <list-item>
          <p>Run 0: This run uses the fine-tuned multi-qa-mpnet-base-dot-v1 model
to measure the similarity between each EDE-Q question and the user posts. Each user
post is fed into the system, which calculates the degree of similarity between the
EDE-Q question and the post. Based on the score obtained, a possible answer to
the question is assigned according to the intervals defined in section X. Once all the posts have
passed through the system, the most frequent answer for each question is assigned as the final
answer.</p>
        </list-item>
        <list-item>
          <p>Run 1: This run uses the same approach as Run 0, but uses the fine-tuned
sentence-transformers/all-MiniLM-L6-v2 model.</p>
        </list-item>
      </list>
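      <p>The answer assignment used in both runs (similarity score to interval, then the most frequent answer per question) can be sketched as follows. This is a minimal sketch: the handling of scores that fall exactly on an interval boundary is an assumption, and the names UPPER_BOUNDS, score_to_answer, and answer_question and the example scores are hypothetical.</p>
      <preformat>
```python
from collections import Counter

# Upper bounds of the score intervals for answers 0 to 6.
UPPER_BOUNDS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 1.0]


def score_to_answer(score):
    """Map a similarity score to the first answer whose upper bound is not exceeded."""
    for answer, upper in enumerate(UPPER_BOUNDS):
        if upper >= score:
            return answer
    return len(UPPER_BOUNDS) - 1  # scores above 1.0 fall back to the last answer


def answer_question(post_scores):
    """Pick the most frequent answer across all posts of one user."""
    votes = Counter(score_to_answer(s) for s in post_scores)
    return votes.most_common(1)[0][0]


# Hypothetical similarity scores of five posts against one EDE-Q question.
final_answer = answer_question([0.05, 0.55, 0.62, 0.31, 0.68])
```
      </preformat>
      <p>In this example three of the five posts fall into the interval of answer 5, so final_answer is 5. Repeating the vote for each of the 22 questions yields the completed questionnaire for a user.</p>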
      <p>Table 6 shows the results obtained in the evaluation of Task 3, evaluated according to different
metrics: MAE (Mean Absolute Error), MZOE (Mean Zero-One Error), MAE_macro, GED (Global Eating
Disorder Score), RS (Restraint Subscale), ECS (Eating Concern Subscale), SCS (Shape Concern Subscale),
and WCS (Weight Concern Subscale).</p>
      <p>First, we looked at the Mean Absolute Error (MAE), which measures the average size of the errors in
the predictions without considering their direction. Run 1 achieved an MAE of 2.227, while Run
0 achieved an MAE of 2.366. This indicates that Run 1 had a higher overall accuracy in its predictions.</p>
      <p>The MZOE metric is the fraction of questions for which the predicted answer differs from the user’s actual answer. Run 0 had an
MZOE of 0.798 compared to 0.859 for Run 1. This means that Run 0 gave the exact answer more often
than Run 1.</p>
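      <p>Both metrics are straightforward to compute for a single questionnaire. The following sketch uses hypothetical predicted and actual answers, not data from the task, and the function names are illustrative.</p>
      <preformat>
```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual answers."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mean_zero_one_error(predicted, actual):
    """Fraction of questions whose predicted answer differs from the actual one."""
    return sum(1 for p, a in zip(predicted, actual) if p != a) / len(actual)


# Hypothetical answers to four questions on the 0 to 6 scale.
predicted = [0, 3, 5, 6]
actual = [0, 1, 5, 2]
mae = mean_absolute_error(predicted, actual)
mzoe = mean_zero_one_error(predicted, actual)
```
      </preformat>
      <p>For these values, mae is 1.5 and mzoe is 0.5: two of the four answers are wrong, and the wrong ones are off by 2 and 4 points. The example shows why the two metrics can rank systems differently, since MZOE only counts misses while MAE also weighs how far off each miss is.</p>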
      <p>As for MAE_macro, which evaluates the mean absolute error balanced across classes, Run 1 performed
better with a value of 2.286 compared to 2.833 for Run 0. This result indicates that Run 1 achieved a
more balanced performance between the different data categories, which is crucial in situations where
all classes are equally important.</p>
      <p>The GED measures the overall accuracy of the model in predicting eating disorders. Run 0 had
a GED of 3.261, while Run 1 had a GED of 3.286. Although the difference is small, Run 0 showed
slightly better performance on this overall measure.</p>
      <p>For the RS, which measures accuracy in predicting dietary restraint behavior, both runs showed very
similar results, with Run 1 scoring an RS of 3.269 and Run 0 scoring 3.285. This parity indicates that
both runs are comparable in terms of accuracy on this specific subscale.</p>
      <p>On the ECS, Run 0 showed better performance with an ECS of 2.659 compared to 2.911 for Run 1.
This result suggests that Run 0 was more effective at capturing specific food concerns.</p>
      <p>On the SCS, Run 1 performed better with an SCS of 2.560 compared to 2.771 for Run 0. This
suggests that Run 1 was more accurate in predicting body shape concerns.</p>
      <p>Finally, on the WCS, Run 1 also outperformed Run 0 with a WCS of 2.026 compared to 2.218. This
demonstrates a better ability of Run 1 to predict weight concern.</p>
      <p>In summary, although Run 1 showed better overall accuracy and more balanced performance on
most metrics, Run 0 excelled in specific aspects such as MZOE, GED, and ECS.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This paper summarizes UMUTeam’s participation in the eRisk collaborative task of the 2024 edition
of CLEF. The eRisk Lab focuses on the development of assessment methods and metrics for the early
detection of risks on the Internet, especially related to health and safety issues. In this edition, the focus
is on detecting symptoms of depression, early detection of signs of anorexia, and measuring the severity
of signs of eating disorders in three related subtasks.</p>
      <p>In this shared task, we have focused on Task 2 and Task 3, which are related to early detection of signs
of anorexia and measuring the severity of eating disorder signs. For this purpose, several approaches
were used, including the fine-tuning of a sentence transformer model for measuring the severity of
eating disorder signs and the fine-tuning of pre-trained Transformers-based language models with
sentiment features for detecting anorexia signs.</p>
      <p>In Task 2, we presented 5 runs based on different settings, using different fine-tuned models as the
classification model for the system and different thresholds for the early detection strategy. We ranked
fifth in the decision-based evaluation: Run 4 achieved the highest precision and F1 score at the cost
of higher latency, while Runs 0 and 2 offer lower latency with slightly lower precision and F1 score. In
the ranking-based evaluation, we also placed fifth. In this case, the results of all five runs are identical:
P@10 is consistently 0.20, NDCG@10 is 0.12, and NDCG@100 is 0.14.</p>
      <p>From the results obtained, we can see that the sheer number of comments may not be enough; the
context and severity of the comments are also important. We also found that removing certain negative
comments from users labeled as “control” runs the risk of the model not learning to properly distinguish
between negative comments that are normal and those that are indicative of a mental disorder, which
could degrade the performance of the system.</p>
      <p>In Task 3, we presented two runs based on fine-tuning a pre-trained sentence transformer model
that uses textual similarity to measure the similarity between possible answers (user thread text) and
each question in the EDE-Q. In this case, Run 1, which is based on a fine-tuned
sentence-transformers/all-MiniLM-L6-v2 model, has the best overall accuracy and a more balanced performance
on most metrics.</p>
      <p>
        As future work, we suggest adding the user’s previous context as an input to improve performance,
and not removing all negative comments from users marked as “control”, so that the model
learns to correctly distinguish between negative comments that are normal and those that
are indicative of a mental disorder. Furthermore, it is important to examine the relationship between
indicators of mental illness and hate speech [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], the use of humor [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], and the demographic and
psychographic characteristics of the message authors [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is part of the research projects LaTe4PoliticES (PID2022-138099OB-I00) funded by
MICIU/AEI/10.13039/501100011033 and the European Regional Development Fund (ERDF)-a way of making
Europe and LT-SWM (TED2021-131167B-I00) funded by MICIU/AEI/10.13039/501100011033 and by the
European Union NextGenerationEU/PRTR, and “Services based on language technologies for
political microtargeting” (22252/PDC/23) funded by the Autonomous Community of the Region of Murcia
through the Regional Support Program for the Transfer and Valorization of Knowledge and Scientific
Entrepreneurship of the Seneca Foundation, Science and Technology Agency of the Region of Murcia.
Mr. Ronghao Pan is supported by the Programa Investigo grant, funded by the Region of Murcia, the
Spanish Ministry of Labour and Social Economy and the European Union - NextGenerationEU under
the “Plan de Recuperación, Transformación y Resiliencia (PRTR)”.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dattani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rodés-Guirao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ritchie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Roser</surname>
          </string-name>
          , Mental health,
          <source>Our World in Data</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sacco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Camilleri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Eberhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Umla-Runge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Newbury-Birch</surname>
          </string-name>
          ,
          <article-title>A systematic review and meta-analysis on the prevalence of mental disorders among children and adolescents in Europe</article-title>
          ,
          <source>European Child &amp; Adolescent Psychiatry</source>
          (
          <year>2022</year>
          ). doi:10.1007/s00787-022-02131-2.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Calvo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. N.</given-names>
            <surname>Milne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Hussain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Christensen</surname>
          </string-name>
          ,
          <article-title>Natural language processing in mental health applications using non-clinical texts</article-title>
          ,
          <source>Natural Language Engineering</source>
          <volume>23</volume>
          (
          <year>2017</year>
          )
          <fpage>649</fpage>
          -
          <lpage>685</lpage>
          . URL: https://api.semanticscholar.org/CorpusID:17828909.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
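          <string-name>
            <given-names>A. M.</given-names>
            <surname>Mármol-Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moreno-Muñoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Plaza-del-Arco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Molina-González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Martín-Valdivia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Ureña-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Montejo-Ráez</surname>
          </string-name>
          ,
          <article-title>Overview of MentalRiskES at IberLEF 2023: Early Detection of Mental Disorders Risk in Spanish</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>71</volume>
          (
          <year>2023</year>
          )
          <fpage>329</fpage>
          -
          <lpage>350</lpage>
          . URL: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6564.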
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martín-Rodilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <article-title>Overview of eRisk 2023: Early Risk Prediction on the Internet</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Arampatzis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kanoulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tsikrika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vrochidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Giachanou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aliannejadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          , Springer Nature Switzerland, Cham,
          <year>2023</year>
          , pp.
          <fpage>294</fpage>
          -
          <lpage>315</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martín-Rodilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <article-title>Overview of eRisk 2024: Early Risk Prediction on the Internet</article-title>
          , in:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. 15th International Conference of the CLEF Association, CLEF 2024</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martín-Rodilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <article-title>Overview of eRisk 2024: Early Risk Prediction on the Internet (Extended Overview)</article-title>
          , in:
          <source>Working Notes of the Conference and Labs of the Evaluation Forum CLEF 2024</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <article-title>A test collection for research on depression and language use</article-title>
          , in:
          <source>International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          , Springer,
          <year>2016</year>
          , pp.
          <fpage>28</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Salmerón-Ríos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>García-Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Valencia-García</surname>
          </string-name>
          ,
          <article-title>Fine grain emotion analysis in Spanish using linguistic features and transformers</article-title>
          ,
          <source>PeerJ Computer Science</source>
          <volume>10</volume>
          (
          <year>2024</year>
          )
          e1992. doi:10.7717/peerj-cs.1992.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          , CoRR abs/1810.04805 (
          <year>2018</year>
          ). URL: http://arxiv.org/abs/1810.04805. arXiv:1810.04805.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>RoBERTa: A Robustly Optimized BERT Pretraining Approach</article-title>
          , CoRR abs/1907.11692 (
          <year>2019</year>
          ). URL: http://arxiv.org/abs/1907.11692. arXiv:1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>F.</given-names>
            <surname>Barbieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Camacho-Collados</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Neves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Espinosa-Anke</surname>
          </string-name>
          ,
          <article-title>TweetEval: Unified benchmark and comparative evaluation for tweet classification</article-title>
          , arXiv preprint arXiv:2010.12421 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>García-Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>García-Cumbreras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Valencia-García</surname>
          </string-name>
          ,
          <article-title>Evaluating feature combination strategies for hate-speech detection in Spanish using linguistic features and transformers</article-title>
          ,
          <source>Complex &amp; Intelligent Systems</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>García-Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Valencia-García</surname>
          </string-name>
          ,
          <article-title>Compilation and evaluation of the Spanish SatiCorpus 2021 for satire identification using linguistic features and transformers</article-title>
          ,
          <source>Complex &amp; Intelligent Systems</source>
          <volume>8</volume>
          (
          <year>2022</year>
          )
          <fpage>1723</fpage>
          -
          <lpage>1736</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>García-Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Colomo-Palacios</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Valencia-García</surname>
          </string-name>
          ,
          <article-title>Psychographic traits identification based on political ideology: An author analysis study on Spanish politicians' tweets posted in 2020</article-title>
          ,
          <source>Future Generation Computer Systems</source>
          <volume>130</volume>
          (
          <year>2022</year>
          )
          <fpage>59</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>