<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>HULAT-UC3M at Task1@eRisk 2025: Detecting Depression Using Machine Learning Approaches</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Javier Campos-Molina</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paloma Martínez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science and Engineering Department, Universidad Carlos III de Madrid</institution>
          ,
          <addr-line>Leganés, Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the participation of the HULAT-UC3M research group in Task 1: Search for Symptoms of Depression of the eRisk 2025 shared task [1]. A pipeline composed of three steps is proposed. The first step trains an SVM multiclass classifier on embeddings from the all-MiniLM-L6-v2 pretrained model to classify each sentence into its corresponding symptom. The second step applies a filter to select the 1000 most representative sentences to be submitted, and the final step scores the sentences chosen in the previous step using a rule-based model and an encoder-based transformer (RoBERTa) for sentiment analysis. The best model achieves an NDCG of 0.053 and a P@10 of 0.157.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Machine learning</kwd>
        <kwd>Depression detection</kwd>
        <kwd>Classification</kwd>
        <kwd>LLM</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Starting in 2019, one of the participants used a text classifier called SS3 [<xref ref-type="bibr" rid="ref5">5</xref>] [<xref ref-type="bibr" rid="ref6">6</xref>] to solve task 3 [<xref ref-type="bibr" rid="ref7">7</xref>]. That task is related, but not exactly the same as the one addressed in this paper, since it consists of classifying depression severity instead of classifying sentences into symptoms and scoring them. SS3 is a probabilistic model that uses statistics to associate words: for each word, it estimates the probability of being associated with other words, depending on whether it has previously appeared together with them or not. It is important to mention that SS3 is not a transformer, although it may appear similar in the sense that it assigns a probability between 0 and 1 to each word in relation to others. SS3 does not use self-attention as transformers do; instead, it relies on probabilistic functions such as confidence (cf), support (sf), and credibility (cv) to model context.
      </p>
      <p>
        In the 2020 edition [<xref ref-type="bibr" rid="ref8">8</xref>], some systems proposed solutions based on the RoBERTa model, which is more powerful than BERT since it was trained on roughly ten times more data. The model provides its own tokenizer, which was used to create the tokens; the embeddings were then computed and, finally, the classification was performed with a softmax as the last layer to obtain the probabilities. The team using this approach obtained the best accuracy, above 69% [<xref ref-type="bibr" rid="ref8">8</xref>].
      </p>
      <p>
        In the 2021 edition [<xref ref-type="bibr" rid="ref9">9</xref>], some of the proposed systems followed approaches similar to those of 2020. One of the studies used BERT and RoBERTa together [<xref ref-type="bibr" rid="ref10">10</xref>], and the comparison of results was in favor of RoBERTa, as expected for the reason previously mentioned. Other systems proposed different probabilistic methods, similar to SS3 in 2019. In this case, one of the groups participating in the task proposed a system using Latent Dirichlet Allocation (LDA) [<xref ref-type="bibr" rid="ref11">11</xref>], which consists of a Bayesian network, in combination with sentence transformers and classical classifiers. LDA is very popular in unsupervised learning tasks. On the other hand, and although it is a task related to self-harm rather than depression, some projects used interesting systems [<xref ref-type="bibr" rid="ref9">9</xref>]. The self-harm task was a binary classification task over users, separating those who have harmed themselves at some point from those who have not. One of the teams [<xref ref-type="bibr" rid="ref12">12</xref>] used YAKE for one of their runs, a model that extracts the most important words from a sentence, but it did not work as expected since it removed the important signs of self-harm from the sentences. An additional run used the VADER model [<xref ref-type="bibr" rid="ref12">12</xref>], but no results are available for it because the team did not submit the run.
      </p>
      <p>
        In the 2022 edition [<xref ref-type="bibr" rid="ref13">13</xref>], the system described in [<xref ref-type="bibr" rid="ref14">14</xref>] used RoBERTa, but in addition they used a model called MiniLM, derived from the RoBERTa and BERT architectures, which keeps full self-attention but is multilingual and able to work with different languages, whereas RoBERTa is specialized in English texts only. MiniLM is also smaller and faster than RoBERTa, which can help in terms of efficiency. Another team introduced a fully connected neural network (FCNN) in their third run, combined with previously used systems such as support vector machines (SVM) and transformers. The results were very good in terms of recall (0.816 compared to 0.745 for the first team) but not in terms of precision (0.283 compared to 0.682). The team that won the competition [<xref ref-type="bibr" rid="ref15">15</xref>] used a bag-of-words (BOW) approach combined with entropy-based weighting and an SVM classifier. BOW is a technique for converting text into numerical representations, enabling the use of classical machine learning models such as the SVM used in this system. This team applied TF-IDF weighting enhanced with entropy, giving higher importance to the most relevant words while reducing the influence of frequent but uninteresting terms. Additionally, they employed chi-square feature selection, retaining only the most relevant terms from the previous step and reducing the total amount of data, which further improved classification speed.
      </p>
      <p>
        The eRisk 2023 edition [<xref ref-type="bibr" rid="ref16">16</xref>] changed from binary classification to multiclass classification. The objective of the task was to classify sentences into symptoms of depression according to the BDI-II questionnaire [<xref ref-type="bibr" rid="ref17">17</xref>] and give them a score from 0 to 10. With the rise of generative AI, some teams [<xref ref-type="bibr" rid="ref18">18</xref>] started using LLMs to generate more data, as is done in other approaches outside this task [<xref ref-type="bibr" rid="ref19">19</xref>][<xref ref-type="bibr" rid="ref20">20</xref>]. One of the teams [<xref ref-type="bibr" rid="ref18">18</xref>] used ChatGPT to generate more data with a prompt for each of the symptoms. This was combined with newer models that perform well at capturing semantic relations, such as MentalRoBERTa, but the results were not good compared to the first team in the competition: the maximum average precision across their 5 runs was 0.104, far below 0.319. Another team [<xref ref-type="bibr" rid="ref21">21</xref>] also attempted to compare sentences based on their similarity by computing sentence embeddings with transformer-based models. However, due to the high computational cost of encoding all sentences, they first used the BM25 model [<xref ref-type="bibr" rid="ref21">21</xref>] as a lightweight filtering step. Only the top-ranked sentences, the ones most similar to those from the BDI-II questionnaire, were retained and then processed with the transformer models for similarity evaluation. The results were not good, as the maximum AP of their runs was 0.039. Finally, the winners of the 2023 task [<xref ref-type="bibr" rid="ref22">22</xref>] used word2vec embeddings to capture semantic and grammatical similarities between words. A soft cosine similarity was then applied to compare each sentence in the dataset with the individual sentences describing each of the 21 symptoms from the BDI-II questionnaire. Specifically, if a symptom in the BDI-II is described by four different sentence options, the similarity between a dataset sentence and each of these options is computed individually, resulting in four similarity scores. These scores are then weighted using predefined weights for each option: the weighted similarity for each option is obtained by multiplying the similarity score by its corresponding weight, and the total similarity between the sentence and the symptom is calculated as the sum of all the weighted similarities.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Method and system description</title>
      <p>
        Before developing the solution, an analysis of the labeled data provided by the eRisk organization [
        <xref ref-type="bibr" rid="ref1 ref4">4, 1</xref>
        ] was carried out to check that there was enough data to train a model and that all the labels were in the correct format. The main objective of the task is to obtain a score for each of the symptoms for 1000 sentences taken from a test dataset. The process is structured in three steps. The first one is to train a multiclass classifier to assign each sentence to its corresponding symptom. After that, the sentences are filtered according to different criteria for the different runs and, finally, the sentences chosen in the previous step are scored using different methods.
      </p>
      <sec id="sec-3-1">
        <title>3.1. Multi classification of the sentences</title>
        <p>
          For the multiclass classification problem, the model is trained using the annotated data from 2024 [
          <xref ref-type="bibr" rid="ref1 ref4">4, 1</xref>
          ], which is the data offered for training. The organizers provide unanimity labels, meaning that all annotators agreed on labeling a sentence as relevant to a particular symptom, and also majority labels. The model uses only the unanimity sentences, because the majority dataset could introduce noisier sentences on which not all annotators agreed. This can lead to misclassifications for some sentences, even though the majority dataset contains more training data.
        </p>
        <p>The proposed system uses a classical machine learning model, a support vector machine (SVM, https://scikit-learn.org/stable/modules/svm.html), to perform the multiclass classification over the 21 symptoms present in the BDI-II questionnaire [<xref ref-type="bibr" rid="ref17">17</xref>]. Taking only the relevant sentences, the model was trained on the embeddings created with the pretrained model "all-MiniLM-L6-v2" (https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). An analysis of the results was made by dividing the data into train and validation sets to test the model; the results of this test are reported in Section 4. Figure 1 represents the steps followed to build the model and how it was used.</p>
        <p>We trained the SVM with the probability hyperparameter set to true in the scikit-learn implementation, so that class probabilities could be predicted for the test sentences and only those with a probability higher than the applied threshold were kept. Five different thresholds have been tested: 0.75, 0.80, 0.85, 0.90 and 0.95. The number of sentences remaining after each filter is shown in Table 1.</p>
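        <p>To make this step concrete, the following sketch (not the exact code of the submitted system) shows how the all-MiniLM-L6-v2 embeddings, the SVM with probability estimates and the confidence filter could be combined. The function names and data variables are illustrative assumptions.</p>
        <preformat>
# Illustrative sketch of the classification and filtering step described above.
from sentence_transformers import SentenceTransformer
from sklearn.svm import SVC
import numpy as np

encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def train_symptom_classifier(train_sentences, train_labels):
    """Train the 21-way symptom classifier on the unanimity-labeled sentences."""
    X = encoder.encode(train_sentences)
    clf = SVC(probability=True)  # probability=True enables predict_proba
    clf.fit(X, train_labels)
    return clf

def filter_by_confidence(clf, test_sentences, threshold=0.95):
    """Keep (sentence, predicted symptom, confidence) for confident predictions."""
    X = encoder.encode(test_sentences)
    probs = clf.predict_proba(X)
    kept = []
    for sentence, p in zip(test_sentences, probs):
        confidence = float(np.max(p))
        if confidence > threshold:
            kept.append((sentence, clf.classes_[int(np.argmax(p))], confidence))
    return kept
        </preformat>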
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Selection of the sentences</title>
        <p>Once the messages from the test dataset have been filtered, a sample of at most 1000 of these test messages is selected, which are the ones to be submitted as runs for the proposed task. For this purpose, three different approaches have been implemented for selecting the messages.</p>
        <p>The first one (Figure 2) is to select the top 1000 sentences by confidence, in other words, those to which the multiclass classification model assigned the highest probability of belonging to their symptom. The pros and cons of this selection are clear: the advantage is that the best samples are sent to the competition and therefore better results are expected; however, some symptoms may not be represented at all if none of their sentences are classified with high confidence, and if the evaluation averages the results over all the symptoms, the missing symptoms can score 0.</p>
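        <p>A minimal sketch of this first strategy, reusing the (sentence, symptom, confidence) tuples produced by the filtering sketch in the previous section, could look as follows.</p>
        <preformat>
# Rank all classified sentences by predicted probability and keep the global top k.
def select_top_k(scored_sentences, k=1000):
    """scored_sentences: list of (sentence, symptom, confidence) tuples."""
    ranked = sorted(scored_sentences, key=lambda item: item[2], reverse=True)
    return ranked[:k]
        </preformat>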
        <p>The second way to take this sample is to use the sentences previously rated above 0.95 confidence. We chose 0.95 because the lower thresholds leave a large number of sentences for some of the symptoms and may introduce too much randomness into the selection. Table 1 shows the number of sentences that remain for each symptom after filtering by this confidence value.</p>
        <p>To select the 1000 sentences, they are taken proportionally from each group: for each symptom, its contribution is calculated as a fraction of 1000, and sentences are randomly selected from the subset already extracted. In case of decimals, we round down to the nearest whole number, which ensures that the total number of sentences does not exceed 1000. This approach guarantees that sentences of all symptoms are included. Figure 3 shows the process. The following formula formalizes the above for each symptom, where N_i is the number of sentences for that symptom and the denominator sums the sentences of all 21 symptoms:
⌊ ( N_i / Σ_{j=1}^{21} N_j ) × 1000 ⌋</p>
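        <p>A possible implementation of this proportional sampling, assuming the high-confidence sentences are grouped by symptom in a dictionary, is sketched below; it follows the floor-based formula above.</p>
        <preformat>
import math
import random

# Allocate a quota per symptom proportional to its share of high-confidence
# sentences (rounded down), then sample that many sentences at random.
def proportional_sample(sentences_by_symptom, total=1000, seed=0):
    """sentences_by_symptom: dict mapping symptom id -> list of sentences."""
    rng = random.Random(seed)
    grand_total = sum(len(v) for v in sentences_by_symptom.values())
    selected = {}
    for symptom, sents in sentences_by_symptom.items():
        quota = math.floor(len(sents) / grand_total * total)  # floor(N_i / sum_j N_j * 1000)
        selected[symptom] = rng.sample(sents, min(quota, len(sents)))
    return selected
        </preformat>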
        <p>In the last approach, after manually reviewing the sentences, we observed that some of them did not describe the symptom as felt by the author but as felt by someone else. A sub-selection of reflexive sentences was therefore implemented within the subset of 0.90 confidence (Figure 4). We took a lower confidence threshold because, for some of the symptoms, we did not reach the minimum number of sentences we were looking for in this test. Sentences containing reflexive pronouns or first-person English pronouns such as 'I' or 'I'm' and their variants were selected. We then took the same number of sentences from each symptom to test with an equal distribution; rounding up, this leaves 47 sentences per symptom.</p>
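        <p>A simple way to implement this filter is a regular expression over first-person pronouns. Beyond 'I' and 'I'm', the exact pronoun list is not reported above, so the list in the sketch below is only an assumption.</p>
        <preformat>
import re

# Keep sentences containing first-person or reflexive English pronouns.
# The pronoun list is an assumption beyond the 'I' / "I'm" variants mentioned above.
FIRST_PERSON = re.compile(r"\b(i|i'm|i've|i'd|i'll|me|my|mine|myself)\b", re.IGNORECASE)

def keep_reflexive(sentences):
    return [s for s in sentences if FIRST_PERSON.search(s)]
        </preformat>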
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Scoring the sentences</title>
        <p>The next step is to score the selected sentences out of 10. Two approaches have been implemented, one using a model called VADER (https://www.nltk.org/api/nltk.sentiment.vader.html) and the other using roberta-base-sentiment (https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment). Both are models mainly used for sentiment polarity classification, but they have also been used for scoring in some cases.</p>
        <p>In the case of VADER, it returns four values for each parsed sentence. Three of them are values between 0 and 1 indicating how positive, negative or neutral the sentence is, and they sum to 1. The fourth, 'compound', is a number between -1 and 1 that combines the three previous values: the more negative the sentence, the more negative the compound value, and vice versa.</p>
        <p>The objective was to test what values it gave for sentences that were cataloged under some of the depression symptoms treated in this task, and then, with a formula, map that value to a score out of 10. When analyzing the sentences, we saw that they were never completely negative or completely positive, although it was especially difficult for them to be cataloged as positive, so we added a multiplier to this score to increase the difference between the symptoms of greater and lesser severity: a multiplier of 1.1 for the negative case and 1.2 for the positive case, as long as the opposite polarity value was 0. That is, if a sentence had only negative and neutral values, the multiplier was applied, but if a sentence had all three values, or at least both positive and negative, only the 'compound' value was taken into account. It should be noted that if VADER returns a positive compound value, the phrase has no negative connotation, so it should have a low score out of 10, and the opposite applies if VADER returns a negative value. The following formula shows how the 'compound' value, together with the multiplier explained above, is used to calculate the score rounded to two decimal places; the returned value is capped at a maximum of 10 and a minimum of 0.</p>
        <p>score = ( ( 1 − sentiment_score × regulator ) / 2 ) × 10
where sentiment_score is the 'compound' value and regulator is the 1.1 or 1.2 multiplier explained before.</p>
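        <p>The following sketch, based on NLTK's VADER implementation, illustrates how the regulator and the formula above can be combined; it is an approximation of the submitted scorer rather than its exact code.</p>
        <preformat>
from nltk.sentiment.vader import SentimentIntensityAnalyzer  # requires nltk.download('vader_lexicon')

analyzer = SentimentIntensityAnalyzer()

def vader_score(sentence):
    s = analyzer.polarity_scores(sentence)  # keys: 'neg', 'neu', 'pos', 'compound'
    regulator = 1.0
    if s["pos"] == 0 and s["neg"] > 0:
        regulator = 1.1   # only negative and neutral values present
    elif s["neg"] == 0 and s["pos"] > 0:
        regulator = 1.2   # only positive and neutral values present
    score = (1 - s["compound"] * regulator) / 2 * 10
    return round(min(10.0, max(0.0, score)), 2)  # capped to [0, 10], two decimals
        </preformat>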
        <p>The roberta-base-sentiment model follows a very similar approach to VADER. In this case, the model returns a field called label and another called score. The label takes three values, LABEL_0, LABEL_1 and LABEL_2, where 0 represents sentences categorized as negative, 1 neutral sentences and 2 positive sentences. The score is a value between 0 and 1 that corresponds to the confidence of assigning the sentence to that label (negative, neutral, positive). In this case, if the label is negative (0) we multiply the score by 10, if it is neutral (1) we multiply it by 5, and if it is positive (2) we multiply 1 − score by 10. In this way, sentences with more severe symptoms are given a higher score.</p>
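        <p>A sketch of this scorer using the Hugging Face transformers pipeline is shown below; the label-to-score mapping follows the rules just described.</p>
        <preformat>
from transformers import pipeline

sentiment = pipeline("sentiment-analysis",
                     model="cardiffnlp/twitter-roberta-base-sentiment")

def roberta_score(sentence):
    result = sentiment(sentence)[0]  # e.g. {'label': 'LABEL_0', 'score': 0.87}
    label, confidence = result["label"], result["score"]
    if label == "LABEL_0":                  # negative
        return round(confidence * 10, 2)
    if label == "LABEL_1":                  # neutral
        return round(confidence * 5, 2)
    return round((1 - confidence) * 10, 2)  # LABEL_2, positive
        </preformat>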
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and discussion</title>
      <p>Some internal tests were done for the multiclass classification task, as it is the only part we can really evaluate in this way due to the lack of labeled data for the scores. In that case, we divided the sentences into train and test sets, 80% and 20% respectively. The results report precision, recall and F1-score, together with the number of test sentences in the column named support. The results are shown in Table 2, broken down by symptom.</p>
      <p>Table 3 displays the results provided by the eRisk organizers for the Task 1 participants, in order to compare our performance with the best team in the task. The table shows the results for majority, meaning that at least 2 of the 3 assessors marked a sentence as correct.</p>
      <p>Our submitted runs were a mix of the previously described methods, combining different approaches for selecting the 1000 sentences with different ways of scoring them. There was a naming error in one of the runs using the roBERTa scorer, as two runs share the same name: one used the roBERTa scorer with the top 1000 sentences selected by confidence, so it should be called roBERTa top, while the other used the roBERTa scorer with the sample of 1000 sentences instead of the top ones. The runs whose name starts with VADER use the VADER model to score the sentences, VADER top scoring the top sentences by confidence and VADER sample scoring the sampled sentences. Finally, reflexives roBERTa uses the reflexive sample and the roBERTa scorer.</p>
      <sec id="sec-4-1">
        <title>Table 3: Results for majority (NDCG)</title>
        <p>INESC-ID maxcos 0.235
INESC-ID unanimity 0.354
INESC-ID max 0.350
INESC-ID mix23 0.312
INESC-ID aug-best 0.247
HULAT_UC3M roBERTa (top) 0.018
HULAT_UC3M vader top 0.015
HULAT_UC3M reflexives roBERTa 0.013
HULAT_UC3M roBERTa (sample) 0.004
HULAT_UC3M vader sample 0.004</p>
        <p>Across all evaluation metrics (Average Precision (AP), R-Precision (R-PREC), and Normalized Discounted Cumulative Gain (NDCG)), our runs performed significantly worse than those of the INESC-ID team, which obtained the best scores of all the teams participating in the task. Their submissions achieved consistently strong results, with the best run reaching an AP score of 0.354.</p>
        <p>Our best-performing run was the one using the RoBERTa model, where we selected the top 1000 sentences based on confidence scores from our multiclass classification model, with an AP of 0.018. It was closely followed by a similar run using the VADER model for scoring with the same top-1000 sentence selection. These two runs outperformed our other three submissions by a wide margin; in fact, they were up to four times more precise than the runs that used sentence sampling (runs 4 and 5) with a confidence threshold of 0.95, as previously described in this document.</p>
        <p>From these results, we can conclude that the multiclass classification model was effective. The two runs that used high-confidence sentence selection clearly outperformed the runs based on lower-confidence sampling, suggesting that confidence-based filtering had a strong positive impact on performance. However, the methods used to score the sentences are not appropriate.</p>
        <p>Table 4 shows the results in the case of unanimity, meaning that all three annotators have to agree that the sentence is well classified and scored.</p>
      </sec>
      <sec id="sec-4-11">
        <title>NDCG</title>
      </sec>
      <sec id="sec-4-12">
        <title>INESC-ID</title>
      </sec>
      <sec id="sec-4-13">
        <title>INESC-ID</title>
      </sec>
      <sec id="sec-4-14">
        <title>INESC-ID</title>
      </sec>
      <sec id="sec-4-15">
        <title>INESC-ID</title>
      </sec>
      <sec id="sec-4-16">
        <title>INESC-ID</title>
      </sec>
      <sec id="sec-4-17">
        <title>HULAT_UC3M</title>
      </sec>
      <sec id="sec-4-18">
        <title>HULAT_UC3M</title>
      </sec>
      <sec id="sec-4-19">
        <title>HULAT_UC3M</title>
      </sec>
      <sec id="sec-4-20">
        <title>HULAT_UC3M</title>
        <p>HULAT_UC3M
unanimity 0.269
max 0.223
mix23 0.201
aug-best 0.167
maxcos 0.164
reflexives roBERTa 0.013
roBERTa 0.008
vader top 0.006
roBERTa 0.002
vader sample 0.001</p>
        <p>In this case, the best of our runs is the one that chooses reflexive sentences. It does not get worse compared to the majority results in Table 3, with 0.013 in both cases. This result suggests that a reflexive sentence is more likely to be correctly selected than a non-reflexive one, since all the sentences that were correctly scored were also ranked under unanimity, given that the score is the same for majority and unanimity. For our other runs, the precision drops by half or more, while the best score of the best team in this case corresponds to the run called unanimity, which may suggest that they used the data labeled as unanimity from the corpus provided by eRisk. The result achieved by INESC-ID supports our choice of using only the unanimity sentences to train the classifier, removing part of the noise and the unclear sentences from the labeled data.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Future Work</title>
      <p>We have presented a general overview of our participation in eRisk Task 1, Search for Symptoms of Depression, using mainly machine learning approaches. As mentioned in the previous section, the approaches did not perform well in general, even though we could draw some conclusions and ideas for future testing.</p>
      <p>As possible future work, we would like to change the way the sentences are scored, since our conclusions indicate that it was the weakest point of our systems. The VADER and roBERTa models are probably not adequate for this task, or we have not fine-tuned them enough to get the best out of them. To replace these approaches, we would like to try generative AI to generate labeled data with scores, giving in the prompt some examples of the labeled data and the BDI-II sentences. Other teams, as mentioned in the related work section, used it to generate data in general, but we would like to use it only to create data with scores, with the idea of feeding it directly into a machine learning architecture such as an SVM: a two-level architecture with an SVM classifier followed by an SVM regressor for each of the symptoms to produce the scores.</p>
      <p>Another solution could be to use generative AI directly to score the previously chosen sentences, without training a machine learning architecture or generating new data. In this case, the scores would depend directly on the prompt, so it would be important to give the generative AI several accurate examples in order to achieve good results. In this approach we would like to involve a professional in depression to create some scored sentences for each of the symptoms to be included in the prompt. Without good and accurate examples, this approach does not work, as it depends directly on the choices made by the AI.</p>
      <p>On the other hand, some deep learning approaches could be really helpful for generating the scores. One of the approaches we would like to test is training a neural network (NN) on very extreme sentences from depressed users on the internet, with a softmax classifier as the last layer of that NN. The purpose of this approach is to return the probability of the sentence belonging to each symptom as a list of 21 values, one per symptom. The score would then be computed by multiplying the maximum probability over all symptoms by 10, and the sentence would be assigned to the symptom with the highest probability. In the end, sentences clearly belonging to one symptom would get a very high score and, if the model is unsure, the probability and therefore the score would be low, since the score depends on the probabilities returned by the model. This approach would replace the previously mentioned multiclass classifier and requires appropriate data, since the labeled data provided by the eRisk task does not contain only severe sentences for each symptom, so we would need to combine it with data generation using generative AI or similar data augmentation approaches.</p>
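      <p>Since this is future work, the following is only a hypothetical sketch of how such a softmax-based scorer could behave, assuming a 21-way classifier that already outputs a probability distribution for each sentence.</p>
      <preformat>
import numpy as np

# Hypothetical future scorer: assign the sentence to the most probable symptom
# and use that probability, scaled to 0-10, as the severity score.
def score_from_softmax(probabilities):
    """probabilities: array-like of 21 values summing to 1."""
    p = np.asarray(probabilities)
    symptom = int(np.argmax(p))    # index of the most probable symptom
    score = float(np.max(p)) * 10  # maximum probability scaled to a 0-10 score
    return symptom, round(score, 2)
      </preformat>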
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was partially supported by Grant PID2023-148577OB-C21 (Human-Centered AI: User-Driven
Adapted Language Models-HUMAN AI) by MICIU/AEI/ 10.13039/501100011033 and by FEDER/UE.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The team has used generative AI, particularly ChatGPT, for spell checking while writing this document and for code related to the LaTeX format. In addition, some minor errors in the code have been fixed with it.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          , Overview of erisk 2025:
          <article-title>Early risk prediction on the internet, in: Experimental IR Meets Multilinguality</article-title>
          , Multimodality, and Interaction - 16th
          <source>International Conference of the CLEF Association, CLEF</source>
          <year>2025</year>
          , Madrid, Spain, September 9-
          <issue>12</issue>
          ,
          <year>2025</year>
          , Proceedings,
          <string-name>
            <surname>Part</surname>
            <given-names>II</given-names>
          </string-name>
          , volume To be
          <source>published of Lecture Notes in Computer Science</source>
          , Springer,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>World Health Organization (WHO)</surname>
          </string-name>
          ,
          <article-title>Depressive disorder (depression)</article-title>
          ,
          <year>2023</year>
          . URL: https://www.who.int/news-room/fact-sheets/detail/depression, last access: 16 May
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>World Health Organization (WHO)</surname>
          </string-name>
          ,
          <article-title>Depressive disorder (depression)</article-title>
          ,
          <year>2025</year>
          . URL: https://www.who.int/news-room/fact-sheets/detail/suicide, last access: 16 May
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          , Overview of erisk 2025:
          <article-title>Early risk prediction on the internet (extended overview)</article-title>
          ,
          <source>in: Working Notes of the Conference and Labs of the Evaluation Forum (CLEF</source>
          <year>2025</year>
          ), Madrid, Spain,
          <fpage>9</fpage>
          -
          <issue>12</issue>
          <year>September</year>
          ,
          <year>2025</year>
          , volume To be published of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S. G.</given-names>
            <surname>Burdisso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Errecalde</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. M. y Gómez</surname>
          </string-name>
          ,
          <article-title>Using text classification to estimate the depression level of reddit users</article-title>
          ,
          <source>Journal of Computer Science and Technology</source>
          , Vol.
          <volume>21</volume>
          , Iss. 1, pp. e1 (
          <year>2021</year>
          ). URL: https://doi.org/10.24215/16666038.21.e1. doi: 10.24215/16666038.21.e1.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S. G.</given-names>
            <surname>Burdisso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Errecalde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montes-y Gómez</surname>
          </string-name>
          ,
          <article-title>A text classification framework for simple and efective early depression detection over social media streams</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>133</volume>
          (
          <year>2019</year>
          )
          <fpage>182</fpage>
          -
          <lpage>197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <article-title>Overview of erisk 2019 early risk prediction on the internet</article-title>
          ,
          <source>in: Experimental IR Meets Multilinguality, Multimodality, and Interaction: 10th International Conference of the CLEF Association, CLEF</source>
          <year>2019</year>
          , Lugano, Switzerland, September 9-
          <issue>12</issue>
          ,
          <year>2019</year>
          , Proceedings 10, Springer,
          <year>2019</year>
          , pp.
          <fpage>340</fpage>
          -
          <lpage>357</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Martínez-Castaño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Htait</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Azzopardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Moshfeghi</surname>
          </string-name>
          ,
          <article-title>Early risk detection of self-harm and depression severity using bert-based transformers: ilab at clef erisk 2020 (</article-title>
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martín-Rodilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          , Overview of erisk at clef 2021:
          <article-title>Early risk prediction on the internet (extended overview)</article-title>
          .,
          <string-name>
            <surname>CLEF</surname>
          </string-name>
          (Working Notes)
          <volume>1</volume>
          (
          <year>2021</year>
          )
          <fpage>864</fpage>
          -
          <lpage>887</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10] S.
          <string-name>
            <surname>-H. Wu</surname>
            ,
            <given-names>Z.-J.</given-names>
          </string-name>
          <string-name>
            <surname>Qiu</surname>
          </string-name>
          ,
          <article-title>A roberta-based model on measuring the severity of the signs of depression</article-title>
          .,
          <source>in: CLEF (Working Notes)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1071</fpage>
          -
          <lpage>1080</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Manna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Monti</surname>
          </string-name>
          , et al.,
          <source>Unior nlp at erisk</source>
          <year>2021</year>
          :
          <article-title>Assessing the severity of depression with part of speech and syntactic features</article-title>
          ,
          <source>in: CEUR WORKSHOP PROCEEDINGS</source>
          , volume
          <volume>2936</volume>
          ,
          <string-name>
            <surname>CEUR</surname>
          </string-name>
          ,
          <year>2021</year>
          , pp.
          <fpage>1022</fpage>
          -
          <lpage>1030</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Barros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Trifan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Oliveira</surname>
          </string-name>
          ,
          <article-title>Vader meets bert: sentiment analysis for early detection of signs of self-harm through social mining</article-title>
          .,
          <source>in: CLEF (working notes)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>897</fpage>
          -
          <lpage>907</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martín-Rodilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          , Overview of erisk at clef 2022:
          <article-title>Early risk prediction on the internet (extended overview)</article-title>
          ,
          <source>in: CEUR Workshop Proceedings (CEUR-WS. org)</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>A.-M. Bucur</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Cosma</surname>
            ,
            <given-names>L. P.</given-names>
          </string-name>
          <string-name>
            <surname>Dinu</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>An end-to-end set transformer for user-level classification of depression and gambling disorder</article-title>
          ,
          <source>arXiv preprint arXiv:2207.00753</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>H.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lijin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sruthi</surname>
          </string-name>
          , T. Basu,
          <article-title>Nlp-iiserb@ erisk2022: Exploring the potential of bag of words, document embeddings and transformer based framework for early prediction of eating disorder, depression and pathological gambling over social media</article-title>
          .,
          <source>in: CLEF (Working Notes)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>972</fpage>
          -
          <lpage>986</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martín-Rodilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          , Overview of erisk 2023:
          <article-title>Early risk prediction on the internet</article-title>
          ,
          <source>in: International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>294</fpage>
          -
          <lpage>315</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] A. T. Beck, C. H. Ward, M. Mendelson, J. Mock, J. Erbaugh, An inventory for measuring depression, Archives of General Psychiatry 4 (1961) 561-571.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] A.-M. Bucur, Utilizing ChatGPT generated data to retrieve depression symptoms from social media, arXiv preprint arXiv:2307.02313 (2023).</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] S. Ubani, S. O. Polat, R. Nielsen, ZeroShotDataAug: Generating and augmenting training data with ChatGPT, arXiv preprint arXiv:2304.14334 (2023).</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] H. Dai, Z. Liu, W. Liao, X. Huang, Y. Cao, Z. Wu, L. Zhao, S. Xu, F. Zeng, W. Liu, et al., AugGPT: Leveraging ChatGPT for text data augmentation, IEEE Transactions on Big Data (2025).</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] D. Maupomé, T. Soulas, F. Rancourt, G. Cantin-Savoie, G. Winterstein, S. Mosser, M.-J. Meurs, Lightweight methods for early risk detection, in: CLEF (Working Notes), 2023, pp. 718-726.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] N. Recharla, P. Bolimera, Y. Gupta, A. K. Madasamy, Exploring depression symptoms through similarity methods in social media posts, in: CLEF (Working Notes), 2023, pp. 763-772.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>