<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title></journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Leveraging Conversational Context and Semantic Relabeling for Early Depression Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xabier Larrayoz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arantza Casillas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alicia Pérez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>HiTZ Center - Ixa</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>Early detection of depression in online interactions is critical for timely intervention and support. The aim of this work is to identify users at risk in full conversational threads, in contrast to previous work, which approached the task using isolated posts from a single user. Our approach comprises three key components: (1) a semi-supervised semantic relabeling of training data using transformer-based embeddings and percentile-based score thresholds to reduce label noise; (2) fine-tuning of a multilingual transformer model for binary depression risk classification; and (3) an inference pipeline that computes per-thread user and context risk scores and fuses them into a global, cumulative risk measure. Our method was evaluated under both decision-based and ranking-based paradigms. While a conservative decision threshold limited precision in the classification task, our system achieved top-tier performance in ranking-based metrics (Precision and NDCG), which supports the efficacy of contextual signal fusion for early depression detection. We discuss the impact of fusion weights on early detection error (ERDE), latency, and overall F1, and outline future directions involving adaptive thresholding and advanced context encoding.</p>
      </abstract>
      <kwd-group>
        <kwd>early detection of depression</kwd>
        <kwd>social media</kwd>
        <kwd>generative large language models</kwd>
        <kwd>natural language understanding</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Early detection of mental health disorders has been a central theme in all editions of eRisk [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ],
becoming a benchmark task in the field of longitudinal data analysis in digital environments. Over the
years, the early detection task has focused on various clinical conditions: in 2022 the task addressed
depression [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]; in 2023 pathological gambling [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]; and in 2024 signs of anorexia [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In all cases, the
main objective has been to issue an early alert about potential risk cases based on users’ posts in online
forums.
      </p>
      <p>
        In the 2024 edition, for instance, the NLP-UNED team proposed a system for the early detection
of signs of anorexia that included several key components [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Their approach began with a
semantic representation of messages using sentence encoders, followed by a relabeling process based on
Approximate Nearest Neighbors (ANN) techniques, allowing them to transform a dataset originally
annotated at the user level into one labeled at the message level. They further refined the embeddings
using contrastive learning, aiming to maximize the distance between examples from different classes.
For classification, they also relied on ANN methods combined with rules and heuristics to expand the
number of messages considered per user for each prediction. Their system achieved the best results in
both the decision-based and the ranking-based evaluations.
      </p>
      <p>
        Notably, Riewe-Perła and Filipowska [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] combined language models with recommender systems
to predict whether recommended content originated from individuals with mental health conditions.
They used document and user embeddings along with a hybrid recommendation engine (LightFM [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]),
built on sentence transformers (SBERT [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]), to classify signs of potential risk efficiently and early.
      </p>
      <p>
        Beyond eRisk, the MentalRiskES competition series has also addressed early detection tasks across
different disorders, including depression, anxiety, and gambling [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. However, in all of these cases,
the focus has remained on analyzing isolated user messages, without incorporating the conversational
context in which these messages appear.
      </p>
      <p>
        In contrast, the 2025 edition of eRisk [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ] introduces a paradigm shift by providing complete
conversational threads, allowing for the study of interactive dynamics between participants. This new
perspective enables a more realistic and clinically relevant analysis framework, in which risk indicators
may arise not only from the target user’s messages but also from the surrounding conversational
context.
      </p>
      <p>The availability of context brings clear advantages, such as enabling co-reference resolution and
a better understanding of discourse flow, which are essential for detecting subtle signals of distress.
However, it also introduces challenges: longer inputs increase computational cost and latency, and
models must carefully focus on the target user’s contributions without being distracted by irrelevant
surrounding content. This setting requires new modeling strategies that balance depth of analysis with
early response efficiency, opening both opportunities and challenges for future work.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Task Definition and Data Description</title>
      <p>In this work we address Task 2 of eRisk 2025, Contextualized Early Detection of Depression.
This task introduces a novel scenario in depression detection as it leverages full conversational context.
Unlike previous editions of eRisk, where only isolated messages authored by each user were released,
the 2025 task provides participants with complete Reddit discussion threads involving the target user.
Thus, during the test phase, systems receive not only the messages written by the user under analysis
but also every other contribution within the thread, including the interaction structure that links the
posts. This design stems from the observation that the clinical relevance of a message often becomes
apparent only when interpreted in light of the surrounding conversation—for instance, an apparently
neutral reply may reveal signs of hopelessness when responding to criticism or a plea for help. The task
thereby simulates real-world scenarios in which detecting depression requires analyzing exchanges
between multiple participants.</p>
      <p>During the training phase, participants worked with a static corpus of 3,084 users derived from
earlier editions of eRisk (2017, 2018, and 2022), which included only the messages written by the target
user, with no conversational context. This dataset contains an average of 640 messages per user, with
2,772 users labeled as control (label 0) and 312 as depressed (label 1). The distribution across editions is
as follows: 1,400 users from the 2022 edition, 864 from 2017, and 820 from 2018. While the goal is to
detect the risk of a target user who interacts with other users in a conversation, the training corpus lacks
contextual information: entire threads are unavailable, which makes it challenging to train the models and
adapt them to this realistic setting. To mitigate this mismatch, we explored two strategies: reinterpreting
the original annotations through a message-level semantic relabeling, and designing a decision-making
mechanism capable of aggregating risk signals over time in a way compatible with the interactive
structure of the test data.</p>
      <p>In the test phase, the eRisk 2025 server operated in an interactive manner: for each target user, a new
discussion thread was released in real time. Each thread constituted a submission round and included
all posts published up to that point, including those written by other interlocutors. After processing
the thread, taking into account both its content and its conversational structure, participants had to
return a binary label (positive/negative) along with a confidence score before the next thread became
available. This incremental protocol emulates a continuous monitoring environment in which access to
the full conversational context is crucial for early and effective risk assessment.</p>
      <p>To evaluate system performance, two complementary paradigms are used. First, the decision-based
evaluation focuses on the final label assigned to each user and the moment at which a positive prediction
is made. In this setting, classical classification metrics—precision, recall, and F1—are calculated, along
with the ERDE (Early Risk Detection Error) metric [12], which penalizes both late detections of true
positive cases and premature false positives. Additionally, the average detection latency (defined as the
average number of threads processed before the first positive prediction) is computed and combined
with the F1-score to obtain a latency-weighted F1, thus balancing accuracy and promptness. Second,
the ranking-based evaluation uses the confidence scores after a fixed number of rounds (e.g., after 1,
100, or 500 threads) to rank users by their estimated risk. Metrics such as Precision@K and NDCG@K
are applied to assess the ability of the system to prioritize true positive cases. Together, these evaluation
paradigms offer a comprehensive view of system performance, both in terms of classification quality
and timeliness of detection.</p>
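      <p>To make the decision-based and ranking-based metrics concrete, the following Python sketch computes ERDE for a single user, the latency-weighted F1, and Precision@K and NDCG@K from a ranked list of users. It is an illustrative re-implementation, not the official eRisk evaluation script, and all function names are hypothetical. It assumes the cost settings commonly used in eRisk (c_fn = c_tp = 1, c_fp equal to the proportion of positive users), the latency penalty with p = 0.0078 reported in the eRisk overviews, a median over true positives when computing speed, and binary relevance with a log2 discount for NDCG.</p>
      <preformat>
```python
import math
import statistics


def erde(decision: int, truth: int, k: int, o: int, c_fp: float) -> float:
    """ERDE_o for one user, following Losada and Crestani (2016).

    decision: 1 if the system raised an alert, else 0
    truth:    1 if the user is a positive (depression) case, else 0
    k:        rounds (threads) processed when the decision was issued
    o:        deadline parameter, e.g. 5 for ERDE5 or 50 for ERDE50
    c_fp:     false-positive cost (eRisk uses the proportion of positive users)
    """
    if decision == 1 and truth == 0:
        return c_fp
    if decision == 0 and truth == 1:
        return 1.0  # c_fn = 1
    if decision == 1 and truth == 1:
        # c_tp * lc_o(k) with c_tp = 1 and lc_o(k) = 1 - 1 / (1 + e^(k - o)),
        # rewritten as 1 / (1 + e^(o - k)) so that a large k cannot overflow exp().
        return 1.0 / (1.0 + math.exp(o - k))
    return 0.0  # true negative


def latency_penalty(k: int, p: float = 0.0078) -> float:
    """0 for a detection at the first round, approaching 1 as the delay grows."""
    return -1.0 + 2.0 / (1.0 + math.exp(-p * (k - 1)))


def f_latency(f1: float, tp_delays: list[int], p: float = 0.0078) -> float:
    """Latency-weighted F1 = F1 * speed, with speed = 1 - median penalty over true positives."""
    if not tp_delays:
        return 0.0
    speed = 1.0 - statistics.median(latency_penalty(k, p) for k in tp_delays)
    return f1 * speed


def ranked_labels(scores: dict[str, float], truth: dict[str, int]) -> list[int]:
    """Gold labels ordered by descending confidence (sorted() is stable, so ties keep input order)."""
    return [truth[user] for user in sorted(scores, key=scores.get, reverse=True)]


def precision_at_k(ranked: list[int], k: int) -> float:
    return sum(ranked[:k]) / k


def ndcg_at_k(ranked: list[int], k: int) -> float:
    """Binary-relevance NDCG@k with a log2(rank + 1) discount."""
    def dcg(labels: list[int]) -> float:
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(labels[:k]))

    ideal = dcg(sorted(ranked, reverse=True))
    return dcg(ranked) / ideal if ideal else 0.0
```
      </preformat>
      <p>For example, a true positive flagged exactly at round k = o receives an ERDE of 0.5, and its cost approaches 1, the cost of a missed positive, as k grows past o; this is how ERDE rewards promptness.</p>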
    </sec>
    <sec id="sec-4">
      <title>4. Proposed Methodology</title>
      <p>Our approach is structured into three main components, described in the following subsections:
a semantic relabeling process applied to the training data, the design and fine-tuning of a classification
model, and an inference strategy based on the fusion of risk signals derived from both the target user
and the surrounding conversational context.</p>
      <sec id="sec-4-1">
        <title>4.1. Training set semantic relabeling</title>
        <p>Although the final objective is to assign a risk label at the user level, our model operates at the message
level, processing each post independently. A common approach is to propagate the user’s label to
all their messages, treating every post from a depressed user as a positive instance. However, this
assumption introduces a significant amount of noise, since not all messages from a depressed user
necessarily exhibit linguistic markers of depression. Training under such noisy supervision can lead
the model to learn spurious correlations or to dilute the signal of truly informative posts. To address
this challenge, we designed a message-level relabeling strategy aimed at enhancing label fidelity and
improving the classifier’s robustness.</p>
        <p>
          In order to reduce noise in the original labels and improve training quality, we implemented a
semi-supervised message-level relabeling process. This methodology is inspired by prior work from
the NLP-UNED group [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and by related approaches proposed in other studies [13, 14].
To adapt this strategy to our specific task, we leveraged semantic representations obtained through
pretrained embeddings. For each message, a similarity score was computed with respect to representative
positive and negative examples, and a percentile-based strategy was applied to determine which
messages were suitable for relabeling. This process allowed for precise control over the proportion of
modified instances, preserving semantic consistency across the corpus while mitigating the effects of
incorrect labels.
        </p>
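        <p>The relabeling step can be illustrated with a minimal NumPy sketch. The text does not specify how the representative examples are selected, which similarity function is used, or whether control messages are also relabeled, so the sketch makes three assumptions: the prototypes are given as embedding matrices, each message is scored by its mean cosine similarity to the positive prototypes minus its mean cosine similarity to the negative prototypes, and only messages inherited from depressed users whose score falls below a chosen percentile are relabeled as negative. The function and parameter names are hypothetical.</p>
        <preformat>
```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale rows to unit length so that dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def relabel_messages(embeddings, user_labels, pos_prototypes, neg_prototypes, percentile=50.0):
    """Hypothetical percentile-based relabeling of inherited message labels.

    embeddings:      (n_messages, dim) message embeddings from any sentence encoder
    user_labels:     (n_messages,) label inherited from the author, 1 = depressed user
    pos_prototypes:  (k_pos, dim) embeddings of representative positive examples
    neg_prototypes:  (k_neg, dim) embeddings of representative negative examples
    percentile:      inherited positives scoring below this percentile of their
                     own score distribution are relabeled as 0
    Returns the new labels and the per-message scores.
    """
    emb = l2_normalize(np.asarray(embeddings, dtype=float))
    pos = l2_normalize(np.asarray(pos_prototypes, dtype=float))
    neg = l2_normalize(np.asarray(neg_prototypes, dtype=float))
    labels = np.asarray(user_labels, dtype=int).copy()

    # Score: mean similarity to positive prototypes minus mean similarity to negative ones.
    scores = (emb @ pos.T).mean(axis=1) - (emb @ neg.T).mean(axis=1)

    positive_idx = np.flatnonzero(labels == 1)
    if positive_idx.size == 0:
        return labels, scores

    threshold = np.percentile(scores[positive_idx], percentile)
    # Only inherited positives are candidates; control messages keep label 0.
    demoted = positive_idx[threshold > scores[positive_idx]]
    labels[demoted] = 0
    return labels, scores
```
        </preformat>
        <p>Raising the percentile demotes a larger share of inherited positives, which is how a percentile threshold gives direct control over the proportion of modified instances.</p>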
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Classifier</title>
        <p>The classifier is based on a multilingual transformer architecture with 12 layers and an embedding
dimension of 768, derived from the XLM-RoBERTa model family [15]. The model was fine-tuned on
the relabeled dataset (Section 4.1) using a binary cross-entropy loss function. Several
combinations of hyperparameters (learning rate, batch size, and number of epochs) were explored
through grid search, and the best configuration was selected based on performance over a validation
set.</p>
        <p>The model operates at the message level: each post is processed independently, and the classifier
outputs a probability score indicating the likelihood that the message reflects signs of depression. These
individual message-level scores do not directly determine the user’s risk label; instead, they serve as
input for a subsequent aggregation phase. In the next subsection, we describe how these scores are
combined at the thread level to compute a global risk score per user, which is ultimately used for the
final decision-making.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Decision making via Risk Signal Fusion</title>
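        <p>During the test phase, the system sequentially receives the complete conversation threads in which the
target user has participated. For each thread received at time <inline-formula><tex-math>t</tex-math></inline-formula>, three risk scores are computed:</p>
        <list list-type="bullet">
          <list-item>
            <p>User risk score, <inline-formula><tex-math>R_u(t)</tex-math></inline-formula>: the mean depression probability of all messages written by
the target user within the thread.</p>
          </list-item>
          <list-item>
            <p>Context risk score, <inline-formula><tex-math>R_c(t)</tex-math></inline-formula>: the mean depression probability of all other messages in
the thread, that is, those not authored by the target user.</p>
          </list-item>
          <list-item>
            <p>Thread risk score, <inline-formula><tex-math>R_{\mathrm{thread}}(t)</tex-math></inline-formula>, for which we explored three alternatives:</p>
            <list list-type="bullet">
              <list-item>
                <p>Weighted linear combination of user risk and context risk, denoted <inline-formula><tex-math>R_{\mathrm{lin}}(t)</tex-math></inline-formula> and computed
as in Eq. (1), where the parameter <inline-formula><tex-math>\alpha \in [0, 1]</tex-math></inline-formula> controls the relative influence of the
user versus the context. This parameter was tuned empirically on the validation set.</p>
              </list-item>
              <list-item>
                <p>Mean message risk, <inline-formula><tex-math>R_{\mathrm{mean}}(t)</tex-math></inline-formula>: the mean depression probability of
all messages in the thread regardless of author, as in Eq. (2), where <inline-formula><tex-math>M_t</tex-math></inline-formula> is the set of messages in the thread received at time <inline-formula><tex-math>t</tex-math></inline-formula> and <inline-formula><tex-math>p(m)</tex-math></inline-formula> is the depression probability assigned by the classifier to message <inline-formula><tex-math>m</tex-math></inline-formula>.</p>
              </list-item>
              <list-item>
                <p>Maximum, <inline-formula><tex-math>R_{\max}(t)</tex-math></inline-formula>: the larger of the user risk and the context risk, as in Eq. (3).</p>
              </list-item>
            </list>
          </list-item>
        </list>
        <disp-formula>
          <label>(1)</label>
          <tex-math>R_{\mathrm{lin}}(t) = \alpha \cdot R_u(t) + (1 - \alpha) \cdot R_c(t)</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(2)</label>
          <tex-math>R_{\mathrm{mean}}(t) = \frac{1}{|M_t|} \sum_{m \in M_t} p(m)</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(3)</label>
          <tex-math>R_{\max}(t) = \max_{s \in \{u, c\}} R_s(t)</tex-math>
        </disp-formula>
        <p>Based on the thread-level scores observed up to time <inline-formula><tex-math>t</tex-math></inline-formula>, a global risk score is calculated as the average
of all thread scores received so far, as in Eq. (4).</p>
        <disp-formula>
          <label>(4)</label>
          <tex-math>R_{\mathrm{global}}(t) = \frac{1}{t} \sum_{i=1}^{t} R_{\mathrm{thread}}(i)</tex-math>
        </disp-formula>
        <p>Finally, the binary decision on whether the user is at risk of depression is obtained by
comparing <inline-formula><tex-math>R_{\mathrm{global}}(t)</tex-math></inline-formula> to a threshold <inline-formula><tex-math>\tau</tex-math></inline-formula>, which was also optimized on the validation set. This strategy
enables the system to incrementally integrate both the behavioral progression of the target user and the
evolving conversational dynamics. Moreover, for the ranking-based evaluation, the confidence score
assigned to each user corresponds directly to the value of <inline-formula><tex-math>R_{\mathrm{global}}(t)</tex-math></inline-formula>, thus preserving the relative risk
intensity inferred from the conversation.</p>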
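        <p>The per-thread scores and the running global score of Eqs. (1) to (4) can be sketched in Python as follows. The class name, the parameter names alpha (user weight) and tau (decision threshold), the choice to raise an alert once the global score reaches tau, and the fallback used when a thread contains no context messages are assumptions made for illustration; per-message probabilities are assumed to come from the classifier described in Section 4.2.</p>
        <preformat>
```python
from statistics import mean


class GlobalRiskTracker:
    """Hypothetical per-user tracker for the thread and global risk scores."""

    def __init__(self, fusion: str = "linear", alpha: float = 0.5, tau: float = 0.5):
        if fusion not in ("linear", "mean", "max"):
            raise ValueError(f"unknown fusion: {fusion}")
        self.fusion = fusion
        self.alpha = alpha  # weight of the user score in Eq. (1)
        self.tau = tau      # decision threshold (assumed: alert when score >= tau)
        self.thread_scores: list[float] = []

    def thread_score(self, user_probs: list[float], context_probs: list[float]) -> float:
        r_u = mean(user_probs)  # R_u(t): target user's messages (assumed non-empty)
        # Assumption: with no context messages, R_c(t) falls back to R_u(t).
        r_c = mean(context_probs) if context_probs else r_u
        if self.fusion == "linear":  # Eq. (1)
            return self.alpha * r_u + (1 - self.alpha) * r_c
        if self.fusion == "mean":  # Eq. (2): every message in the thread
            return mean(list(user_probs) + list(context_probs))
        return max(r_u, r_c)  # Eq. (3)

    def update(self, user_probs: list[float], context_probs: list[float]) -> tuple[int, float]:
        """Process one thread and return (decision, confidence) for this round."""
        self.thread_scores.append(self.thread_score(user_probs, context_probs))
        r_global = mean(self.thread_scores)  # Eq. (4): average over threads so far
        return int(r_global >= self.tau), r_global
```
        </preformat>
        <p>Note that the mean variant is equivalent to the linear one with alpha set to the fraction of the thread's messages written by the target user, since the mean over all messages equals (n_u R_u + n_c R_c) / (n_u + n_c); unlike a fixed alpha, it therefore lets a long surrounding context dominate threads in which the target user writes little.</p>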
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Discussion</title>
      <p>In our experiments, we submitted five runs (configurations) that differ in how the thread risk score
is computed, as presented in Section 4.3. Table 1 summarizes the configuration of each run.
The decision-based metrics include ERDE5, ERDE50, true positive latency (latency<sub>TP</sub>), speed, and latency-weighted F1 (F<sub>latency</sub>).</p>
      <p>Our team achieved the best performance in ERDE5, ERDE50, latency<sub>TP</sub>, and speed, which supports
the effectiveness of our early detection strategy. Of our five configurations, R3 (mean probability of
all thread messages) achieved the best balance, reaching ERDE5 = 0.05, ERDE50 = 0.03, the lowest
true-positive latency (1 thread), and optimal speed (1.00). Configuration R4 (max-risk fusion) performed
worst, suggesting that relying solely on the strongest risk signal is suboptimal. Additionally, we
observe progressive improvements in ERDE and latency as the context weight increases up to R3,
which supports the usefulness of conversational context.</p>
      <p>In the ranking-based evaluation, carried out after 1, 100, 500, and 1000 writings, the runs submitted by our team
achieved highly competitive Precision and NDCG at all cutoffs, consistently
ranking among the top participants.</p>
      <p>This consistent performance across all ranking metrics supports our motivation for a risk signal
fusion approach, and we feel that further efforts</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>In this article we address the early detection of depression risk at the user level in social media threads. Whereas
previous approaches focused on posts written by the target user, in this task we have access to the
posts written by the target user as well as those of the users they interact with. The scenario emulates
a real continuous monitoring environment with full conversational context. The research question
is how to leverage the contextual information provided by other users’ posts, together with the target
user’s own posts, to assess the risk of the target user. The system has to process the threads in
chronological order and provide two outputs: a high-risk alarm (or no alarm) decision together with
a confidence score for that decision. The evaluation covers decision accuracy and earliness as well as
confidence-based ranking. An added challenge in this task is that the training data consisted
solely of single-user posts without context.</p>
      <p>We contribute an early depression detection approach that integrates individual user risk signals
with conversational context. The results of the ranking-based evaluation support the validity
of our approach: we were able to rank users by their risk level in a highly competitive manner,
consistently placing among the top participants. However, in the decision-based evaluation, our precision
and F1 scores were lowered by the choice of a relatively low decision threshold. This conservative
setting, determined without contextual data during training, increased the number of false positives
and reduced performance on classification metrics.</p>
      <p>Future work will explore adaptive thresholding techniques that leverage contextual information to
optimally balance precision and recall, as well as more advanced context-encoding mechanisms to
further enhance early depression detection.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was partially funded by the Spanish Ministry of Science and Innovation (EDHIA
PID2022-136522OB-C22) and by the Basque Government (IXA IT-1570-22). In addition, this work was carried out
within the framework of LOTU (TED2021-130398B-C22), funded by MCIN/AEI/10.13039/501100011033,
the European Commission (FEDER), and the European Union “NextGenerationEU”/PRTR. The first author
is a recipient of a grant from the Spanish Ministry of Education’s Formación de Profesorado Universitario
program (FPU23/01068).</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <p>Forum (CLEF 2025), Madrid, Spain, 9-12 September, 2025, volume To be published of CEUR
Workshop Proceedings, CEUR-WS.org, 2025.</p>
      <p>[11] J. Parapar, A. Perez, X. Wang, F. Crestani, Overview of eRisk 2025: Early risk prediction on
the internet, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction - 16th
International Conference of the CLEF Association, CLEF 2025, Madrid, Spain, September 9-12,
2025, Proceedings, Part II, volume To be published of Lecture Notes in Computer Science, Springer,
2025.</p>
      <p>[12] D. E. Losada, F. Crestani, A test collection for research on depression and language use, in: N. Fuhr,
P. Quaresma, T. Gonçalves, B. Larsen, K. Balog, C. Macdonald, L. Cappellato, N. Ferro (Eds.),
Experimental IR Meets Multilinguality, Multimodality, and Interaction, Springer International
Publishing, Cham, 2016, pp. 28–39.</p>
      <p>[13] X. Larrayoz, N. Lebena, A. Casillas, A. Pérez, Representation Exploration and Deep Learning
Applied to the Early Detection of Pathological Gambling Risks, in: CLEF (Working Notes), 2023,
pp. 693–705.</p>
      <p>[14] H. Fabregat, A. Duque, L. Araujo, J. Martínez-Romo, UNED-NLP at eRisk 2022: Analyzing gambling
disorders in social media using approximate nearest neighbors, in: Conference and Labs of the
Evaluation Forum, 2022. URL: https://api.semanticscholar.org/CorpusID:251471984.</p>
      <p>[15] L. Wang, N. Yang, X. Huang, L. Yang, R. Majumder, F. Wei, Multilingual E5 text embeddings: A
technical report, arXiv preprint arXiv:2402.05672 (2024).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martín-Rodilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          , Overview of eRisk 2022:
          <article-title>Early risk prediction on the internet</article-title>
          , in: A. Barrón-Cedeño, G. Da San Martino, M. Degli Esposti, F. Sebastiani, C. Macdonald, G. Pasi, A. Hanbury, M. Potthast, G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          , Springer International Publishing, Cham,
          <year>2022</year>
          , pp.
          <fpage>233</fpage>
          -
          <lpage>256</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martín-Rodilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          , Overview of eRisk 2023:
          <article-title>Early risk prediction on the internet</article-title>
          , in: A. Arampatzis, E. Kanoulas, T. Tsikrika, S. Vrochidis, A. Giachanou, D. Li, M. Aliannejadi, M. Vlachos, G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          , Springer Nature Switzerland, Cham,
          <year>2023</year>
          , pp.
          <fpage>294</fpage>
          -
          <lpage>315</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martín-Rodilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          , Overview of eRisk 2024:
          <article-title>Early risk prediction on the internet</article-title>
          , in: L. Goeuriot, P. Mulhem, G. Quénot, D. Schwab, G. M. Di Nunzio, L. Soulier, P. Galuščáková, A. García Seco de Herrera, G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          , Springer Nature Switzerland, Cham,
          <year>2024</year>
          , pp.
          <fpage>73</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Fabregat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Deniz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Duque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Araujo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Martínez-Romo</surname>
          </string-name>
          ,
          <article-title>NLP-UNED at eRisk 2024: Approximate Nearest Neighbors with Encoding Refinement for Early Detecting Signs of Anorexia</article-title>
          , in:
          <source>Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>813</fpage>
          -
          <lpage>824</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>O.</given-names>
            <surname>Riewe-Perła</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Filipowska</surname>
          </string-name>
          ,
          <article-title>Combining Recommender Systems and Language Models in Early Detection of Signs of Anorexia</article-title>
          , in:
          <source>Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum</source>
          , Grenoble, France,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kula</surname>
          </string-name>
          ,
          <article-title>Metadata embeddings for user and item cold-start recommendations</article-title>
          , in: T. Bogers, M. Koolen (Eds.),
          <source>Proceedings of the 2nd Workshop on New Trends on Content-Based Recommender Systems co-located with 9th ACM Conference on Recommender Systems (RecSys 2015), Vienna, Austria, September 16-20, 2015</source>
          , volume
          <volume>1448</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2015</year>
          , pp.
          <fpage>14</fpage>
          -
          <lpage>21</lpage>
          . URL: http://ceur-ws.org/Vol-1448/paper4.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1908.10084.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Mármol Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moreno Muñoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Plaza del Arco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Molina González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Martín Valdivia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Ureña López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Montejo Ráez</surname>
          </string-name>
          ,
          <article-title>Overview of MentalRiskES at IberLEF 2023: Early detection of mental disorders risk in Spanish</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>71</volume>
          (
          <year>2023</year>
          )
          <fpage>329</fpage>
          -
          <lpage>350</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Mármol Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moreno Muñoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Plaza del Arco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Molina González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Martín Valdivia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Ureña López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Montejo Ráez</surname>
          </string-name>
          ,
          <article-title>Overview of MentalRiskES at IberLEF 2024: Early detection of mental disorders risk in Spanish</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>73</volume>
          (
          <year>2024</year>
          )
          <fpage>435</fpage>
          -
          <lpage>448</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <article-title>Overview of eRisk 2025: Early risk prediction on the internet (extended overview)</article-title>
          , in:
          <source>Working Notes of the Conference and Labs of the Evaluation Forum</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>