<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Working Notes of CLEF</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Semantic Retrieval of BDI Symptoms in User Writings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Noam Munz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eliya Naomi Aharon</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Avi Segal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kobi Gal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ben-Gurion University of the Negev</institution>
          ,
          <addr-line>Beer-Sheva</addr-line>
          ,
          <country country="IL">Israel</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Edinburgh</institution>
          ,
          <addr-line>Edinburgh</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>We present our approach to Task 1 of the CLEF eRisk 2025 Lab, which focuses on identifying depression symptoms in user-generated text. The task is formulated as a sentence ranking problem, aiming to retrieve sentences relevant to each of the 21 symptoms defined in the Beck Depression Inventory-II (BDI-II). The method employs Sentence-BERT to compute semantic similarity between user text and symptom queries derived from the BDI questionnaire's multiple-choice responses. To improve coverage, queries are expanded based on retrieval results from the training set. Additionally, sentences not referring to the user are filtered out to reduce noise from third-person narratives. Our approach achieved competitive performance, with Average Precision substantially exceeding the median of all submitted systems. This demonstrates the promise of semantic retrieval and first-person filtering for identifying fine-grained depressive symptoms at scale.</p>
      </abstract>
      <kwd-group>
<kwd>Sentence-BERT</kwd>
        <kwd>Semantic Similarity</kwd>
        <kwd>Text Retrieval</kwd>
        <kwd>Mental Health NLP</kwd>
        <kwd>Beck's Depression Inventory-II</kwd>
        <kwd>Large Language Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The CLEF eRisk 2025 Lab Task 1 [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] focuses on identifying signs of depression in user-generated
text. The task involves ranking sentences based on their relevance to 21 symptoms defined by the
Beck Depression Inventory-II (BDI-II) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], a widely used clinical tool for assessing depression severity.
Participants are provided with sentence-level user writings and are tasked with returning, for each
symptom, a ranked list of 1000 sentences that best reflect the user’s mental state regarding that symptom.
Relevant sentences may indicate either the presence or absence of the symptom.
      </p>
      <p>
        Detecting fine-grained indicators of depression from text can support early intervention and improve
access to mental health care [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], particularly in digital contexts where individuals often express their
emotional states [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ].
      </p>
      <p>
        The task presents several challenges. First, the dataset contains 17,553,441 texts, making retrieval
computationally demanding. Second, many sentences reference people other than the author, introducing
ambiguity around whose mental state is being described. This adds noise and requires disambiguation
between self-disclosure and commentary about others [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ].
      </p>
      <p>
        Our approach utilizes Sentence-BERT [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] embeddings to retrieve relevant sentences. We investigate
the impact of query expansion on retrieval effectiveness and apply filtering to focus on first-person
references, aiming to reduce noise from irrelevant or third-person content. Submissions conform to
the TREC format and are evaluated using standard retrieval metrics including Average Precision (AP),
R-Precision (R-PREC), Precision at 10 (P@10), and nDCG, with human relevance judgments created via
pooling.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Semantic Retrieval</title>
        <p>This section reviews relevant work across computational methods for semantic retrieval and
psychological foundations related to depression and its assessment.</p>
        <p>
          Semantic representation methods have evolved with the popularity of transformer-based models [
          <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
          ],
which capture contextualized word and sentence embeddings [
          <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
          ]. These models have demonstrated
superior performance in encoding semantic information compared to traditional word embeddings [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
Their ability to capture contextual dependencies enables more effective similarity measurements and
downstream tasks like retrieval and classification [
          <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
          ].
        </p>
        <p>Query expansion and ranking are established techniques in retrieval tasks. Query expansion broadens
the original query with related terms to capture a wider range of relevant information [17]. Ranking
methods order results based on relevance scores, often leveraging scoring metrics or learned models
[18, 19].</p>
        <p>Large language models (LLMs) have shown strong capabilities in zero-shot classification, where tasks
are performed without task-specific training [20]. By leveraging pre-trained knowledge, LLMs can
generalize to new tasks [21]. This makes them especially useful in domains with limited labeled data
[22].</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Depression Symptoms and the Beck Depression Inventory</title>
        <p>
          Depression is a complex mental health disorder characterized by a range of emotional, cognitive, and
physical symptoms. These symptoms can include sadness, loss of interest or pleasure, disturbed sleep
or changes in appetite [23, 24]. Accurate identification of these symptoms is critical for diagnosis,
treatment, and research [25]. Standardized tools like the Beck Depression Inventory (BDI) provide a
structured way to assess the presence and severity of depressive symptoms based on self-reported data
[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>Our approach consists of representing each BDI-II symptom as a set of natural language queries,
computing semantic similarity scores between these queries and sentences in the dataset, and ranking
sentences accordingly. To improve retrieval, we apply query expansion based on training data and
post-process results to remove non-first-person statements. This section describes the steps in detail.</p>
      <sec id="sec-3-1">
        <title>3.1. Problem Formulation</title>
        <p>The dataset, denoted as D = {s_1, s_2, . . . , s_N}, consists of N sentences. Each sample includes the target
sentence s along with its preceding and following sentences. In this work we use only the target
sentence itself.</p>
        <p>The 21 BDI-II symptoms are represented as a set B = {b_1, b_2, . . . , b_21}. Each symptom b ∈ B is
detailed by m graded statements {q_1, q_2, . . . , q_m}, describing increasing severity levels of the symptom.</p>
        <p>For each symptom b, the goal is to produce a ranked list R_b of the top 1000 sentences from D that
are most relevant to b.</p>
        <p>The retrieval effectiveness of R_b is evaluated against human relevance judgments to measure how
well the ranking aligns with actual symptom relevance.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Sentence Embedding and Similarity Scoring</title>
        <p>We represent both user sentences s ∈ D and symptom graded statements {q_1, q_2, . . . , q_m} using
Sentence-BERT, a transformer-based model that encodes sentences into fixed-size dense vectors in a
shared semantic space.</p>
        <p>For each sentence s and symptom b, we compute the cosine similarity between the embedding
of s and each of the embeddings corresponding to the graded statements q_1 through q_m. The final
similarity score between sentence s and symptom b is defined as the maximum of these values:</p>
        <p>score(s, b) = max_{j ∈ {1, 2, . . . , m}} cos(emb(s), emb(q_j))</p>
        <p>This results in a relevance score for each sentence–symptom pair, which we use to produce a ranked
list R_b by sorting all sentences s ∈ D in descending order of their scores. To illustrate, we present an
example for the symptom sadness. The graded statements for this symptom are:
• Statement 1: “I do not feel sad.”
• Statement 2: “I feel sad much of the time.”
• Statement 3: “I am sad all the time.”
• Statement 4: “I am so sad or unhappy that I can’t stand it.”</p>
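        <p>To make the scoring concrete, the following is a minimal sketch of the max-over-statements computation, using small toy vectors in place of actual Sentence-BERT embeddings (the helper names are ours, not from an official implementation):</p>

```python
import numpy as np

def cosine(u, v):
    # cosine similarity between two embedding vectors
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def symptom_score(sentence_emb, statement_embs):
    # score(s, b): maximum cosine similarity over the symptom's graded statements
    return max(cosine(sentence_emb, q) for q in statement_embs)

def rank_sentences(sentence_embs, statement_embs, top_n=1000):
    # sort sentence indices by their symptom score, descending, and keep top_n
    scores = [symptom_score(s, statement_embs) for s in sentence_embs]
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:top_n]
```

        <p>In the actual system the embeddings would come from a Sentence-BERT encoder; only the max-and-sort logic is shown here.</p>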
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Query Expansion</title>
        <p>To improve recall and capture a broader range of symptom expressions, we apply query expansion
using phrases derived from previous years’ datasets. For each symptom b, we compute similarity
scores between all sentences from the 2023 and 2024 datasets and the original m BDI-II symptom graded
statements {q_1, q_2, . . . , q_m} as described previously. This results in a similarity score for each sentence
with respect to each symptom.</p>
        <p>We then iterate over each symptom and select the top k sentences with the highest similarity scores
as additional query representations for that symptom (the choice of k is discussed in Section 4). These
selected sentences are treated as pseudo-relevance feedback and appended to the original query set:
{q_1, q_2, . . . , q_m} → {q_1, . . . , q_{m+k}}</p>
        <p>This approach aims to build an exhaustive query set for each symptom by covering diverse phrasings
and ways users may express the symptom. Examples of original and expanded queries for two symptoms
are shown in Table 2.</p>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption>
            <p>Original BDI-II graded statements and examples of expanded statements for two symptoms.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th></th>
                <th>Sadness</th>
                <th>Loss of Energy</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Original statements</td>
                <td>I do not feel sad. / I feel sad much of the time. / I am sad all the time. / I am so sad or unhappy that I can’t stand it.</td>
                <td>I have as much energy as ever. / I have less energy than I used to have. / I don’t have enough energy to do very much. / I don’t have enough energy to do anything.</td>
              </tr>
              <tr>
                <td>Expanded statements</td>
                <td>Every time I get really sad. / Why do I feel sad? / Sometimes I’m just sad. / I just feel sad and cold inside all the time.</td>
                <td>I have so much energy now its immense. / But I don’t have the energy for anything at the moment. / I often don’t have the energy. / I’ve never had much energy.</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Similarity between a test sentence s and a symptom b is then computed as the maximum cosine
similarity across this expanded set of m + k phrases.</p>
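        <p>The expansion step can be sketched as follows (again with toy embeddings standing in for Sentence-BERT output; function and variable names are ours):</p>

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def expand_queries(statement_embs, train_embs, k):
    # score every training-set sentence embedding against the original
    # graded statements, exactly as in the retrieval step
    scored = [(max(cosine(e, q) for q in statement_embs), i)
              for i, e in enumerate(train_embs)]
    # the top-k training sentences become extra query representations:
    # {q_1, ..., q_m} -> {q_1, ..., q_{m+k}}
    scored.sort(reverse=True)
    extra = [train_embs[i] for _, i in scored[:k]]
    return list(statement_embs) + extra
```

        <p>Test-time scoring then takes the maximum cosine similarity over the returned m + k vectors, as in the previous sketch.</p>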
      </sec>
      <sec id="sec-3-4">
        <title>3.4. First Person Filtering</title>
        <p>
          First-person statements offer the most direct insight into a user’s mental health, as they capture
self-reported experiences related to depressive symptoms [
          <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
          ]. Since the competition task required
including only sentences that provide information about the writer, we applied first-person filtering to
improve the quality of the final ranking. This helps reduce noise from sentences referring to others or
general situations. We employed three approaches to identify first-person language.
1. Pronoun Filter: a simple keyword-based filter checked for the presence of first-person pronouns
including: I, me, my, we, ourselves, mine, our, ours, I’m, I’ve.
2. SpaCy Filter: spaCy1, an open-source natural language processing library, was used to identify
whether the grammatical subject of the sentence was in the first person based on syntactic
dependencies and morphological features.
3. LLM-Based Filter: we employed a large language model (LLM) in a zero-shot classification
setting to identify first-person narratives without task-specific training [26]. Specifically, we used
Claude Sonnet 3.7 [27], a top-ranked model on the Hugging Face Chatbot Arena Leaderboard2, to
analyze whether texts reflected the writer’s personal experience by focusing on self-references
and symptom connection. Details of prompt evaluation and refinement, including the prompt
text, are provided in Section 4.2.
        </p>
        <p>After filtering, we produced new ranked lists R′_b containing only sentences identified as first-person
narratives.</p>
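        <p>The Pronoun Filter can be sketched with the pronoun list given above (a simple standard-library implementation; the exact tokenization used in the system is not specified, so that part is an assumption):</p>

```python
import re

# first-person pronouns checked by the filter
FIRST_PERSON = {"i", "me", "my", "we", "ourselves",
                "mine", "our", "ours", "i'm", "i've"}

def is_first_person(sentence):
    # lowercase word tokens, keeping apostrophes so "I'm" and "I've" survive
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return any(tok in FIRST_PERSON for tok in tokens)

def pronoun_filter(sentences):
    # keep only sentences containing at least one first-person pronoun
    return [s for s in sentences if is_first_person(s)]
```

        <p>The spaCy and LLM-based filters replace is_first_person with syntactic and zero-shot checks, respectively, while the surrounding filtering loop stays the same.</p>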
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Datasets</title>
        <p>Our experiments utilize datasets from three different years: 2023, 2024, and 2025. The 2023 and 2024
datasets include labeled sentences, where each sentence is annotated with a binary indication of
relevance to a symptom. For both years, we report the number of sentences for the full datasets as
well as for annotated subsets based on majority vote and full annotator consensus (Table 3). The 2025
dataset used for the current task is unlabeled and contains only raw user sentences.</p>
        <p>All datasets follow the TREC format, where each sample includes a document ID, the target sentence,
as well as the preceding and following sentences (though only the target sentence is used in this work).
1https://spacy.io/
2https://huggingface.co/spaces/lmarena-ai/chatbot-arena-leaderboard</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Hyperparameter Tuning</title>
        <p>We tuned two key hyperparameters to optimize retrieval performance.</p>
        <p>Query Expansion Size k: We tested values of k ranging from 10 to 100 in increments of 10. For each
k, we evaluated retrieval quality using the merged 2023–2024 consensus labeled dataset (Section 4.4).
Performance improved up to k = 30 and then slightly declined, so we selected k = 30 for the final
expansions.</p>
        <p>LLM Prompt Refinement: We applied the similarity scoring and query expansion to the 2023 and
2024 datasets to obtain the top 100 sentences per symptom, creating a pool of 2,100 sentences. After
removing duplicates, we randomly sampled 200 sentences for evaluation. Using the prompted LLM,
each sentence was labeled for first-person language. Two annotators assessed labeling accuracy. The
prompt wording was iteratively refined to improve the LLM’s accuracy until improvements plateaued.</p>
        <p>Prompt Used for Annotation (Final Version)
Analyze the following text to determine if it provides information about the writer’s personal experience
with the specified symptom.
&lt;symptom&gt;{symptom}&lt;/symptom&gt;
&lt;text&gt;{text}&lt;/text&gt;
Consider the text informative (YES) if it reveals anything about the writer’s personal relationship with
the symptom – whether they have it, had it, are recovering from it, don’t have it, etc.</p>
        <p>Consider the text non-informative (NO) if the symptom is only mentioned in relation to other people or
discussed generally without personal connection to the writer.</p>
        <p>Pay special attention to first-person language and direct self-references that connect the writer to the
symptom.</p>
        <p>Return only “YES” or “NO” based on your analysis.</p>
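        <p>To show how this final prompt is assembled and its answer consumed, here is a small sketch (the LLM call itself is omitted, since the client code is not part of the paper; only prompt construction and YES/NO parsing are shown):</p>

```python
PROMPT_TEMPLATE = (
    "Analyze the following text to determine if it provides information about "
    "the writer's personal experience with the specified symptom.\n"
    "<symptom>{symptom}</symptom>\n"
    "<text>{text}</text>\n"
    "Consider the text informative (YES) if it reveals anything about the "
    "writer's personal relationship with the symptom - whether they have it, "
    "had it, are recovering from it, don't have it, etc.\n"
    "Consider the text non-informative (NO) if the symptom is only mentioned "
    "in relation to other people or discussed generally without personal "
    "connection to the writer.\n"
    "Pay special attention to first-person language and direct self-references "
    "that connect the writer to the symptom.\n"
    'Return only "YES" or "NO" based on your analysis.'
)

def build_prompt(symptom, text):
    # fill the two placeholders in the final prompt version
    return PROMPT_TEMPLATE.format(symptom=symptom, text=text)

def parse_label(reply):
    # the model is instructed to answer only YES or NO
    return reply.strip().upper().startswith("YES")
```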
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Configurations</title>
        <p>We evaluated five retrieval configurations to assess the impact of query expansion and first-person
filtering strategies:
• sbert: baseline model using Sentence-BERT with original BDI-II symptom queries.
• sbert-w-expansion: adds top-30 high-scoring training sentences to each symptom’s query set
for expanded semantic coverage.
• sbert-w-expansion-w-naive-fp: applies the Pronoun Filter, which detects first-person
language using keyword matching.
• sbert-w-expansion-w-spacy-fp: applies the spaCy Filter, which identifies first-person
grammatical subjects via syntactic parsing.
• sbert-w-expansion-w-naive-fp-w-claude: applies the LLM-Based Filter, which verifies
first-person relevance through LLM-based classification.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Evaluation</title>
        <p>To evaluate our retrieval configurations, we used a labeled test set created by combining the 2023 and
2024 consensus datasets. These datasets contain high-confidence binary annotations indicating sentence
relevance to each BDI-II symptom. The merged evaluation set includes 36,403 annotated sentences.</p>
        <p>Following common practice [28], we generated a ranked list of 100 sentences per symptom for each
configuration and evaluated it against the labeled set. The evaluation metrics were:
• Precision@k for k ∈ {10, 30, 50}: the proportion of relevant sentences in the top-k positions
of each ranking.
• Average Precision (AP): the average of precision scores at each rank where a relevant sentence
appears.
• R-Precision (R-PREC): precision at R, where R is the total number of relevant sentences for a
given symptom.
• nDCG (normalized Discounted Cumulative Gain): a rank-aware metric that rewards placing
relevant items higher in the list.</p>
        <p>All metrics were computed per symptom and then averaged over the 21 BDI-II symptoms.</p>
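        <p>For reference, these metrics can be computed on a binary-relevance ranking as follows (our own standard-library sketch, not the official evaluation tooling):</p>

```python
import math

def precision_at_k(rels, k):
    # rels: 0/1 relevance labels in ranked order
    return sum(rels[:k]) / k

def average_precision(rels):
    hits, total = 0, 0.0
    for rank, r in enumerate(rels, start=1):
        if r:
            hits += 1
            total += hits / rank  # precision at each relevant rank
    return total / hits if hits else 0.0

def r_precision(rels, num_relevant):
    # precision at R, where R is the number of relevant sentences
    return precision_at_k(rels, num_relevant)

def ndcg(rels):
    dcg = sum(r / math.log2(i + 1) for i, r in enumerate(rels, start=1))
    ideal = sorted(rels, reverse=True)
    idcg = sum(r / math.log2(i + 1) for i, r in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0
```

        <p>Each function is applied to one symptom’s ranking; per-symptom values are then averaged over the 21 symptoms.</p>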
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Implementation</title>
        <p>All experiments were conducted on a machine with an NVIDIA RTX 6000 GPU using the
sentence-transformers/nli-roberta-base-v2 model via the SentenceTransformers library.
The spaCy filter used spaCy’s en_core_web_sm pipeline.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>We now describe the final competition results. For each of the 21 symptoms defined by the BDI-II
questionnaire, participating teams were required to submit a ranked list of up to 1,000 relevant sentences.
Each team was permitted to submit up to five system configurations for evaluation. In total, 17 teams
took part in the eRisk 2025 Task 1 competition, resulting in 67 submitted runs.</p>
      <p>The evaluation process involved three expert assessors who independently judged the relevance
of sentences for each symptom. Relevance was determined using two complementary criteria: under
majority voting, a sentence was deemed relevant if at least two assessors agreed; under unanimity
voting, relevance required consensus among all three assessors. System performance was assessed
using standard ranking metrics, including Average Precision, R-Precision, Precision@10, and NDCG.</p>
      <p>We also report the top-performing runs submitted by other teams for each evaluation setting.
Specifically, we include two configurations from Team INESC-ID: one that achieved the highest scores in
Average Precision, R-Precision, and Precision@10, and another that achieved the best NDCG. In addition,
we report the mean and median scores across all submitted runs, following common practice in prior
work [29].</p>
      <p>In both majority (Table 4) and unanimity (Table 5) voting evaluations, our sbert configuration
achieved the highest Average Precision, R-Precision, and NDCG among our tested methods. This
suggests heuristic query expansions and filters may add noise that lowers overall ranking quality.
However, combining query expansion with first-person filtering improved Precision@10, indicating
that first-person filtering may help prioritize personal disclosures at the top. Among these filters, the
LLM-based approach performed best, likely due to its enhanced semantic understanding of context.</p>
      <p>In the unanimity voting evaluation (Table 5), all our configurations scored lower across metrics
compared to our majority voting results, reflecting the stricter relevance criterion. The relative benefit
of first-person filtering on Precision@10 was slightly higher under this stricter setting, though the
sbert configuration still remained strongest on overall ranking metrics.</p>
      <p>Compared to other teams, our approach consistently performed above the overall mean and median
across all reported metrics, demonstrating competitiveness in this domain.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and Future Work</title>
      <p>We presented a retrieval approach using Sentence-BERT embeddings combined with query expansion
and first-person filtering to identify BDI-II symptoms in user text. While query expansion and filtering
aimed to improve retrieval, the baseline model without expansions performed best on most ranking
metrics. This suggests that adding heuristic expansions and filters may introduce noise and reduce
overall ranking quality. However, filtering to emphasize self-references helped increase the number of
relevant results at top ranks.</p>
      <p>Our study was limited to five configurations, which constrains detailed understanding of each
component’s contribution. This is because our internal evaluations were performed on a smaller labeled
dataset, where some symptoms were underrepresented. The official competition results, which are
more robust, were also available only for the five submitted configurations.</p>
      <p>Future work should focus on several key areas. A thorough ablation study is needed to isolate the
effects of query expansion and different filtering methods, addressing the limitations of our current
evaluations. In addition, query expansion could be improved by curating higher-quality phrases through
qualitative analysis and incorporating more diverse data sources to better capture varied symptom
expressions. Considering sentence context rather than treating sentences in isolation may help better
reflect user intent and improve consistency. Training symptom-specific classifiers on labeled data that
integrate first-person detection directly into the model could further enhance precision beyond semantic
similarity. Finally, exploring the use of large language models to generate candidate queries or symptom
expressions, despite the higher computational cost, is another promising direction.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Acknowledgments</title>
      <p>This work was funded in part by the Israeli Science Foundation grant no. 1302/21.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The authors used generative AI tools to assist with grammar refinement and phrasing corrections
throughout the writing process.</p>
      <p>with multi-task optimization, in: European Conference on Information Retrieval, Springer, 2022, pp. 3–12.</p>
      <p>[17] B. Aklouche, I. Bounhas, Y. Slimani, Query expansion based on NLP and word embeddings, in: TREC, 2018.</p>
      <p>[18] H. Steck, C. Ekanadham, N. Kallus, Is cosine-similarity of embeddings really about similarity?, in: Companion Proceedings of the ACM Web Conference 2024, 2024, pp. 887–890.</p>
      <p>[19] J. Guo, Y. Fan, L. Pang, L. Yang, Q. Ai, H. Zamani, C. Wu, W. B. Croft, X. Cheng, A deep look into neural ranking models for information retrieval, Information Processing &amp; Management 57 (2020) 102067.</p>
      <p>[20] Y. Chae, T. Davidson, Large language models for text classification: From zero-shot learning to fine-tuning, Open Science Foundation 10 (2023).</p>
      <p>[21] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, et al., Language models are unsupervised multitask learners, OpenAI blog 1 (2019) 9.</p>
      <p>[22] B. Ding, C. Qin, R. Zhao, T. Luo, X. Li, G. Chen, W. Xia, J. Hu, L. A. Tuan, S. Joty, Data augmentation using LLMs: Data perspectives, learning paradigms and challenges, in: Findings of the Association for Computational Linguistics ACL 2024, 2024, pp. 1679–1705.</p>
      <p>[23] K. S. Kumar, S. Srivastava, S. Paswan, A. S. Dutta, et al., Depression-symptoms, causes, medications and therapies, The Pharma Innovation 1 (2012) 37.</p>
      <p>[24] J. LeMoult, I. H. Gotlib, Depression: A cognitive perspective, Clinical Psychology Review 69 (2019) 51–66.</p>
      <p>[25] L. S. Goldman, N. H. Nielsen, H. C. Champion, A.M.A. Council on Scientific Affairs, Awareness, diagnosis, and treatment of depression, Journal of General Internal Medicine 14 (1999) 569–580.</p>
      <p>[26] T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, Y. Iwasawa, Large language models are zero-shot reasoners, 2023. URL: https://arxiv.org/abs/2205.11916. arXiv:2205.11916.</p>
      <p>[27] Anthropic, Claude language model (version 3.7), https://www.anthropic.com/claude, 2023. Accessed on [date].</p>
      <p>[28] V. Pavlu, J. Aslam, A practical sampling strategy for efficient retrieval evaluation, College of Computer and Information Science, Northeastern University (2007).</p>
      <p>[29] A. Barachanou, F. Tsalakanidou, S. Papadopoulos, Rebecca at eRisk 2024: Search for symptoms of depression using sentence embeddings and prompt-based filtering, Working Notes of CLEF (2024) 9–12.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          , Overview of erisk 2025:
          <article-title>Early risk prediction on the internet, in: Experimental IR Meets Multilinguality</article-title>
          , Multimodality, and Interaction - 16th
          <source>International Conference of the CLEF Association, CLEF</source>
          <year>2025</year>
          , Madrid, Spain, September 9-
          <issue>12</issue>
          ,
          <year>2025</year>
          , Proceedings,
          <string-name>
            <surname>Part</surname>
            <given-names>II</given-names>
          </string-name>
          , volume To be
          <source>published of Lecture Notes in Computer Science</source>
          , Springer,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          , Overview of erisk 2025:
          <article-title>Early risk prediction on the internet (extended overview)</article-title>
          ,
          <source>in: Working Notes of the Conference and Labs of the Evaluation Forum (CLEF</source>
          <year>2025</year>
          ), Madrid, Spain,
          <fpage>9</fpage>
          -
          <issue>12</issue>
          <year>September</year>
          ,
          <year>2025</year>
          , volume To be published of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Beck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Steer</surname>
          </string-name>
          , G. Brown,
          <article-title>Beck depression inventory-ii, Psychological assessment (</article-title>
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yuqi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jie</surname>
          </string-name>
          ,
          <article-title>A systematic review on automated clinical depression diagnosis</article-title>
          .
          <source>npj mental health research</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ),
          <fpage>20</fpage>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ophir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tikochinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Asterhan</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Sisso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Reichart</surname>
          </string-name>
          ,
          <article-title>Deep neural networks detect suicide risk from textual facebook posts</article-title>
          ,
          <source>Scientific reports 10</source>
          (
          <year>2020</year>
          )
          <fpage>16685</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Islam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Kabir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R. M.</given-names>
            <surname>Kamal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ulhaq</surname>
          </string-name>
          ,
          <article-title>Depression detection from social network data using machine learning techniques</article-title>
          ,
          <source>Health Information Science and Systems 6</source>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Edwards</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. S.</given-names>
            <surname>Holtzman</surname>
          </string-name>
          ,
          <article-title>A meta-analysis of correlations between depression and first person singular pronoun use</article-title>
          ,
          <source>Journal of Research in Personality 68</source>
          (
          <year>2017</year>
          )
          <fpage>63</fpage>
          -
          <lpage>68</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>X.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Burkhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Areán</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. D.</given-names>
            <surname>Hull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <article-title>Deep representations of first-person pronouns for prediction of depression symptom severity</article-title>
          ,
          <source>in: AMIA Annual Symposium Proceedings</source>
          , volume
          <volume>2023</volume>
          ,
          <year>2024</year>
          , p.
          <fpage>1226</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-BERT: Sentence embeddings using Siamese BERT-networks</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1908.10084.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ettinger</surname>
          </string-name>
          ,
          <article-title>What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>8</volume>
          (
          <year>2020</year>
          )
          <fpage>34</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Turton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>Deriving contextualised semantic features from BERT (and other transformer model) embeddings</article-title>
          ,
          <source>arXiv preprint arXiv:2012.15353</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>K.</given-names>
            <surname>Ethayarajh</surname>
          </string-name>
          ,
          <article-title>How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings</article-title>
          ,
          <source>arXiv preprint arXiv:1909.00512</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bialer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Izmaylov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Segal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Tsur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Levi-Belz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gal</surname>
          </string-name>
          ,
          <article-title>Detecting suicide risk in online counseling services: A study in a low-resource language</article-title>
          ,
          <source>arXiv preprint arXiv:2209.04830</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Abolghasemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Verberne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Azzopardi</surname>
          </string-name>
          ,
          <article-title>Improving BERT-based query-by-document retrieval</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>