<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UET@eRisk2025: Severity Estimation for Depression Symptoms Searching and Early Risk Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tu-Phuong Mai</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Minh-Ha H. Le</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Duc-Luong Tran</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Duy-Cat Can</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hoang-Quynh Le</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>VNU University of Engineering and Technology</institution>
          ,
          <addr-line>144 Xuan Thuy, Cau Giay, Hanoi</addr-line>
          ,
          <country country="VN">Vietnam</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In this working note, we describe our participation in Task 1 and Task 2 of the CLEF eRisk 2025 Lab, which focuses on the early detection of depression based on Reddit user-generated content. For Task 1, which involves ranking up to 1,000 sentences according to their relevance to each of the 21 BDI-II depressive symptoms, we combined symptom classification with two approaches: (i) a semantic similarity-based approach, in which clustering techniques group and rank sentences by their relevance to specific depressive symptoms; and (ii) a machine learning-based approach, in which the output scores of a model fine-tuned for symptom detection are used to rank sentences directly by predicted relevance. For Task 2, which targets early detection of depression within multi-user conversations, we design a multi-stage architecture that performs sentence-level symptom and severity detection, aggregates these signals at the post level, and finally estimates depression risk at the conversation level. This layered structure allows the model to capture both localized symptom cues and broader conversational patterns.</p>
      </abstract>
      <kwd-group>
        <kwd>Depression</kwd>
        <kwd>Symptoms Searching</kwd>
        <kwd>Early Risk Detection</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Social Media</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The CLEF eRisk 2025 Lab [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] focuses on the early detection of mental health risks through the
analysis of online user-generated content. The competition promotes the development of natural language
processing (NLP) systems that are capable of identifying early signs of depression based on social
media text. The data used in eRisk tasks is collected from the Reddit platform, where users share
personal experiences through posts or discussions. These environments often encourage openness
and anonymity, resulting in large volumes of natural language data that reflect individuals’ thoughts,
emotions, and behaviors. This year, eRisk 2025 features three tasks: (1) sentence ranking for depression
symptoms, based on the 21 symptoms from the Beck Depression Inventory-II (BDI-II) questionnaire [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ];
(2) contextualized early detection of depression, using full, multi-user conversational threads presented
in chronological order; and (3) a pilot task involving the detection of depression in LLM-powered
conversational agents, where systems must infer the mental state of a simulated user. Together, these
tasks aim to support the development of practical and scalable methods for mental health monitoring
and early intervention.
      </p>
      <p>
        Task 1 continues the setup from the eRisk 2024 challenge [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In that edition, several teams employed retrieval-based
approaches by ranking user-generated content based on its cosine similarity to the Beck Depression
Inventory-II (BDI-II) questionnaire [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Among them, the NUS-IDS team [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] achieved top performance by
leveraging ensemble learning and contrastive fine-tuning. Their system combined sentence-transformer
models fine-tuned on task-specific data with expressive exemplars generated via prompting
GPT4 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], incorporating both BDI symptoms and features from the Early Maladaptive Schemas (EMS)
taxonomy [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Task 2 continues the setup from eRisk 2022 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], where the NLPGroup-IISERB team [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
attained top performance using entropy-based bag-of-words features combined with an SVM classifier.
      </p>
      <p>Their approach demonstrated that traditional feature engineering, when carefully designed, can remain
competitive for early risk detection.</p>
      <p>
        Our team participated in Task 1 and Task 2 of the CLEF eRisk 2025 Lab. We leverage DepRoBERTa [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ],
a RoBERTa-based [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] model pre-trained for depression detection, in both tasks to filter out irrelevant
sentences that do not reflect depressive content. In Task 1, after identifying relevant sentences using
the filtering model, we adopt two approaches: (i) a semantic similarity-based method, where we cluster
sentence embeddings to group semantically similar expressions for each symptom and rank sentences
based on their distance to cluster centroids; and (ii) a machine learning-based method, where we use the
output scores from a multi-task DepRoBERTa model fine-tuned for symptom detection and directly rank
sentences based on predicted relevance scores. For Task 2, we propose a multi-stage framework that
first applies the same filtering model to discard irrelevant sentences, then reuses it to produce
sentence-level embeddings for detecting symptom presence and estimating severity, and finally aggregates
this information at the post and conversation levels to estimate depression risk, integrating both local
and contextual cues for early mental health detection.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Dataset</title>
      <sec id="sec-2-1">
        <title>2.1. Task 1: Search for Symptoms of Depression</title>
        <p>This task focuses on ranking documents relevant to symptoms of depression as outlined in the BDI-II
questionnaire. The goal is to produce ranked lists containing up to 1,000 of the most relevant sentences
for each specific symptom. Evaluation involved expert annotators labeling pooled candidate sentences
as relevant if they addressed the symptom and reflected the individual’s state, with context provided
for accuracy. The final relevance scores were determined using two approaches: majority voting, where
a sentence is marked relevant if most assessors agree, and unanimity, where all assessors must agree on
relevance. These methods ensure reliable and consistent evaluation for training and testing.</p>
        <p>The training data for Task 1 was provided from previous editions of the same task, specifically from
eRisk 2023 and eRisk 2024. The test set for this year includes data collected from 9,000 Reddit users,
comprising over 17 million sentences. The data is formatted according to the TREC format. The main
statistics1 of the corpus are presented in Table 1.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Task 2: Contextualized Early Detection of Depression</title>
        <p>This task focuses on the early detection of depression by analyzing full conversational contexts. Unlike
previous tasks that consider isolated user posts, this task processes interactions among all participants
in a conversation sequentially, reflecting real-world social media dynamics. The dataset includes the
target user’s writing history and all comments from conversation members, enabling timely depression
detection based on evolving dialogue.</p>
        <p>
          The dataset follows the format described in Losada &amp; Crestani (2016)[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and consists of Reddit
conversations where each conversation forms a tree-structured thread centered around a target user.
The objective is to predict a depression score in [0, 1] for the target user based on contextual signals
from the conversation.
        </p>
        <p>1 Statistics of the training set are based on reports from the eRisk 2023 and eRisk 2024 editions.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed Method</title>
      <sec id="sec-3-1">
        <title>3.1. Task 1: Search for Symptoms of Depression</title>
        <p>Our approach to Task 1 is based on two directions: (i) a semantic similarity-based ranking pipeline and
(ii) a machine learning-based ranking model.</p>
        <p>Semantic similarity-based approach. This direction first uses a multi-label classification model
to filter out irrelevant sentences. The remaining relevant sentences are embedded using sentence
transformers and grouped into symptom-specific clusters. At inference, test sentences are ranked based
on their similarity to these clusters. We explore three configurations: (a) direct semantic similarity, (b)
embeddings fine-tuned via contrastive learning, and (c) an ensemble of multiple embedding models.
Machine learning-based approach. In this direction, we directly use the output scores from a
fine-tuned multi-task model (described in Section 3.1.3) to rank sentences by relevance.</p>
        <sec id="sec-3-1-1">
          <title>3.1.1. Pre-processing</title>
          <p>We began with the oficial sentence-level annotations provided in TREC format, where each sentence is
associated with a user ID and timestamp. To ensure high-quality input and reduce noise, we applied
several filtering steps. Texts were lowercased, and non-linguistic tokens such as URLs, emojis, and
special characters were removed. Crucially, we filtered for first-person expressions by detecting
first-person pronouns (e.g., “I”, “me”, “my”, ...), under the hypothesis that self-reported experiences better
reflect the user’s mental state than statements about others or general opinions. The resulting dataset
included relevant sentences from the 2023 and 2024 editions of eRisk, which were used for model
training and clustering.</p>
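<p>For illustration, the cleaning and first-person filtering steps above can be sketched as follows; the exact pronoun list and character-cleaning rules are our assumptions, not the authors' released pipeline.</p>

```python
import re

# Hypothetical re-implementation of the pre-processing step: lowercase,
# strip URLs and non-linguistic tokens, then keep first-person sentences.
FIRST_PERSON = re.compile(r"\b(i|me|my|mine|myself)\b", re.IGNORECASE)
URL = re.compile(r"https?://\S+")
NON_LINGUISTIC = re.compile(r"[^a-z0-9\s'.,!?-]")

def preprocess(sentence: str) -> str:
    """Lowercase a sentence and remove URLs and special characters."""
    text = sentence.lower()
    text = URL.sub(" ", text)
    text = NON_LINGUISTIC.sub(" ", text)
    return " ".join(text.split())

def keep_first_person(sentences):
    """Retain only cleaned sentences containing a first-person pronoun."""
    cleaned = [preprocess(s) for s in sentences]
    return [s for s in cleaned if FIRST_PERSON.search(s)]

sentences = [
    "I feel empty most mornings.",
    "He seems happier lately.",
    "My sleep has been terrible, see https://example.com",
]
print(keep_first_person(sentences))
```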
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Semantic similarity-based approach</title>
          <p>We combine filtering with clustering to identify semantically representative symptom expressions. First,
sentences are filtered using a DepRoBERTa-based multi-label classifier to retain only those relevant to
any of the 21 BDI-II symptoms. After filtering for relevant sentences, we group them into semantic
clusters and rank new sentences based on their distance to the nearest cluster centroid. This enables the
system to identify symptom-relevant sentences that may express depressive cues in more varied ways.</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>Clustering and Semantic Representation.</title>
          <p>
            To capture variations in how each symptom is
linguistically expressed, we performed clustering over the relevant training sentences. Each sentence s was
embedded using a Sentence Transformer [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] model, specifically nomic-embed-text-v1.5 [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ],
to obtain a d-dimensional vector representation:
          </p>
          <p>v = Embed(s) ∈ R^d (1)</p>
          <p>For each symptom j ∈ {1, . . . , J} (with J = 21 for the BDI-II symptoms), we collected the subset of
training embeddings {v_i^(j)} relevant to that symptom. Then, we applied K-means clustering to this set
to form K clusters:</p>
          <p>{c_1^(j), . . . , c_K^(j)} = KMeans({v_i^(j)}) (2)</p>
          <p>where c_k^(j) denotes the centroid of the k-th cluster for symptom j.</p>
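<p>A minimal sketch of the per-symptom clustering step, assuming random toy vectors in place of the nomic-embed-text-v1.5 embeddings; in practice a library routine such as scikit-learn's KMeans would be the natural choice, but a small hand-rolled version keeps the example self-contained.</p>

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal K-means: returns a (k, d) matrix of centroids for embeddings X."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each embedding to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids

# Toy stand-in for the symptom-specific sentence embeddings; in the actual
# system these come from the nomic-embed-text-v1.5 encoder.
rng = np.random.default_rng(1)
emb_per_symptom = {j: rng.normal(size=(40, 8)) for j in range(21)}

# K clusters per symptom (the submitted runs used K = 5 or K = 11).
centroids_per_symptom = {j: kmeans(X, k=5) for j, X in emb_per_symptom.items()}
print(centroids_per_symptom[0].shape)
```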
          <p>This clustering strategy groups semantically similar sentences into coherent sub-themes within each
symptom category. The choice of K balances intra-cluster similarity against inter-cluster diversity.</p>
          <p>Contrastive Learning. To improve the discriminative quality of sentence embeddings, we
applied contrastive learning using the InfoNCE loss. Each sentence embedding v obtained from the
nomic-embed-text-v1.5 model was first projected into a lower-dimensional space via a linear
mapping layer:</p>
          <p>h = W · v + b (3)</p>
          <p>where W ∈ R^{128×d} is a trainable weight matrix and d is the original embedding dimension.</p>
          <p>Given a batch of training samples with known symptom labels, positive pairs were constructed from
sentences annotated with the same symptom, and negatives from sentences belonging to different
symptoms. The InfoNCE loss was then applied to pull embeddings of similar sentences closer and push
dissimilar ones apart:</p>
          <p>ℒ_contrast = − Σ_i (1 / |P(i)|) Σ_{p ∈ P(i)} log [ exp(sim(h_i, h_p)/τ) / Σ_{a ∈ A(i)} exp(sim(h_i, h_a)/τ) ] (4)</p>
          <p>where:
• P(i) = {p ≠ i | y_p = y_i} is the set of positive indices with the same label as anchor i;
• A(i) = {a ≠ i} is the set of all other samples in the batch;
• sim(·, ·) denotes cosine similarity and τ is a temperature hyperparameter.</p>
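<p>The supervised InfoNCE objective above can be sketched numerically as follows; the temperature value and the toy embeddings are illustrative assumptions. Label-clustered embeddings should yield a markedly lower loss than unstructured ones.</p>

```python
import numpy as np

def info_nce(h, labels, tau=0.1):
    """Supervised InfoNCE: average -log softmax over each anchor's positives."""
    h = h / np.linalg.norm(h, axis=1, keepdims=True)  # cosine sim via dot product
    sim = h @ h.T / tau
    n = len(h)
    losses = []
    for i in range(n):
        pos = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not pos:
            continue  # anchors without positives contribute nothing
        others = [a for a in range(n) if a != i]
        log_denom = np.log(np.exp(sim[i, others]).sum())
        losses.append(np.mean([log_denom - sim[i, p] for p in pos]))
    return float(np.mean(losses))

rng = np.random.default_rng(0)
labels = [0, 0, 1, 1]
# Same-label embeddings drawn near orthogonal centers (tight clusters).
centers = {0: np.eye(8)[0], 1: np.eye(8)[1]}
tight = np.stack([centers[y] + 0.01 * rng.normal(size=8) for y in labels])
rand_h = rng.normal(size=(4, 8))
print(info_nce(tight, labels), info_nce(rand_h, labels))
```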
          <p>This training objective encourages the embedding space to reflect symptom-level semantic distinctions
more clearly, enhancing the quality of downstream clustering and similarity-based ranking.</p>
          <p>Sentence Assignment and Ranking. Each test sentence predicted as relevant was also embedded
using the same nomic model. Then, for each symptom, we applied nearest-neighbor search over the
training clusters of that symptom (K = 11) to identify the closest cluster centroid. We assigned each test
sentence to the nearest cluster and computed its distance to the centroid. This distance was converted
to a normalized similarity score via:</p>
          <p>Similarity(s) = 1 − ‖v_s − c_j‖₂ / max_{s′ ∈ S_j} ‖v_{s′} − c_j‖₂ (5)</p>
          <p>where:
• v_s is the embedding vector of test sentence s;
• c_j is the nearest cluster centroid for symptom j;
• S_j is the set of all test sentences predicted as relevant to symptom j.</p>
          <p>The final ranking was derived by sorting all test sentences for each symptom in descending order of
similarity, selecting the top 1,000 as the system output.</p>
          <p>This approach combines high-precision filtering from the symptom classifier with semantic
granularity from clustering, enabling the system to surface sentences that are not only relevant to a symptom
but also representative of its most prototypical or central expressions.</p>
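<p>The centroid-based similarity and ranking step can be sketched as follows, with random vectors standing in for real sentence embeddings and centroids.</p>

```python
import numpy as np

# Distance to the nearest centroid is converted into a similarity in [0, 1],
# then used to rank candidate sentences for one symptom.
rng = np.random.default_rng(0)
test_emb = rng.normal(size=(6, 8))    # embeddings of sentences predicted relevant
centroids = rng.normal(size=(5, 8))   # cluster centroids for this symptom

# Distance of each sentence to its nearest centroid.
d = np.linalg.norm(test_emb[:, None, :] - centroids[None, :, :], axis=-1).min(axis=1)
# Normalize by the maximum distance over the candidate set.
similarity = 1.0 - d / d.max()

# Rank descending and keep the top sentences (top 1,000 in the real system).
ranking = np.argsort(-similarity)
print(ranking, np.round(similarity[ranking], 3))
```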
        </sec>
        <sec id="sec-3-1-4">
          <title>3.1.3. Machine Learning-based approach</title>
          <p>
            Given that the severity of a sentence often correlates with the presence and intensity of specific
depressive symptoms, we adopt a multi-task learning approach to jointly model both aspects. Specifically,
we fine-tune a DepRoBERTa [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] model to simultaneously predict symptom presence (as a 21-dimensional
multi-label output) and estimate severity (as a continuous score in [0, 1]). This joint training not only
allows the two tasks to benefit from shared representations but also encourages the model to capture
subtle linguistic cues that reflect both the type and intensity of depressive expressions. This model
takes an individual sentence as input and produces two outputs: a binary vector indicating the presence
of relevant symptoms, and a scalar severity score.
          </p>
        </sec>
        <sec id="sec-3-1-5">
          <title>Severity Label Generation Using Large Language Models.</title>
          <p>
            To create a reliable dataset for
sentence-level symptom detection and severity estimation, we extended the Task 1 training data, which
includes annotations for symptom relevance but lacks severity labels, by generating severity scores
using a large language model (LLM). For each relevant sentence, we prompted the LLM with the sentence
text and corresponding BDI-II symptom descriptions to assign a severity score in {0, 1, 2, 3} based on the
BDI-II criteria. These scores were then normalized to a continuous scale in [0, 1]. This process leverages
both the clinical structure of BDI-II and the contextual reasoning capabilities of the LLM to provide
consistent and meaningful severity annotations. The resulting dataset, containing both relevance and
severity scores, enables supervised training of a multi-task model while avoiding the need for costly
manual labeling.
          </p>
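<p>A minimal sketch of the label normalization, assuming a simple linear mapping from the discrete BDI-II levels {0, 1, 2, 3} to [0, 1] (the text does not specify the exact scaling, so this is an assumption).</p>

```python
# Hypothetical post-processing of LLM-assigned BDI-II severity labels:
# each sentence receives a discrete score in {0, 1, 2, 3}, normalized to [0, 1].
def normalize_severity(level: int) -> float:
    if level not in (0, 1, 2, 3):
        raise ValueError("BDI-II severity levels are 0-3")
    return level / 3.0

llm_labels = {"I cannot sleep at all anymore.": 3, "I sleep slightly worse.": 1}
normalized = {s: normalize_severity(v) for s, v in llm_labels.items()}
print(normalized)
```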
          <p>
            Architecture. Figure 1 illustrates the architecture of our multi-task fine-tuned DepRoBERTa model
for sentence-level symptom detection and severity estimation. The architecture consists of:
• Shared Backbone: The first 18 layers of the pre-trained DepRoBERTa model are frozen during
training.
• Branch 1 — Symptom Detection: A task-specific branch with 6 transformer layers and a
pooler, followed by a multi-label classification head to predict the relevance of a sentence to 21
depression-related symptoms.
• Branch 2 — Severity Estimation: Another 6-layer branch with a pooler. The pooled vector
from this branch is concatenated with the pooled output from Branch 1, then passed through
a linear mapping layer to combine features. The resulting vector is used both for computing
contrastive loss and as input to a regression MLP head that outputs the severity score in [0, 1].
          </p>
        </sec>
        <sec id="sec-3-1-6">
          <title>Training Strategy.</title>
          <p>We employ a two-phase training procedure:
1. Phase 1: Train the symptom detection branch while freezing the severity estimation branch.
2. Phase 2: Once Branch 1 stabilizes, we freeze it and start training Branch 2.</p>
          <p>
            Loss Functions. The model is optimized using a combination of three loss components:
ℒ_total = ℒ_BCE + ℒ_MSE + λ · ℒ_InfoCL (6)
• ℒ_BCE: Binary cross-entropy loss for multi-label classification.
• ℒ_MSE: Mean squared error for severity score regression.
• ℒ_InfoCL: Contrastive loss (InfoNCE [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]) applied on the pooled sentence embeddings from Branch
2 to improve representation quality.
          </p>
          <p>• λ: A weighting factor that balances the contrastive loss.</p>
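<p>The combined objective above can be illustrated numerically as follows; the toy predictions, the stand-in contrastive loss value, and the weighting factor are assumptions.</p>

```python
import numpy as np

def bce(y_true, y_prob, eps=1e-7):
    """Binary cross-entropy over a multi-label target vector."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)))

def mse(y_true, y_pred):
    """Mean squared error for the severity regression head."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

y_sym = np.zeros(21); y_sym[[4, 15]] = 1          # gold 21-dim symptom vector
p_sym = np.full(21, 0.05); p_sym[[4, 15]] = 0.9   # predicted probabilities
# 0.25 is a stand-in InfoNCE value; 0.1 is an illustrative weighting factor.
l_total = bce(y_sym, p_sym) + mse([0.67], [0.58]) + 0.1 * 0.25
print(round(l_total, 4))
```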
          <p>
            Sentence Ranking. Given a post or comment P consisting of n sentences, P = {s_1, s_2, . . . , s_n}, each
sentence s_i is passed through the multi-task model:
          </p>
          <p>sym(s_i) = z_i ∈ {0, 1}^21, sev(s_i) = r_i ∈ [0, 1] (7)</p>
          <p>where z_i is a binary symptom vector over the 21 depressive symptoms, and r_i is the predicted severity
score if s_i is relevant.</p>
          <p>For each symptom j ∈ {1, . . . , 21}, we rank all sentences by their predicted probability for symptom j in
descending order and select the top 1,000 sentences. This method directly uses the model’s outputs to
perform sentence ranking and was used in our best-performing configuration (Run 4).</p>
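<p>The ranking step itself reduces to sorting the model's per-symptom scores; a sketch with random probabilities standing in for the multi-task model's outputs:</p>

```python
import numpy as np

# Per-symptom probabilities are sorted directly; shapes are toy stand-ins
# for the real test collection.
rng = np.random.default_rng(0)
n_sentences, n_symptoms, top_k = 5000, 21, 1000
probs = rng.uniform(size=(n_sentences, n_symptoms))  # model scores per sentence

rankings = {}
for j in range(n_symptoms):
    order = np.argsort(-probs[:, j])   # descending by predicted probability
    rankings[j] = order[:top_k]        # top 1,000 sentence indices for symptom j
print(len(rankings[0]), probs[rankings[0][0], 0])
```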
        </sec>
        <sec id="sec-3-1-7">
          <title>3.1.4. Submitted Configurations</title>
          <p>We submitted five configurations for Task 1, described as follows:
Run 0: Similarity. Semantic similarity-based approach using the original
nomic-embed-text-v1.5 model without contrastive learning. K-means clustering was applied with
K = 11 to form symptom-specific clusters.</p>
          <p>Run 1: Ensemble Similarity. An ensemble of three cluster-based similarity runs: (i)
nomic-embed-text-v1.5 with K = 5, (ii) nomic-embed-text-v1.5 with K = 11, and (iii)
modernbert-embed-base with K = 11.</p>
          <p>Run 2: Contrastive Learning. Similar to Run 0 but using contrastively fine-tuned
nomic-embed-text-v1.5 embeddings. Embeddings were projected to a 128-dimensional space
and trained using InfoNCE loss to improve symptom-level semantic separation.</p>
          <p>Run 3: Ensemble Contrastive Learning. An ensemble combining Run 1 and Run 2, leveraging
both diverse embedding sources and contrastive learning enhanced representations for more robust
similarity ranking.</p>
          <p>Run 4: Machine Learning. A machine learning-based approach using output scores from the
fine-tuned multi-task model described in Section 3.1.3. This model directly predicts symptom relevance
and severity, and its scores are used to rank the sentences.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Task 2: Contextualized Early Detection of Depression</title>
        <p>Our pipeline consists of three stages:
1. Sentence-level symptom detection and severity estimation: Each sentence is analyzed to
identify the presence of depressive symptoms and to assign a fine-grained severity score. This
stage uses a multi-task model, which is described in Section 3.1.3.
2. Post-level depression scoring: Relevant sentence representations and their associated severity
scores are aggregated to compute a depression score for each post or comment.
3. User-level depression estimation: Finally, a set of rule-based heuristics is applied to combine
post-level scores across the conversation tree, yielding a final depression score for the target user.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Pre-processing</title>
          <p>We adopt the dataset released by the organizers, which filters out noisy or incomplete threads. To
preserve relevant contextual information, each conversation tree is pruned to keep only the branches that:
• lead to the target user (i.e., ancestor nodes),
• or are direct responses from the target user (i.e., child nodes).</p>
          <p>In cases where parent nodes are missing, dummy nodes are inserted to maintain the tree structure and
avoid losing conversation branches.</p>
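<p>The pruning rules above can be sketched on a toy conversation tree; the child-to-parent dictionary encoding and the author map are assumptions made for this illustration.</p>

```python
# Toy conversation tree encoded as child -> parent (None marks the root).
parent = {"root": None, "a": "root", "target1": "a", "b": "target1",
          "c": "root", "d": "c"}
author = {"root": "u1", "a": "u2", "target1": "target", "b": "u3",
          "c": "u4", "d": "u5"}

def prune(parent, author, target="target"):
    """Keep ancestors of the target user's posts plus their direct replies."""
    keep = set()
    for node, who in author.items():
        if who == target:
            keep.add(node)
            # Walk the ancestor chain up to the root.
            cur = parent[node]
            while cur is not None:
                keep.add(cur)
                cur = parent[cur]
            # Direct responses to the target user's post.
            keep.update(ch for ch, par in parent.items() if par == node)
    return keep

print(sorted(prune(parent, author)))
```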
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Post-level Depression Scoring</title>
          <p>
            In this stage, we aggregate sentence-level information to estimate a depression score for each post or
comment. Let {s_i}_{i=1}^n denote the set of relevant sentences extracted from a given post, each associated
with a severity score r_i ∈ [0, 1].
          </p>
          <p>Each sentence s_i is encoded using the fine-tuned DepRoBERTa model, taking the sentence
representation from the Pooler layer of Branch 2:</p>
          <p>h_i = Pooler(DepRoBERTa(s_i)) (8)</p>
          <p>The sequence of sentence embeddings {h_i}_{i=1}^n is passed through a bidirectional LSTM to capture
contextual dependencies among sentences:</p>
          <p>h_text = BiLSTM_text({h_i}_{i=1}^n) (9)</p>
          <p>Similarly, the sequence of scalar severity scores {r_i}_{i=1}^n is fed into a separate BiLSTM to capture the
temporal structure and progression of severity:</p>
          <p>h_sev = BiLSTM_sev({r_i}_{i=1}^n) (10)</p>
          <p>The final representation is obtained by concatenating the textual and severity embeddings, followed
by a multi-layer perceptron (MLP) with a sigmoid activation to produce the depression score ŷ ∈ [0, 1]:</p>
          <p>ŷ = σ(MLP([h_text; h_sev])) (11)</p>
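<p>A minimal sketch of the post-level scorer's interface; for brevity the two BiLSTM encoders are replaced by simple pooling statistics and the MLP is reduced to one random linear layer, so only the input/output shapes and the sigmoid output range follow the described architecture.</p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_sent, dim = 6, 16
h = rng.normal(size=(n_sent, dim))   # sentence embeddings from the Pooler layer
r = rng.uniform(size=n_sent)         # sentence-level severity scores

# Stand-ins for the two BiLSTM encoders: mean pooling over embeddings,
# summary statistics over the severity sequence.
h_text = h.mean(axis=0)
h_sev = np.array([r.mean(), r.max(), r.min(), r.std()])

# Random linear layer + sigmoid standing in for the MLP head.
w = rng.normal(size=dim + 4)
score = float(sigmoid(w @ np.concatenate([h_text, h_sev])))
print(round(score, 3))
```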
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. User-level Depression Prediction</title>
          <p>
            In the final stage, we aggregate post-level severity scores to produce a depression prediction for the
target user. Given the severity scores of relevant posts across a conversation tree, we implement multiple
rule-based configurations to explore different aggregation strategies. Each configuration defines specific
rules for decision making.
          </p>
          <p>Run 0: Target Node Only, Max Score. We consider only the posts authored by the target user
(target nodes) and take the maximum severity score as the final prediction:</p>
          <p>ŷ = max_{p ∈ target} score(p) (12)</p>
          <p>where:
• target is the set of posts authored by the target user in the current conversation;
• score(p) denotes the predicted severity score (in [0, 1]) for post p.</p>
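<p>The maximum-based aggregation rules of this and the following runs can be sketched as follows; the threshold and bonus values are illustrative, not the submitted hyperparameters.</p>

```python
# Sketch of the rule-based aggregation for Runs 0-2.
def run0(current_scores):
    """Run 0: max severity over target posts in the current conversation."""
    return max(current_scores)

def run1(current_scores, history_scores):
    """Run 1: also accumulate the target user's historical posts."""
    return max(list(current_scores) + list(history_scores))

def run2(current_scores, history_scores, tau_high=0.7, bonus=0.1):
    """Run 2: add a bonus when the accumulated maximum exceeds tau_high."""
    m = run1(current_scores, history_scores)
    score = m + (bonus if m > tau_high else 0.0)
    return min(score, 1.0)  # assumption: clip to keep the score in [0, 1]

cur, hist = [0.42, 0.55], [0.75]
print(run0(cur), run1(cur, hist), run2(cur, hist))
```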
        </sec>
        <sec id="sec-3-2-4">
          <title>Run 1: Temporal Accumulation, Target Nodes Only</title>
          <p>We include historical posts of the target user and again use the maximum score across all such posts:</p>
          <p>ŷ = max_{p ∈ ht ∪ ct} score(p) (13)</p>
          <p>where:
• ct is the set of posts authored by the target user in the current conversation;
• ht is the set of posts authored by the same user in previous conversations;
• score(p) denotes the predicted severity score (in [0, 1]) for post p.</p>
          <p>Run 2: Temporal Accumulation with Bonus. Similar to Run 1, but we add a bonus score if the
maximum severity exceeds a high threshold τ_high. The final score is computed as:</p>
          <p>ŷ = max_{p ∈ ht ∪ ct} score(p) + bonus · 1[ max_{p ∈ ht ∪ ct} score(p) &gt; τ_high ] (14)</p>
          <p>where:
• τ_high is the high depression score threshold;
• bonus is the bonus term added to the final score when the threshold condition is met;
• 1[·] is the indicator function that returns 1 if the condition inside is true, otherwise 0.</p>
        </sec>
        <sec id="sec-3-2-5">
          <title>Run 3: Temporal Accumulation with Neighbor-based Uncertainty Handling</title>
          <p>We consider both current and historical posts from the target user. For each post p ∈ target, if its
severity score falls within an uncertainty range [τ_low, τ_high], we apply a neighbor-based adjustment
using its parent and root scores. The adjusted score s′_p is defined as:</p>
          <p>s′_p = (1 − α − β) · s_p + α · parent(p) + β · root if τ_low &lt; s_p ≤ τ_high; s′_p = s_p otherwise (15)</p>
          <p>The final decision score is the maximum of all adjusted scores:</p>
          <p>ŷ = max_{p ∈ target} (s′_p) (16)</p>
          <p>where:
• s_p is the original severity score of post p;
• parent(p) is the score of the parent node of p in the conversation tree;
• root is the score of the root node of the conversation;
• τ_low and τ_high are the low and high depression score thresholds;
• α is the weight for the parent node influence;
• β is the weight for the root node influence.</p>
        </sec>
        <sec id="sec-3-2-6">
          <title>Run 4: Temporal Accumulation with Community-based Adjustment</title>
          <p>We accumulate both current and historical posts from the target user. For each target post p ∈ target,
we consider all posts in the conversation branch from the root node to p, excluding p itself:</p>
          <p>N_p = {q ∈ Branch(root, p) | q ≠ p} (17)</p>
          <p>Let s_p be the original severity score of p, and s̄_p be the average severity score of the community:</p>
          <p>s̄_p = (1 / |N_p|) · Σ_{q ∈ N_p} s_q (18)</p>
          <p>The score is then adjusted relative to the community average, receiving a bonus when it clearly
exceeds it and a penalty in the symmetric case:</p>
          <p>s′_p = s_p + bonus · (s_p − s̄_p) if s_p &gt; max(τ_high, s̄_p); s′_p = s_p − penalty · (s̄_p − s_p) if s_p &lt; min(τ_low, s̄_p); s′_p = s_p otherwise (19)</p>
          <p>The final prediction score is the maximum over all adjusted scores:</p>
          <p>ŷ = max_{p ∈ target} (s′_p) (20)</p>
          <p>where:
• s_p is the original severity score of post p;
• τ_high and τ_low are the high and low depression score thresholds;
• bonus and penalty are the bonus term added to, and the penalty term subtracted from, the score
when the corresponding threshold condition is met.</p>
          <p>Each run serves as a configuration of the decision logic and can be evaluated independently to assess
the robustness of rule-based aggregation methods over tree-structured social media conversations.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation Results &amp; Discussion</title>
      <sec id="sec-4-1">
        <title>4.1. Task 1: Search for Symptoms of Depression</title>
        <p>A total of 67 runs from participants were submitted for this task. In Table 2, we present the
ranking-based evaluation results for Task 1 (majority setting), comparing the best configuration from each
participating team. Our submission achieved the second-best performance in both NDCG and AP, while
also maintaining strong results across R-PREC and P@10. This demonstrates that our approach offers a
well-balanced trade-off between ranking quality and precision.</p>
        <p>Table 3 shows the ranking-based performance of our system under the majority and unanimity voting
schemes. Among our runs, the machine learning configuration consistently achieves the best results, notably with
an NDCG of 0.623 (majority) and 0.577 (unanimity), highlighting its effectiveness. Similarity-based
approaches also perform reasonably well, with slight improvements when ensembling is applied. In
contrast, contrastive learning methods underperform across all metrics, suggesting they may not be
well-suited for this task without further tuning.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Task 2: Contextualized Early Detection of Depression</title>
        <p>We report the results of our models on the public leaderboard in Table 4. Among our runs, Run 2:
Temporal Accumulation with Bonus consistently yields the best performance, with a latency-weighted F1 of 0.68
and an F1 of 0.73, demonstrating the benefit of incorporating historical context and severity-based reward.</p>
        <p>A total of 50 runs from 12 participants were submitted for this task. In Table 5, we present the
decision-based evaluation results for Task 2, comparing the best configuration from each participating
team. Our submission achieved the fourth-best performance in both F1 and latency-weighted F1.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this report, we present our approaches for both Task 1 and Task 2 of the eRisk 2025 challenge,
focusing on early detection and symptom identification of depression from social media posts. For
Task 1, we explored various ranking-based methods based on two approaches: (i) semantic
similarity-based methods that cluster sentence embeddings and rank by proximity to symptom centroids, and (ii)
machine learning-based methods that directly use the output scores from the multi-task model. Among
these, the second approach achieved the best performance across evaluation metrics, demonstrating the
effectiveness of our fine-tuned multi-task model for sentence-level symptom detection.</p>
      <p>For Task 2, we designed several temporal aggregation strategies to detect early warning signs of
depression. These configurations leverage both current and historical user data, with enhancements such
as uncertainty handling and community-based score adjustment. The most effective setup integrated
severity scoring with threshold-based boosting, resulting in competitive latency-aware performance.</p>
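        <p>The accumulation-with-bonus idea can be sketched as follows. The decay factor, bonus value, and both thresholds are hypothetical parameters chosen for illustration; the submitted system's exact values and update rule may differ.</p>

```python
# Illustrative sketch (hypothetical parameters, not the submitted system):
# accumulate per-post severity over time with decay, add a bonus once the
# running score crosses a severity threshold, and flag the user early when
# the total exceeds a decision threshold.
def detect_risk(post_scores, decay=0.9, bonus=0.5,
                severity_threshold=1.5, decision_threshold=2.0):
    risk = 0.0
    for t, score in enumerate(post_scores):
        risk = decay * risk + score          # temporal accumulation
        if risk > severity_threshold:        # severity-based reward
            risk += bonus
        if risk > decision_threshold:
            return True, t                   # early positive decision
    return False, len(post_scores)

# Toy stream of per-post severity scores: flagged at the third post.
flag, when = detect_risk([0.2, 0.8, 1.1, 0.9])
```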
      <p>Across both tasks, the fine-tuned multi-task model played a central role: it drove sentence ranking in
Task 1 and provided representations while filtering out irrelevant content in Task 2, contributing to
improved robustness and precision. Overall, our approaches highlight the effectiveness of combining
fine-tuned language models with task-specific heuristics and temporal context for early detection of
mental health risks.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>In the preparation of this report, we only used Grammarly and ChatGPT for spell/grammar checking
and improving the readability of the manuscript. No part of the content, analyses, or results was
generated by AI tools. All methodological design, implementation, experiments, and interpretations
were conducted solely by the authors.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <article-title>Overview of eRisk 2025: Early risk prediction on the internet</article-title>
          , in:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction - 16th International Conference of the CLEF Association, CLEF 2025, Madrid, Spain, September 9-12, 2025, Proceedings, Part II</source>
          , volume to be published of Lecture Notes in Computer Science, Springer,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <article-title>Overview of eRisk 2025: Early risk prediction on the internet (extended overview)</article-title>
          , in:
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2025), Madrid, Spain, 9-12 September, 2025</source>
          , volume to be published of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Beck</surname>
          </string-name>
          ,
          <article-title>Beck depression inventory-II</article-title>
          ,
          <source>Psychological assessment</source>
          (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B. H.</given-names>
            <surname>Ang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Gollapalli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-K.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <article-title>NUS-IDS@eRisk2024: Ranking sentences for depression symptoms using early maladaptive schemas and ensembles</article-title>
          , Working Notes of CLEF (
          <year>2024</year>
          )
          <fpage>9</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Achiam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Adler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Akkaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. L.</given-names>
            <surname>Aleman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Altenschmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Altman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Anadkat</surname>
          </string-name>
          , et al.,
          <article-title>GPT-4 technical report</article-title>
          , arXiv preprint arXiv:2303.08774 (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Young</surname>
          </string-name>
          ,
          <article-title>Cognitive therapy for personality disorders: A schema-focused approach</article-title>
          , Professional Resource Press/Professional Resource Exchange,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martín-Rodilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <article-title>eRisk 2022: pathological gambling, depression, and eating disorder challenges</article-title>
          ,
          <source>in: European Conference on Information Retrieval</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>436</fpage>
          -
          <lpage>442</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lijin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sruthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Basu</surname>
          </string-name>
          ,
          <article-title>NLP-IISERB@eRisk2022: Exploring the potential of bag of words, document embeddings and transformer based framework for early prediction of eating disorder, depression and pathological gambling over social media</article-title>
          ,
          <source>in: CLEF (Working Notes)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>972</fpage>
          -
          <lpage>986</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Poświata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perełkiewicz</surname>
          </string-name>
          ,
          <article-title>OPI@LT-EDI-ACL2022: Detecting signs of depression from social media text using RoBERTa pre-trained language models</article-title>
          ,
          <source>in: Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion</source>
          , Association for Computational Linguistics, Dublin, Ireland,
          <year>2022</year>
          , pp.
          <fpage>276</fpage>
          -
          <lpage>282</lpage>
          . URL: https://aclanthology.org/2022.ltedi-1.40. doi:10.18653/v1/2022.ltedi-1.40.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Roberta: A robustly optimized bert pretraining approach</article-title>
          , arXiv preprint arXiv:1907.11692 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <article-title>A test collection for research on depression and language use</article-title>
          , volume
          <volume>9822</volume>
          ,
          <year>2016</year>
          , pp.
          <fpage>28</fpage>
          -
          <lpage>39</lpage>
          . doi:10.1007/978-3-319-44564-9_3.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-bert: Sentence embeddings using siamese bert-networks</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1908.10084.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Nussbaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. X.</given-names>
            <surname>Morris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Duderstadt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mulyar</surname>
          </string-name>
          ,
          <article-title>Nomic embed: Training a reproducible long context text embedder</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2402.01613. arXiv:2402.01613.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>van den Oord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <article-title>Representation learning with contrastive predictive coding</article-title>
          , arXiv preprint arXiv:1807.03748 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>