<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>FU-TU-DFKI@eRisk 2025: A Linguistically Informed but Overdiagnosing Approach to Early Depression Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elif Kara</string-name>
          <email>elif.kara@fu-berlin.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rosa Esther Martín Peña</string-name>
          <email>rosa-esther.martin-pena@cells.uni-hannover.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lisa Raithel</string-name>
          <email>raithel@tu-berlin.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>BIFOLD - Berlin Institute for the Foundations of Learning and Data</institution>
          ,
          <addr-line>Ernst-Reuter-Platz 7, 10587 Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>CAIMed - Center for Artificial Intelligence in Medicine, Hannover Medical School (MHH)</institution>
          ,
          <addr-line>Carl-Neuberg-Straße 1, 30625 Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>CLEF 2025 Working Notes</institution>
          ,
          <addr-line>9 - 12</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Deutsches Forschungszentrum für Künstliche Intelligenz GmbH (DFKI), DFKI Labor Berlin</institution>
          ,
          <addr-line>Salzufer 15/16, 10587 Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Freie Universität Berlin, Department of Philosophy and Humanities, Institute for English Language and Literature</institution>
          ,
          <addr-line>Habelschwerdter Allee 45, 14195 Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Gottfried Wilhelm Leibniz Universität Hannover, Centre for Ethics and Law in the Life Sciences</institution>
          ,
          <addr-line>Otto-Brenner-Straße 1, 30159 Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff6">
          <label>6</label>
          <institution>Technische Universität Berlin, Quality and Usability Lab</institution>
          ,
          <addr-line>Marchstraße 23, 10587 Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the participation of the FU-TU-DFKI team in eRisk 2025 Task 2, Contextualized Early Detection of Depression. We propose a hybrid approach that combines transformer-based modelling with linguistic and meta feature analysis. While our model achieved high recall, it exhibited low precision, resulting in an overall F1-score of 0.29 in the official evaluation. We interpret this cautious behaviour as a tendency toward overdiagnosis. Beyond the technical system, we investigated the linguistic characteristics of user messages via corpus-linguistic methods, including Collostructional Analysis, a method for identifying statistically significant associations between words and grammatical constructions. Additionally, we examine the ethical implications of automated depression detection and highlight the reductionist interpretation of complex affective utterances in such systems. Our submission emphasizes the importance of interpretability and caution in high-stakes, health-related NLP tasks, particularly when system performance remains limited.</p>
      </abstract>
      <kwd-group>
        <kwd>mental health</kwd>
        <kwd>depression detection</kwd>
        <kwd>transformer models</kwd>
        <kwd>collostructional analysis</kwd>
        <kwd>corpus linguistics</kwd>
        <kwd>ethical NLP</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Depressive disorder is a serious mental health concern, affecting around 280 million adults worldwide
according to the World Health Organization (WHO).<sup>1</sup> The condition has an impact on all phases
and aspects of life, such as relationships, school, or work, making it a major public health concern.
Nevertheless, many cases remain undiagnosed, are self-diagnosed, or are diagnosed only after significant
delays, often resulting in worse outcomes for those affected [
        <xref ref-type="bibr" rid="ref1 ref2 ref8 ref9">1, 2</xref>
        ]. Early detection of depressive
symptoms can enable more timely support and intervention, potentially improving quality of life and
reducing long-term suffering [
        <xref ref-type="bibr" rid="ref10 ref3 ref4">3, 4</xref>
        ]. However, structural and social barriers often prevent individuals
from seeking help. Even in well-funded healthcare systems, access to therapy can be limited [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
Furthermore, mental health stigma remains widespread, making open conversations about psychological
distress difficult for many people [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7, 8</xref>
        ].
      </p>
      <p>As a consequence, many individuals turn to social media platforms such as Reddit to express their
thoughts, connect with others facing similar struggles, or seek informal advice [9]. The anonymity
offered by these platforms allows users to share personal experiences more openly than they might in
offline contexts [10]. This makes social media a valuable, albeit noisy, source of linguistic and emotional
data. Since most interactions on these platforms are text-based, language plays a central role in the way
emotions and psychological states are communicated. As more and more patients express themselves
online, there is growing interest in using Natural Language Processing (NLP) techniques to detect
patterns and markers associated with the condition in large-scale text data [11].</p>
      <p>All of these challenges are at the core of this year’s eRisk 2025 workshop [12, 13], specifically Task 2,
Contextualized Early Detection of Depression. In this paper, we present our system for this task, which
includes the following contributions:
• two corpus-linguistic pilot studies, including collostructional methods, on lexical characteristics of
the training data to identify patterns and markers associated with depressive language (Section 3);
• a hybrid pipeline that combines a transformer-based prediction model (MentalBERT), handcrafted
linguistic features, and contextual meta-information (Section 4); and
• a brief reflection on the ethical implications of applying NLP to social media data for mental
health research (Section 5).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Over the past decade, NLP has emerged as a powerful tool for studying mental health through language.
A growing body of work has focused on extracting linguistic signals related to psychological wellbeing
[14, 15], stress [16], anxiety [17, 18], schizophrenia [19, 20] and depression [17, 21, 22]. Among these,
depression detection remains one of the most extensively studied applications. Shared tasks such as
CLPsych and eRisk have driven methodological advances by providing annotated datasets and realistic
evaluation settings for early detection [23, 24]. Systems developed for these tasks have explored a wide
range of techniques, including keyword-based lexica, topic modelling [25], psycholinguistic feature
extraction [26], and (deep) machine learning approaches, such as XGBoost or CNNs [27, 28].</p>
      <p>Depression has also received particular attention from a linguistic perspective. Studies investigating
linguistic markers of psychological distress have consistently reported correlations between depression
and specific features, such as the frequency of first-person singular pronouns (FPSPs) [29, 30, 31],
negatively valenced words [32, 33, 34], absolutist language [35], and a preference for past-tense verbs
[36]. Among these, FPSP frequency has emerged as a particularly robust marker of depression, as found
by a meta-analysis [37], and frequently reported both across analytical approaches [38, 39, 31] and
languages [39, 31, 40]. This observation aligns with psychological theories positing that depression
is associated with maladaptive self-focused attention schemas [41]. These studies demonstrate that
linguistically grounded features, when integrated into NLP models, offer a scalable and transparent
means of extracting mental health signals from user-generated content. In addition, they make it
possible to monitor language use longitudinally and to identify early indicators of depression – even
in the absence of explicit self-disclosure. However, challenges remain, particularly in achieving high
precision and interpretability, and in addressing the (ethical) complexities of real-world deployment.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Linguistic Analysis</title>
      <p>Dataset. Both pilot studies were conducted on the official eRisk 2025 training data, which combines
data from previous eRisk challenges in 2017, 2018 and 2022. It consists of the full conversational
history of individual Reddit users, divided into a target group (henceforth POS) comprising depressed
users, and a control group (henceforth NEG) comprising non-depressed users. The users in POS were
selected based on statements disclosing a depression diagnosis; all posts containing such statements
were subsequently removed from the dataset. For further information, see Losada and Crestani [42].</p>
      <p>Table 1 provides an overview of the basic statistics of POS and NEG. NEG is roughly ten times larger
than POS. This class imbalance is appropriate for statistical linguistic analysis, as the larger control
group allows for more stable frequency estimates, provided that the NEG data reflects diverse and
representative language use. However, we acknowledge that Reddit does not reflect general population
demographics or mental health prevalence, as discussed in Section 5.</p>
      <p>POS users tend to have a larger posting history than NEG users. Moreover, while sentences overall
are rather short, POS sentences are both longer and more lexically diverse<sup>2</sup> than NEG sentences.</p>
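      <p>As an illustration of the lexical diversity measure referred to above (MSTTR, see footnote 2), the following minimal Python sketch computes the mean type-token ratio over consecutive fixed-size segments. The function name msttr, the segment size, and the choice to drop a trailing partial segment are assumptions made for this example; the settings actually used are not stated here.</p>
      <preformat>
```python
def msttr(tokens, segment_size=100):
    """Mean Segmental Type-Token Ratio over consecutive full segments.

    segment_size is an assumed setting; a trailing partial segment is dropped.
    """
    segments = [
        tokens[start : start + segment_size]
        for start in range(0, len(tokens) - segment_size + 1, segment_size)
    ]
    if not segments:
        raise ValueError("text is shorter than one segment")
    # Type-token ratio per segment = distinct tokens / segment length
    return sum(len(set(seg)) / segment_size for seg in segments) / len(segments)


# Toy example with a tiny segment size, purely for demonstration
toy_tokens = "a b a b c d c d e".split()
toy_msttr = msttr(toy_tokens, segment_size=4)
```
</preformat>
      <p>Because every segment has the same length, the resulting score can be compared across users with very different posting volumes, which is the property the footnote appeals to.</p>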
      <sec id="sec-3-1">
        <title>3.1. Pilot Study 1: First-Person Singular Pronoun Use</title>
        <p>This study is motivated by FPSP use being postulated as a robust linguistic marker of depression. First,
we explore the relative distribution of I, followed by verbal associations with I in the two cohorts.</p>
        <p>Distribution of FPSPs. The lemma I (comprising the forms I and me, but excluding my and other forms
of self-reference) occurs with a relative frequency of 4.81% (N = 267,868) in the POS group and 2.50%
(N = 1,344,950) in the NEG group, confirming findings from previous studies. Deviation of Proportions
(DP) was applied as a measure of dispersion.<sup>3</sup> POS yields a DP of 0.48 and NEG 0.49, indicating uneven
distribution in both groups, with a minor skew of POS towards greater evenness. Moreover, the FPSP I
is absent from only one depressed user’s (0.32%) posting history, compared to 216 users (7.73%) in the
control group. This supports and strengthens the observation that FPSP usage in the depressed dataset
is not only more frequent but also more evenly distributed, as reflected in a narrower range.</p>
        <p>Verbal Associations with I in POS vs NEG. To explore why FPSPs are more frequent in the
depression data, we turn to a qualitative investigation of how users with and without depression
predicate states and actions about themselves. We operationalize this as an analysis of verb associations
with the FPSP. To do so, we extract all instances of the construction [I + VERB], plus optional slots for
one adverb and up to two auxiliary verbs, from both datasets. The lists of verb lemmas are submitted
to two subtypes of Collostructional Analysis (CA), implemented via the collostructions R package [44].</p>
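        <p>The dispersion measure used above can be sketched in a few lines of Python. The sketch assumes the common formulation of DP as half the summed absolute difference between each corpus part's observed share of a word's occurrences and its expected share based on part size; treating each user's posting history as one corpus part is an assumption, as are the function name and the toy counts.</p>
        <preformat>
```python
def deviation_of_proportions(freqs, sizes):
    """DP = 0.5 * sum(|observed share - expected share|) over corpus parts.

    freqs: occurrences of the word in each part (here: hypothetically, per user)
    sizes: token count of each part
    """
    total_freq = sum(freqs)
    total_size = sum(sizes)
    if total_freq == 0 or total_size == 0:
        raise ValueError("need a non-empty corpus and at least one occurrence")
    return 0.5 * sum(
        abs(f / total_freq - s / total_size) for f, s in zip(freqs, sizes)
    )


# Toy data: three equally sized parts
dp_even = deviation_of_proportions([10, 10, 10], [100, 100, 100])
dp_skewed = deviation_of_proportions([30, 0, 0], [100, 100, 100])
```
</preformat>
        <p>In the toy data, the evenly spread word yields a DP of 0 and the word confined to a single part yields a DP of about 0.67, matching the reading that values near 0 indicate even distribution and values near 1 uneven distribution.</p>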
        <p>Collostructional Analysis. CA, developed by Stefanowitsch and Gries [45] (see also [46, 47]),
is a quantitative approach that offers insights into co-occurrence phenomena at the form-function
interface.<sup>4</sup> It has been applied extensively to uncover systematic patterns in how lexical items associate
with grammatical constructions across different languages and registers.<sup>5</sup> Thus, this method offers
insights into both structural properties of language and the cognitive mechanisms underlying its use,
supporting research on mental health and language, as demonstrated in a recent study [50].</p>
        <p>We apply Distinctive Collexeme Analysis (DCA) to measure the association of verbs with [I + VERB]
in POS, and compare them against the association of verbs with the same construction in NEG. This
allows us to determine verbal associations characteristic of each cohort, emphasizing their differences.</p>
        <p><sup>2</sup>We used Mean Segmental Type-Token Ratio (MSTTR) to measure lexical diversity as it is insensitive to varying text lengths.
<sup>3</sup>DP compares the expected distribution of a linguistic unit across corpus segments to the observed distribution, with 0
indicating perfectly even and 1 maximally uneven distribution [43].
<sup>4</sup>CA is grounded in the theoretical framework of Construction Grammar, which assumes that linguistic units on all levels
(words, morphemes, phrases, and sentences) are form-meaning pairings in the Saussurean sense, called constructions [48, 49].
<sup>5</sup>E.g., the English ditransitive construction [VERB + NP + NP] strongly attracts verbs of transfer (e.g., give, send, offer) while
the caused-motion construction [VERB + NP + PP/AdvP] attracts verbs of placement (e.g., put, place, throw) [45].</p>
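        <p>To make the contrastive design of a DCA more concrete, the following Python sketch ranks verbs by how strongly they prefer one cohort over the other. It is only an illustration: the analysis reported here was run with the collostructions R package, whereas this sketch uses the log-likelihood ratio (G2) on a 2x2 table as the association measure, which is an assumption. Function names and verb counts are hypothetical.</p>
        <preformat>
```python
import math


def g2(a, b, c, d):
    """Log-likelihood ratio (G2) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    # Cells with an observed count of 0 contribute nothing to the sum
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def distinctive_collexemes(pos_counts, neg_counts):
    """Return (verb, preferred cohort, G2) tuples, strongest association first."""
    pos_total = sum(pos_counts.values())
    neg_total = sum(neg_counts.values())
    rows = []
    for verb in set(pos_counts) | set(neg_counts):
        a = pos_counts.get(verb, 0)  # verb in [I + VERB], POS cohort
        b = neg_counts.get(verb, 0)  # verb in [I + VERB], NEG cohort
        score = g2(a, b, pos_total - a, neg_total - b)
        preferred = "POS" if a / pos_total > b / neg_total else "NEG"
        rows.append((verb, preferred, score))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical verb counts in the [I + VERB] slot
toy_pos = {"struggle": 30, "feel": 50, "see": 20}
toy_neg = {"struggle": 5, "feel": 40, "see": 155}
toy_ranking = distinctive_collexemes(toy_pos, toy_neg)
```
</preformat>
        <p>Each verb is compared against all other verbs in the same slot, so the score reflects a preference for one cohort relative to the other and not raw frequency.</p>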
        <p>Results. Table 3, in Appendix A, displays the verbs most strongly and positively associated with each
dataset (all p&lt;.0001). A glance at the highest-ranking verbs reveals that POS strongly attracts verbs
encoding negative sentiment (e.g., struggle, suffer, cry, lose), mental health-related verbs (e.g., diagnose,
prescribe, hospitalize, misdiagnose), as well as emotion verbs (e.g., feel). In contrast, NEG strongly attracts
verbs with more neutral sentiment (e.g., modify, see, think, mean, watch), indicating a more casual,
conversational tone. A post-hoc sentiment analysis of the [I + VERB] constructions via NLTK’s VADER
[51], using the three polarity labels positive, negative, and neutral, confirms that negative sentiment is
more common among the highest-ranking verb associations in the depressed cohort (18% in POS vs 8%
in NEG). Additionally, we cross-examined the results from the DCA with a Simple Collexeme Analysis
(SCA) that compares verb associations with [I + VERB] against their overall corpus frequency.<sup>6</sup> We
did this for each cohort in order to identify intra-cohort associations. Remarkably, nearly all of the
top 100 collexemes are shared between the cohorts, and reflect neutral sentiment (see Table 6). Thus,
although an independent analysis shows both groups are associated with neutral language, a contrastive
approach reveals clear differences in affective characterization.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Pilot Study 2: Important Concepts</title>
        <p>This study is motivated by two goals: first, to complement NLP methods for concept detection, such as
topic modelling and TF-IDF, and second, to build on the preliminary finding from Pilot Study 1 that
mental health-related verbs are characteristically predicated about the self in the depressed cohort. For
this, we extract all word-form tokens from each cohort and compare the resulting word lists against a
440-million-word subset of the Corpus of Contemporary American English (COCA) [52].<sup>7</sup> In addition
to this corpus-level analysis, we conducted keyword analyses (KAs) for each year in which users posted messages (2009
to 2021), in order to track lexical trends over time. While this does not inform early risk detection, it
serves as a form of cross-validation to identify which concepts recur across years in the POS and NEG
datasets.</p>
        <p>Keyword Analysis. Keyword Analysis (KA) is a corpus-linguistic approach to identifying tokens
that are key in a given corpus. Like CA, KA is a transparent statistical method aimed at detecting
words associated with a target corpus; in the case of KA, relative to a larger reference corpus [53].
When applied to highly specialized corpora – for example, comprising depression data – KA can reveal
deviations from the lexical norms and conceptual patterns considered typical for the broader speech
community represented by the control corpus.</p>
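        <p>The idea behind a keyword analysis can be sketched as follows. The keyness statistic used in this study is not specified here, so the sketch assumes a simple log-ratio effect size (the binary logarithm of the ratio of relative frequencies in the target and reference corpus). The function name, the minimum frequency, the smoothing value for unseen words, and the toy sentences are all hypothetical.</p>
        <preformat>
```python
import math
from collections import Counter


def log_ratio_keywords(target_tokens, reference_tokens, min_freq=2):
    """Rank target-corpus words by log2 of their relative-frequency ratio."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    n_target = sum(target.values())
    n_reference = sum(reference.values())
    scores = {}
    for word, freq in target.items():
        if freq >= min_freq:
            # Smooth words unseen in the reference corpus (assumed value 0.5)
            ref_freq = reference.get(word, 0) or 0.5
            scores[word] = math.log2((freq / n_target) / (ref_freq / n_reference))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy corpora, purely for demonstration
toy_target = "i feel sad and i feel tired".split()
toy_reference = "the president of the nation said the law and the order".split()
toy_keywords = log_ratio_keywords(toy_target, toy_reference)
```
</preformat>
        <p>A positive score marks a word as overrepresented in the target corpus relative to the reference corpus; in practice such an effect size is usually paired with a significance test.</p>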
        <p>Results. Table 6, in Appendix C, displays the top keywords, indicating strong thematic differences
between the datasets: POS shows an overrepresentation of first-person pronouns, along with affective
(feel, love), mental health-related (depression) and clinical (meds) vocabulary. In contrast, NEG shows
an overrepresentation of words related to financial and transactional discourse (binance, wallet,
account, cryptocurrency). As expected, COCA keywords reflect more formal and informational language,
comprising function words (the, of, in), including third-person pronouns (his, he), and proper nouns
tied to political discourse (president, national, united). The association with function words reflects
general patterns of natural language use, while the other salient categories likely stem from COCA’s
composition, which is biased toward formal, written genres. All of these patterns are also reflected in the
annual KAs: the most prominent keywords per cohort recur consistently across all 13 years. In addition,
some overlap of genre-specific expressions<sup>8</sup> was observed in both POS and NEG, demonstrating how
lexical choices are influenced by multiple factors, which studies on mental health and language must
account for.</p>
        <p><sup>6</sup>The research designs of both a DCA and an SCA are schematically illustrated in Table 5, in Appendix B.
<sup>7</sup>The COCA covers eight genres (spoken, fiction, academic, newspapers, etc.) published between 1990 and 2019, and is widely
considered to be a balanced corpus of present-day American English.
<sup>8</sup>Including second-person pronouns (you), platform- and medium-specific expressions (reddit, lol, haha, fuck), as well as
conversational markers (thanks).</p>
        <p>[Figure 1: pipeline diagram. Recoverable labels: new conversation; extract messages of target user; model prediction; linguistic features; meta information; historical scores; aggregate score; threshold check; decision (positive or negative); save score to memory.]</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Pipeline</title>
      <p>Our pipeline is based on the predictions of a transformer-based [54] encoder-only model [55], the
linguistic analyses, and metadata, either extracted directly from the incoming messages or inferred from
the broader conversation context. We opted for an encoder-only architecture over a Large Language
Model (LLM) due to its efficiency, interpretability, and compatibility with additional linguistic and meta
features. The pipeline is illustrated in Figure 1.</p>
      <p>Model. For model predictions, we use MentalBERT [56],<sup>9</sup> a BERT-based model [55] which was
continuously pretrained on English Reddit posts related to mental health.<sup>10</sup> This domain-specific
pretraining allows the model to better handle informal language and topic-specific expressions. We
fine-tuned MentalBERT on a balanced subset of our training data for a binary classification task: given
a message, the model predicts whether the user is likely (positive) or not likely (negative) to exhibit
signs of depression. Following hyperparameter tuning, the best configuration achieved an F1-score of
0.63 on the positive class on a held-out validation set, reflecting the difficulty of the task.</p>
      <p>Features. For each new conversation, we retrieve the target user’s messages in chronological order
and extract linguistically motivated features based on the analyses in Section 3. Specifically, we scanned each
message for (a) instances where I was followed by a verb<sup>11</sup> associated with the POS group within a
five-token window (see Pilot Study 1), and (b) keywords associated with the POS group (see Pilot
Study 2). Additionally, we incorporated a small set of meta-level behavioural indicators: (a) the night
writer feature, which captures how frequently a user posted messages between 11:00 pm and 06:00 am,
motivated by the established link between sleep disturbances and depressive symptoms [57, 58]; and (b)
a sentiment classification pipeline based on a pretrained model from the Huggingface Transformers
library,<sup>13</sup> given prior findings of increased negative sentiment in depression [59, 60] and in our own
linguistic analyses. Both the linguistic and meta features served to bias the final decision of the system
in case the prediction model was not confident.</p>
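      <p>Two of the features described above can be sketched in plain Python. This is a simplified illustration, not the submitted implementation: it tokenizes with a regular expression and skips the spaCy lemmatization mentioned in footnote 11, it interprets the five-token window as the five tokens following I, and it counts posts with an hour of 23 or 0 to 5 as night posts. The verb set is a small sample taken from the Pilot Study 1 results, and all names are hypothetical.</p>
      <preformat>
```python
import re
from datetime import datetime

# Small sample of POS-associated verbs from Pilot Study 1 (not the full list)
POS_VERBS = {"struggle", "suffer", "cry", "lose", "feel"}


def count_i_verb_hits(message, verbs=POS_VERBS, window=5):
    """Count occurrences of 'I' followed by a listed verb within `window` tokens."""
    tokens = re.findall(r"[a-z']+", message.lower())
    hits = 0
    for idx, token in enumerate(tokens):
        if token == "i":
            following = tokens[idx + 1 : idx + 1 + window]
            hits += any(t in verbs for t in following)
    return hits


def night_writer_ratio(timestamps):
    """Share of posts written between 11:00 pm and 06:00 am."""
    if not timestamps:
        return 0.0
    night = sum(1 for ts in timestamps if ts.hour >= 23 or ts.hour in range(0, 6))
    return night / len(timestamps)


toy_hits = count_i_verb_hits("I really do feel tired and I went home")
toy_night = night_writer_ratio([
    datetime(2020, 1, 1, 23, 30),
    datetime(2020, 1, 2, 2, 0),
    datetime(2020, 1, 2, 14, 0),
    datetime(2020, 1, 2, 6, 0),
])
```
</preformat>
      <p>Without lemmatization, inflected forms such as felt or struggling are missed, which is the robustness gap the lemmatization step in the described pipeline addresses.</p>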
      <p>Decision Logic. To ensure stability and avoid unreliable decisions based on limited data, the system
waits until a user has posted a sufficient number of messages before making a prediction.<sup>14</sup> The final
decision integrates weighted model probabilities, linguistic features, and metadata through a
threshold-based logic, as outlined in Appendix D. Upon receiving a new batch of messages from a user, we
use top-k averaging (with k = 3) to capture peak signals across their history while reducing noise,
and combine these with historical scores to update the assessment, which is then saved to inform
upcoming predictions.</p>
      <p><sup>9</sup>https://huggingface.co/mental/mental-bert-base-uncased; MentalBERT was the best model in preliminary experiments.
<sup>10</sup>A subset of the eRisk18 T1 dataset [42] was included in the model’s training set.
<sup>11</sup>We applied lemmatization using the spaCy library<sup>12</sup> to increase robustness by including inflected forms.
<sup>13</sup>Using the default model https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english with three labels,
positive, negative, and neutral.
<sup>14</sup>We set this threshold to 5 messages and require that at least 2 rounds have already been processed.</p>
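      <p>The decision logic described above can be sketched roughly as follows. The values k = 3, the minimum of 5 messages, and the minimum of 2 processed rounds come from the text; the history weight, the decision threshold, the linear blending of historical and current scores, and the choice to emit a negative decision while waiting are assumptions for illustration only, since the actual rules are given in Appendix D.</p>
      <preformat>
```python
def top_k_mean(scores, k=3):
    """Average of the k highest per-message scores."""
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / len(top)


def decide(message_scores, historical_score, rounds_seen,
           min_messages=5, min_rounds=2, history_weight=0.5, threshold=0.7):
    """Return (decision, updated_score) for one user after a new batch.

    history_weight and threshold are hypothetical values, not those of the
    submitted system.
    """
    # Wait until enough evidence has accumulated (assumed: stay negative)
    if min_messages > len(message_scores) or min_rounds > rounds_seen:
        return 0, historical_score
    current = top_k_mean(message_scores)
    updated = history_weight * historical_score + (1 - history_weight) * current
    decision = 1 if updated >= threshold else 0
    return decision, updated


toy_decision, toy_score = decide([0.9, 0.8, 0.95, 0.2, 0.1], 0.6, rounds_seen=2)
toy_waiting = decide([0.9, 0.8], 0.6, rounds_seen=3)
```
</preformat>
      <p>Averaging only the top-k scores lets a few strongly flagged messages dominate the user-level score. This favours recall, and it is one plausible contributor to false positives when those peaks are spurious.</p>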
      <sec id="sec-4-1">
        <title>4.1. Results and Preliminary Analysis of Error Sources</title>
        <p>Table 2 presents the official evaluation results of our system, as provided by the task organizers. Due to
hardware issues, our submission processed only 449 user threads out of the intended 1,280. The resulting
F1 score is 0.29, which unfortunately places our system at the lower end of the participating teams’
scores. However, the model demonstrated a high recall of 0.97, indicating that it successfully identified
most users with depression. This came at the cost of very low precision (0.17), meaning the system
overdiagnosed users and produced a high number of false positives. The early risk detection error
(ERDE) scores [42] further reflect the system’s cautious behaviour: with a latency of 11 messages,
the model generally waited for a substantial amount of user data before making a positive prediction.
While this helped avoid premature decisions, it limited the model’s ability to detect depression early.
This is also reflected in the low latency-aware F1-score (0.28), despite an overall speed score of 0.96.</p>
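        <p>The relationship between the reported precision, recall, and F1 score can be checked with a short calculation. The sketch below uses the standard definition of F1 as the harmonic mean of precision and recall; the function name is ours.</p>
        <preformat>
```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the official evaluation
reported_f1 = f1_score(precision=0.17, recall=0.97)
print(round(reported_f1, 2))  # 0.29
```
</preformat>
        <p>Because the harmonic mean is dominated by the smaller value, a recall of 0.97 cannot compensate for a precision of 0.17.</p>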
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion and Conclusion</title>
      <p>Our system for eRisk 2025 Task 2 sought to balance predictive performance with interpretability by
combining a transformer-based model with linguistically motivated features. While the system achieved
relatively high recall, low precision resulted in an overall F1-score of 0.29. This outcome reflects our
emphasis on avoiding false negatives in a high-risk domain, but also highlights the trade-off between
sensitivity and specificity in early depression detection. Several design choices may have contributed to
these outcomes: we relied on a single model with heuristically selected parameters and no uncertainty
calibration. In future work, we aim to better integrate linguistic and contextual features to improve
transparency and accuracy, as well as to explore ensemble approaches, LLMs, and more adaptive
decision logic.</p>
      <p>Independent of model performance, data limitations pose broader concerns in this domain: social
media data lacks clinical validation, and demographic biases, such as the underrepresentation of older
adults or individuals with limited digital access, reduce generalizability.</p>
      <p>From an ethical perspective, we acknowledge that language-based models in mental health contexts
are not neutral. For example, they encode assumptions about the relevance of emotional expressions,
introducing algorithmic biases whereby complex expressions of distress (e.g., I cried) may be
pathologized or misinterpreted. While we incorporated such tools into our pipeline, we recognize their potential
as well as their limited generalizability: apparent predictive accuracy may stem from superficial lexical
cues, ultimately compromising model robustness. In our case, the sentiment classification pipeline,
although treated with caution, proved particularly unreliable and may have introduced noise. Moreover,
such biases raise ethical concerns about whose emotional registers are recognized and whose are
overlooked [61, 62].</p>
      <p>Building on these concerns, we propose a broader reframing of emotional language in mental health
research: one that recognizes that linguistic expression is shaped by multiple contextual factors beyond
mental health status, such as genre, interactional context, and communicative intent. Future research
should incorporate these dimensions in both model design and evaluation.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We would like to thank the organizers of CLEF eRisk 2025 for this interesting task. We found it to be
both technically and conceptually rewarding as it challenged us to think beyond raw performance.</p>
      <p>This work was supported by the German Federal Ministry of Education and Research (BIFOLD25B),
and conducted within the framework of the CAIMed – Center for Artificial Intelligence in Medicine,
which supports interdisciplinary research at the intersection of AI, ethics, and medicine.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors used Claude 3.7 Sonnet as a programming aid for the sentiment classification in the
corpus-linguistic component. The analysis and interpretation of the results were carried out independently by
the authors.</p>
    </sec>
    <sec id="sec-8">
      <title>CRediT Authorship Contribution Statement</title>
      <p>Elif Kara: Linguistic Analysis (Conceptualization, Methodology, Formal Analysis, Investigation, and
Data Curation); Writing – Original Draft (Related Work, Linguistic Analysis); Writing – Review &amp;
Editing; Supervision.
Rosa Esther Martín Peña: Ethical Analysis (Conceptualization, Methodology,
Formal Analysis and Investigation); Writing – Original Draft (Discussion and Conclusion).
Lisa Raithel: Machine Learning Pipeline (Conceptualization, Methodology, Software, Validation, Formal Analysis,
Investigation, Data Curation, and Visualization); Writing – Original Draft (Introduction, Related Work,
Pipeline, Discussion and Conclusion).
[8] Z. Xu, F. Huang, M. Kösters, T. Staiger, T. Becker, G. Thornicroft, N. Rüsch, Effectiveness of
interventions to promote help-seeking for mental health problems: Systematic review and
meta-analysis, Psychological Medicine 48 (2018) 2658–2667. doi:10.1017/s0033291718001265.
[9] M. D. Choudhury, E. Kıcıman, Characterizing and predicting mental health disclosures on social
media, in: Proceedings of the 8th International AAAI Conference on Weblogs and Social Media
(icwsm), AAAI Press, 2014, pp. 71–80. doi:10.1609/icwsm.v8i1.14526.
[10] T. D. Afifi, E. D. Basinger, J. A. Kam, The extended theoretical model of communal coping:
Understanding the properties and functionality of communal coping., Journal of Communication
70 (2020) 424–446. doi:10.1093/joc/jqaa006.
[11] T. Zhang, A. M. Schoene, S. Ji, S. Ananiadou, Natural language processing applied to
mental illness detection: A narrative review, npj Digital Medicine 5 (2022). doi:10.1038/s41746-022-00589-7.
[12] J. Parapar, A. Perez, X. Wang, F. Crestani, Overview of eRisk 2025: Early Risk Prediction on
the Internet, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction - 16th
International Conference of the CLEF Association, CLEF 2025, Madrid, Spain, September 9-12,
2025, Proceedings, Part II, volume To be published of Lecture Notes in Computer Science, Springer,
2025.
[13] J. Parapar, A. Perez, X. Wang, F. Crestani, Overview of eRisk 2025: Early Risk Prediction on the
Internet (Extended Overview), in: Working Notes of the Conference and Labs of the Evaluation
Forum (CLEF 2025), Madrid, Spain, 9-12 September, 2025, volume To be published of CEUR
Workshop Proceedings, CEUR-WS.org, 2025.
[14] N. Fujikawa, Q. T. Nguyen, K. Ito, S. Wakamiya, E. Aramaki, Loneliness episodes: A Japanese
dataset for loneliness detection and analysis, in: O. De Clercq, V. Barriere, J. Barnes, R. Klinger,
J. Sedoc, S. Tafreshi (Eds.), Proceedings of the 14th Workshop on Computational Approaches
to Subjectivity, Sentiment, &amp; Social Media Analysis, Association for Computational Linguistics,
Bangkok, Thailand, 2024, pp. 280–293. doi:10.18653/v1/2024.wassa-1.23.
[15] T. Tseriotou, J. Chim, A. Klein, A. Shamir, G. Dvir, I. Ali, C. Kennedy, G. Singh Kohli, A. Hills,
A. Zirikly, D. Atzil-Slonim, M. Liakata, Overview of the CLPsych 2025 shared task: Capturing
mental health dynamics from social media timelines, in: A. Zirikly, A. Yates, B. Desmet, M. Ireland,
S. Bedrick, S. MacAvaney, K. Bar, Y. Ophir (Eds.), Proceedings of the 10th Workshop on
Computational Linguistics and Clinical Psychology (CLPsych 2025), Association for Computational
Linguistics, Albuquerque, New Mexico, 2025, pp. 193–217. URL: https://aclanthology.org/2025.clpsych-1.16/.
[16] M. Mendula, S. Gabrielli, F. Finazzi, C. Dompe, M. Delucis, Unveiling mental health insights: A
novel NLP tool for stress detection through writing and speaking analysis to prevent burnout,
AHFE International 122 (2024) 164–174.
[17] A. Fine, P. Crutchley, J. Blase, J. Carroll, G. Coppersmith, Assessing population-level symptoms
of anxiety, depression, and suicide risk in real time using NLP applied to social media data, in:
D. Bamman, D. Hovy, D. Jurgens, B. O’Connor, S. Volkova (Eds.), Proceedings of the Fourth
Workshop on Natural Language Processing and Computational Social Science, Association for
Computational Linguistics, 2020, pp. 50–54. doi:10.18653/v1/2020.nlpcss-1.6.
[18] D. Zarate, M. Ball, M. Prokofieva, V. Kostakos, V. Stavropoulos, Identifying self-disclosed anxiety
on Twitter: A Natural Language Processing approach, Psychiatry Research 330 (2023) 115579.
doi:10.1016/j.psychres.2023.115579.
[19] S. Just, E. Haegert, N. Kořánová, A.-L. Bröcker, I. Nenchev, J. Funcke, C. Montag, M. Stede,
Coherence models in schizophrenia, in: K. Niederhofer, K. Hollingshead, P. Resnik, R. Resnik,
K. Loveys (Eds.), Proceedings of the Sixth Workshop on Computational Linguistics and Clinical
Psychology, Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 126–136.
doi:10.18653/v1/W19-3015.
[20] I. Nenchev, T. Scheffler, M. de la Fuente, H. Stuke, B. Wilck, S. A. Just, C. Montag, Linguistic markers
of schizophrenia: a case study of Robert Walser, in: A. Yates, B. Desmet, E. Prud’hommeaux,
A. Zirikly, S. Bedrick, S. MacAvaney, K. Bar, M. Ireland, Y. Ophir (Eds.), Proceedings of the 9th
Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2024), Association for
Computational Linguistics, St. Julians, Malta, 2024, pp. 41–60. URL: https://aclanthology.org/2024.clpsych-1.4.
[21] N. A. Abdelkadir, C. Zhang, N. Mayo, S. Chancellor, Diverse perspectives, divergent models:
Cross-cultural evaluation of depression detection on Twitter, in: K. Duh, H. Gomez, S. Bethard
(Eds.), Proceedings of the 2024 Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), Association
for Computational Linguistics, Mexico City, Mexico, 2024, pp. 672–680. doi:10.18653/v1/2024.naacl-short.58.
[22] A.-M. Bucur, A. Moldovan, K. Parvatikar, M. Zampieri, A. Khudabukhsh, L. Dinu, Datasets
for depression modeling in social media: An overview, in: A. Zirikly, A. Yates, B. Desmet,
M. Ireland, S. Bedrick, S. MacAvaney, K. Bar, Y. Ophir (Eds.), Proceedings of the 10th Workshop on
Computational Linguistics and Clinical Psychology (CLPsych 2025), Association for Computational
Linguistics, Albuquerque, New Mexico, 2025, pp. 116–126. URL: https://aclanthology.org/2025.clpsych-1.10/.
[23] G. Coppersmith, M. Dredze, C. Harman, K. Hollingshead, M. Mitchell, CLPsych 2015 shared
task: Depression and PTSD on Twitter, in: Proceedings of the 2nd Workshop on Computational
Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, Association for
Computational Linguistics, Denver, Colorado, 2015, pp. 31–39. doi:10.3115/v1/W15-1204.
[24] D. E. Losada, F. Crestani, J. Parapar, Overview of eRisk: Early risk prediction on the internet,
in: P. Bellot, C. Trabelsi, J. Mothe, F. Murtagh, J. Y. Nie, L. Soulier, E. SanJuan, L. Cappellato,
N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction, Springer
International Publishing, 2018, pp. 343–361. doi:10.1007/978-3-319-98932-7_30.
[25] D. Maupomé, M. D. Armstrong, F. Rancourt, T. Soulas, M.-J. Meurs, Early detection of signs of
pathological gambling, self-harm and depression through topic extraction and neural networks,
in: CLEF (Working Notes), 2021, pp. 1031–1045.
[26] S. Zanwar, D. Wiechmann, Y. Qiao, E. Kerz, SMHD-GER: A large-scale benchmark dataset for
automatic mental health detection from social media in German, in: A. Vlachos, I. Augenstein
(Eds.), Findings of the Association for Computational Linguistics: EACL 2023, Association for
Computational Linguistics, Dubrovnik, Croatia, 2023, pp. 1526–1541. doi:10.18653/v1/2023.findings-eacl.113.
[27] E. Campillo-Ageitos, J. Martinez-Romo, L. Araujo, UNED-MED at eRisk 2022: depression detection
with TF-IDF, linguistic features and embeddings, in: Proceedings of the Working Notes of CLEF
2022, 2022, pp. 864–874.
[28] A. Husseini Orabi, P. Buddhitha, M. Husseini Orabi, D. Inkpen, Deep learning for depression
detection of Twitter users, in: Proceedings of the Fifth Workshop on Computational Linguistics
and Clinical Psychology: From Keyboard to Clinic, Association for Computational Linguistics,
2018, pp. 88–97. doi:10.18653/v1/w18-0609.
[29] M. de Choudhury, S. Counts, E. Horvitz, Social media as a measurement tool of depression in
populations, in: Proceedings of the 5th Annual ACM Web Science Conference, WebSci ’13, ACM,
New York, NY, 2013, pp. 47–56. doi:10.1145/2464464.2464480.
[30] J. W. Pennebaker, The secret life of pronouns, New Scientist 211 (2011) 42–45. doi:10.1016/s0262-4079(11)62167-2.
[31] D. Smirnova, P. Cumming, E. Sloeva, N. Kuvshinova, D. Romanov, G. Nosachev, Language patterns
discriminate mild depression from normal sadness and euthymic state, Frontiers in Psychiatry 9
(2018) 1–11. doi:10.3389/fpsyt.2018.00105.
[32] J. L. Baddeley, J. W. Pennebaker, C. G. Beevers, Everyday social behavior during a Major
Depressive Episode, Social Psychological and Personality Science 4 (2013) 445–452. doi:10.1177/1948550612461654.
[33] G. Gkotsis, A. Oellrich, T. Hubbard, R. Dobson, M. Liakata, S. Velupillai, R. Dutta, The language of
mental health problems in social media, in: Proceedings of the Third Workshop on Computational
Linguistics and Clinical Psychology, 2016, pp. 63–73. doi:10.18653/v1/W16-0307.
[34] N. Ramirez-Esparza, C. K. Chung, E. Kacewicz, J. W. Pennebaker, The psychology of word use in
depression forums in English and in Spanish: Testing two text analytic approaches, in: ICWSM
2008 - Proceedings of the 2nd International Conference on Weblogs and Social Media, 2008, pp.
102–108. doi:10.1609/icwsm.v2i1.18623.
[35] M. Al-Mosaiwi, T. Johnstone, In an absolute state: Elevated use of absolutist words is a marker
specific to anxiety, depression, and suicidal ideation, Clinical Psychological Science 6 (2018)
529–542. doi:10.1177/2167702617747074, PMID: 30886766.
[36] A. Arntz, L. D. Hawke, L. Bamelis, P. Spinhoven, M. L. Molendijk, Changes in natural language
use as an indicator of psychotherapeutic change in personality disorders, Behaviour Research and
Therapy 50 (2012) 191–202. doi:10.1016/j.brat.2011.12.007.
[37] T. Edwards, N. S. Holtzman, A meta-analysis of correlations between depression and first person
singular pronoun use, Journal of Research in Personality 68 (2017) 63–68. doi:10.1016/j.jrp.2017.02.005.
[38] D. Davis, T. C. Brock, Use of first person pronouns as a function of increased objective
self-awareness and performance feedback, Journal of Experimental Social Psychology 11 (1975)
381–388.
[39] J. Zimmermann, T. Brockmeyer, M. Hunn, H. Schauenburg, M. Wolf, First-person pronoun use
in spoken language as a predictor of future depressive symptoms: Preliminary evidence from a
clinical sample of depressed patients, Clinical Psychology &amp; Psychotherapy 24 (2016) 384–391.
doi:10.1002/cpp.2006.
[40] A. Leis, F. Ronzano, M. A. Mayer, L. I. Furlong, F. Sanz, Detecting signs of depression in tweets in
Spanish: Behavioral and linguistic analysis, Journal of Medical Internet Research 21 (2019) e14199.
doi:10.2196/14199.
[41] A. T. Beck, Depression: Clinical, Experimental, and Theoretical Aspects, Harper &amp; Row, New York,
NY; Evanston, IL; London, UK, 1967.
[42] D. E. Losada, F. Crestani, A test collection for research on depression and language use, in: N. Fuhr,
P. Quaresma, T. Gonçalves, B. Larsen, K. Balog, C. Macdonald, L. Cappellato, N. Ferro (Eds.),
Experimental IR Meets Multilinguality, Multimodality, and Interaction, volume 9822, Springer
International Publishing, Cham, 2016, pp. 28–39. doi:10.1007/978-3-319-44564-9_3.
[43] S. T. Gries, Dispersions and adjusted frequencies in corpora, International Journal of Corpus
Linguistics 13 (2008) 403–437. doi:10.1075/ijcl.13.4.02gri.
[44] S. Flach, Collostructions: An R implementation for the family of collostructional methods. R
package version 0.2.0, 2021. URL: https://sfla.ch/collostructions/.
[45] A. Stefanowitsch, S. T. Gries, Collostructions: Investigating the interaction of words and
constructions, International Journal of Corpus Linguistics 8 (2003) 209–243. doi:10.1075/ijcl.8.2.03ste.
[46] S. T. Gries, A. Stefanowitsch, Extending collostructional analysis: A corpus-based perspective on
alternations, International Journal of Corpus Linguistics 9 (2004) 97–129. doi:10.1075/ijcl.9.1.06gri.
[47] A. Stefanowitsch, Collostructional analysis, in: T. Hoffmann, G. Trousdale (Eds.), The Oxford
Handbook of Construction Grammar, Oxford University Press, Oxford, UK, 2013, pp. 290–306.
doi:10.1093/oxfordhb/9780195396683.013.0016.
[48] C. J. Fillmore, Syntactic intrusions and the notion of grammatical construction, in: Annual Meeting
of the Berkeley Linguistics Society, volume 11, Linguistic Society of America, Berkeley, 1985, pp.
73–86. doi:10.3765/bls.v11i0.1913.
[49] A. E. Goldberg, Constructions: A Construction Grammar Approach to Argument Structure,
University of Chicago Press, Chicago, 1995.
[50] E. Kara, What depression feels like: A collostructional analysis of patient and caregiver perspectives,
Zeitschrift für Anglistik und Amerikanistik 72 (2024) 249–282. doi:10.1515/zaa-2024-2027.
[51] C. J. Hutto, E. Gilbert, VADER: A parsimonious rule-based model for sentiment analysis of social
media text, Proceedings of the International AAAI Conference on Web and Social Media (2014).
doi:10.1609/icwsm.v8i1.14550.
[52] M. Davies, The Corpus of Contemporary American English: 450 million words, 1990–2012, 2008.
URL: http://corpus.byu.edu/coca.
[53] M. Scott, PC analysis of key words — and key key words, System 25 (1997) 233–245. doi:10.1016/s0346-251x(97)00011-0.
[54] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin,
Attention is all you need, 31st Conference on Neural Information Processing Systems (NIPS 2017)
(2017) 11.
[55] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers
for language understanding, 2018. doi:10.48550/ARXIV.1810.04805.
[56] S. Ji, T. Zhang, L. Ansari, J. Fu, P. Tiwari, E. Cambria, MentalBERT: Publicly available pretrained
language models for mental healthcare, in: N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri,
T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, J. Odijk, S. Piperidis (Eds.),
Proceedings of the Thirteenth Language Resources and Evaluation Conference, European Language
Resources Association, Marseille, France, 2022, pp. 7184–7190. URL: https://aclanthology.org/2022.lrec-1.778/.
[57] D. Nutt, S. Wilson, L. Paterson, Sleep disorders as core symptoms of depression, Dialogues in
Clinical Neuroscience 10 (2008) 329–336. doi:10.31887/DCNS.2008.10.3/dnutt.
[58] S. Yasugaki, H. Okamura, A. Kaneko, Y. Hayashi, Bidirectional relationship between sleep and
depression, Neuroscience Research 211 (2025) 57–64. doi:10.1016/j.neures.2023.04.006.
[59] T. Zhang, K. Yang, S. Ji, S. Ananiadou, Emotion fusion for mental illness detection from social
media: A survey, Information Fusion 92 (2023) 231–246. doi:10.1016/j.inffus.2022.11.031.
[60] N. V. Babu, E. G. M. Kanaga, Sentiment analysis in social media data for depression
detection using artificial intelligence: A review, SN Computer Science 3 (2022) 74. doi:10.1007/s42979-021-00958-1.
[61] A. Benton, M. Mitchell, D. Hovy, Multitask learning for mental health using social media text, in:
Proceedings of EACL 2017, 2017, pp. 152–162.
[62] C. Burr, Digital psychiatry: Risks, opportunities and recommendations, Frontiers in Digital Health
4 (2022). doi:10.1109/TTS.2020.2977059.
[63] T. R. C. Read, N. A. C. Cressie, Goodness-Of-Fit Statistics for Discrete Multivariate Data, Springer,
New York, 1988. doi:10.1007/978-1-4612-4578-0.
[64] S. Evert, B. Krenn, Methods for the qualitative evaluation of lexical association measures, in:
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics,
Association for Computational Linguistics, Toulouse, France, 2001, pp. 188–195. doi:10.3115/1073012.1073037.
[65] A. Stefanowitsch, S. Flach, Too big to fail but big enough to pay for their mistakes: A Collostructional
Analysis of the Patterns [too Adj to V] and [Adj enough to V], John Benjamins Publishing Company,
Amsterdam, 2020, pp. 247–272. doi:10.1075/ivitra.24.13ste.</p>
    </sec>
    <sec id="sec-9">
      <title>A. Pilot Study 1: DCA Results</title>
      <p>This appendix (Table 3) displays the distinctive verb associations with the FPSP in POS vs NEG, the
observed and expected frequencies, as well as the strength of association. The association measure is the
G statistic from the log-likelihood ratio test, which compares observed co-occurrence frequencies against
expected frequencies under the assumption of independence. This test is well suited for analysing
linguistic data with uneven frequency distributions, which applies to the present datasets [63, 64].</p>
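      <p>The G statistic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name g_statistic and the example counts are hypothetical, and the sketch assumes the standard formulation G = 2 Σ O ln(O / E), where the expected frequencies E follow from the row and column totals under independence.</p>
      <preformat><![CDATA[
```python
import math


def g_statistic(observed):
    """Return G = 2 * sum(O * ln(O / E)) for a contingency table.

    `observed` is a list of rows with raw co-occurrence counts.
    Expected counts assume independence of rows and columns.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if o > 0:  # a zero cell contributes nothing to the sum
                g += o * math.log(o / expected)
    return 2.0 * g


# Hypothetical counts: a verb with the pronoun in cohort A vs. cohort B
# (first column) against all other verbs in the same slot (second column).
example = [[20, 10],
           [10, 20]]
print(round(g_statistic(example), 3))
```
]]></preformat>
      <p>A table whose observed counts equal the expected counts yields G = 0; larger values indicate a stronger departure from independence.</p>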
      <p>Table 4 displays the collexemes overlapping in SCAs of the two cohorts, as a baseline to the contrastive</p>
    </sec>
    <sec id="sec-10">
      <title>B. Pilot Study 1: DCA and SCA Research Designs</title>
      <p>This appendix (Table 5) illustrates the research designs of DCA and SCA. DCA assesses the association
strength between a lexical item l and one construction c1 over another related construction c2. In
contrast, SCA assesses the association strength of a lexical item l with a construction c, relative to all
other lexical items occurring in c and outside of it (!c).</p>
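      <p>The difference between the two designs comes down to how the 2 x 2 contingency table is filled. The following sketch illustrates this under stated assumptions: the function names, the parameter names and all counts are hypothetical, and corpus_total stands for whatever total number of comparable slots the analyst chooses as the reference.</p>
      <preformat><![CDATA[
```python
def dca_table(l_in_c1, l_in_c2, total_c1, total_c2):
    """DCA: lexical item l in construction c1 versus related construction c2."""
    return [
        [l_in_c1, total_c1 - l_in_c1],  # c1: l, all other items in c1
        [l_in_c2, total_c2 - l_in_c2],  # c2: l, all other items in c2
    ]


def sca_table(l_in_c, l_total, c_total, corpus_total):
    """SCA: lexical item l in construction c versus everything outside c (!c)."""
    l_outside_c = l_total - l_in_c
    others_in_c = c_total - l_in_c
    others_outside_c = corpus_total - c_total - l_outside_c
    return [
        [l_in_c, others_in_c],            # c: l, all other items in c
        [l_outside_c, others_outside_c],  # !c: l, all other items outside c
    ]


def expected_top_left(table):
    """Expected frequency of l in the first row under independence."""
    row_total = sum(table[0])
    col_total = table[0][0] + table[1][0]
    grand_total = sum(sum(row) for row in table)
    return row_total * col_total / grand_total


# Hypothetical counts for illustration only.
dca = dca_table(l_in_c1=30, l_in_c2=10, total_c1=100, total_c2=200)
sca = sca_table(l_in_c=30, l_total=50, c_total=100, corpus_total=1000)
print(dca, expected_top_left(dca))
print(sca, expected_top_left(sca))
```
]]></preformat>
      <p>In both designs, an observed frequency in the top-left cell above its expected value indicates attraction of l to the construction in the first row. Either table can then be passed to an association measure.</p>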
    </sec>
    <sec id="sec-11">
      <title>C. Pilot Study 2: KA Results</title>
      <p>This appendix (Table 6) provides the highest-ranking keywords associated with the datasets. As before,
the association metric is the G value of the log-likelihood ratio test.</p>
      <p>i, my, nt, me, you, it, m, just, do, like, lol, so, feel, ve, really, if, am, depression, get, shit, but, your, haha,
re, have, etc, reddit, edit, myself, s, someone, thanks, fuck, pretty, meds, fucking, love, definitely, ca, op,
honestly, because, awesome, anxiety, try, people, idk, self, can, want</p>
      <p>reddit, binance, account, i, your, you, nt, cryptocurrency, exchanging, email, wallet, crypto, fuck, gt,
my, discount, m, implies, it, referral, lol, register, amp, just, r, awesome, check, will, select, trade, if,
approached, send, code, is, click, post, substantial, thanks, connection</p>
      <p>the, of, in, his, he, says, by, said, president, percent, national, were, united, we, states, american, york,
their, her, students, washington, million, political, new, government, among, state, john, bush, three,
economic, from, toward, war, clinton, university, center</p>
      <p>This appendix provides the rule-based framework used to make final predictions, as introduced in the
Pipeline section.</p>
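      <p>The rule-based framework can be read as a small decision function: a lower threshold of 0.4, an upper threshold of 0.8, and a check of additional features in between. The sketch below is a hedged illustration, not the authors' code. All names (UserState, decide, top_k_mean) are hypothetical; reading the top-averaged probability as the mean of the k highest per-message probabilities is an assumption; and treating exactly two rounds of conversation as sufficient is likewise an assumption, since the rules only state the cases of fewer than and more than two rounds.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass

LOWER_THRESHOLD = 0.4   # below this: decide 0 without further checks
UPPER_THRESHOLD = 0.8   # above this: decide 1 without further checks
MIN_MESSAGES = 5        # the user needs more than this many messages
MIN_ROUNDS = 2          # assumption: exactly two rounds already suffice


@dataclass
class UserState:
    avg_prob: float          # averaged model probability for the user
    n_messages: int          # messages seen so far
    n_rounds: int            # conversation rounds processed so far
    features_exceeded: bool  # linguistic/metadata thresholds exceeded?


def top_k_mean(probabilities, k):
    """Assumed reading of the averaged score: mean of the k highest values."""
    top = sorted(probabilities, reverse=True)[:k]
    return sum(top) / len(top)


def decide(state):
    """Return 1 (diagnosis) or 0 (no diagnosis) following the three rules."""
    if state.avg_prob < LOWER_THRESHOLD:
        return 0
    if state.avg_prob > UPPER_THRESHOLD:
        return 1
    # Middle zone: require enough data before consulting extra features.
    if state.n_messages <= MIN_MESSAGES or state.n_rounds < MIN_ROUNDS:
        return 0
    return 1 if state.features_exceeded else 0


# Hypothetical user in the middle zone with enough data and salient features.
score = top_k_mean([0.2, 0.7, 0.5, 0.6], k=2)
print(decide(UserState(score, n_messages=12, n_rounds=4, features_exceeded=True)))
```
]]></preformat>
      <p>Only users in the middle zone with sufficient data ever reach the feature check, so the additional features act as a tie-breaker rather than as an independent classifier.</p>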
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Handy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mangal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Stead</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Coffee</surname>
          </string-name>
          , L. Ganti,
          <article-title>Prevalence and impact of diagnosed and undiagnosed depression in the United States</article-title>
          ,
          <source>Cureus</source>
          (
          <year>2022</year>
          ). doi:10.7759/cureus.28011.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Epstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. R.</given-names>
            <surname>Duberstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Feldman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. B.</given-names>
            <surname>Rochlen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Bell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Kravitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cipri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Becker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Bamonti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Paterniti</surname>
          </string-name>
          ,
          <article-title>“I didn't know what was wrong:” How people with undiagnosed depression recognize, name and explain their distress</article-title>
          ,
          <source>Journal of General Internal Medicine</source>
          <volume>25</volume>
          (
          <year>2010</year>
          )
          <fpage>954</fpage>
          -
          <lpage>961</lpage>
          . doi:10.1007/s11606-010-1367-0.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Kraus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kadriu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lanzenberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Zarate</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kasper</surname>
          </string-name>
          ,
          <article-title>Prognosis and improved outcomes in Major Depression: A review</article-title>
          ,
          <source>Translational Psychiatry</source>
          <volume>9</volume>
          (
          <year>2019</year>
          ). doi:10.1038/s41398-019-0460-3.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Buntrock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Harrer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Sprenger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Illing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sakata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Furukawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. D.</given-names>
            <surname>Ebert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cuijpers</surname>
          </string-name>
          ,
          <article-title>Psychological interventions to prevent the onset of Major Depression in adults: A systematic review and individual participant data meta-analysis</article-title>
          ,
          <source>The Lancet Psychiatry</source>
          <volume>11</volume>
          (
          <year>2024</year>
          )
          <fpage>990</fpage>
          -
          <lpage>1001</lpage>
          . doi:10.1016/s2215-0366(24)00316-x.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Boerema</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Ten</given-names>
            <surname>Have</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kleiboer</surname>
          </string-name>
          , R. de Graaf, J. Nuyen,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cuijpers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. T. F.</given-names>
            <surname>Beekman</surname>
          </string-name>
          ,
          <article-title>Demographic and need factors of early, delayed and no mental health care use in Major Depression: A prospective study</article-title>
          ,
          <source>BMC Psychiatry</source>
          <volume>17</volume>
          (
          <year>2017</year>
          )
          <elocation-id>367</elocation-id>
          . doi:10.1186/s12888-017-1531-8.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Tomczyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Muehlan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Stolzenburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Schomerus</surname>
          </string-name>
          ,
          <article-title>A prospective study on structural and attitudinal barriers to professional help-seeking for currently untreated mental health problems in the community</article-title>
          ,
          <source>The Journal of Behavioral Health Services &amp; Research</source>
          <volume>47</volume>
          (
          <year>2019</year>
          )
          <fpage>54</fpage>
          -
          <lpage>69</lpage>
          . doi:10.1007/s11414-019-09662-8.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Clement</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Schauman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Graham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Maggioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Evans-Lacko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bezborodovs</surname>
          </string-name>
          , C. Morgan,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rüsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S. L.</given-names>
            <surname>Brown</surname>
          </string-name>
          , G. Thornicroft,
          <article-title>What is the impact of mental health-related stigma on help-seeking? A systematic review of quantitative and qualitative studies</article-title>
          ,
          <source>Psychological Medicine</source>
          <volume>45</volume>
          (
          <year>2014</year>
          )
          <fpage>11</fpage>
          -
          <lpage>27</lpage>
          . doi:10.1017/s0033291714000129.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          1. If p̄, the top-k-averaged probability of the model, is below the lower-bound threshold of 0.4, the model is not confident at all and we decide 0 (no diagnosis; we do not check any additional features).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          2. If the averaged probability p̄ is higher than a pre-defined upper threshold of 0.8, we deem the model confident enough and decide 1 (diagnosis; again, we do not check additional features).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          3. If the averaged probability p̄ lies between the two thresholds (0.4 ≤ p̄ ≤ 0.8): a) If the user has 5 or fewer messages or fewer than two rounds of conversation are available, we decide 0 due to insufficient data. b) If the user has more than 5 messages but we processed fewer than 2 rounds of conversation for this user, we decide 0 due to insufficient data. c) If the user has more than 5 messages and more than 2 rounds of conversation are available, we check the additional linguistic and metadata features. If the relevant thresholds are exceeded (e.g., frequent night-time activity, presence of diagnostic verbs), we decide 1, otherwise 0.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>