<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Pretrained Language Model for Mental Health Risk Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Diego Maupomé</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fanny Rancourt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raouf Belbahar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marie-Jean Meurs</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Université du Québec à Montréal</institution>
          ,
          <addr-line>Montréal, QC</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Early detection of mental health issues is a key contributor to efficient treatment. Natural language processing-based approaches can provide automated means to facilitate access to appropriate services and support for at-risk individuals. Using pretrained language models provides state-of-the-art results in various downstream tasks, as these models leverage significant amounts of textual content. They can be critical in data-scarce research areas, such as early detection of mental health issues. Nonetheless, exposing models to domain-specific language can be beneficial to their performance on downstream tasks. To this end, we release pretrained language models, MentalHealthBERT, leveraging content from Reddit fora discussing anorexia, depression and self-harm. These models are evaluated on risk detection tasks for the respective conditions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Early intervention in mental health and well-being has become a critical principle of mental health care, ushering in an international wave of service reform [<xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>]. Given the ever-growing use and diversity of online social media, there has been a vast increase in research interest in the use of Natural Language Processing (NLP) for the development of automated means of analyzing online textual content in the service of mental health care support, and of early intervention in particular [3, 4].</p>
      <p>The inference of such predictive models requires the gathering of annotated data. These data map online textual content to an assessment of certain aspects of the mental health of the authors of this content. Such assessments are difficult to produce. Whereas for other common tasks in NLP, annotation can operate on the observation itself (e.g. the text), annotation relating to mental health generally requires further information about the author of the textual content. That is, the true aspects of interest pertain to the author rather than the text. In particular, clinically grounded assessments require access to the individual. As such, gathering annotated data is expensive and time-consuming.</p>
      <p>In the absence of large quantities of annotated data, it is a well-established principle of machine learning that pretraining on an unsupervised task can help performance on a downstream supervised task. As such, there has been increased interest in the production of pretrained models leveraging large amounts of textual content [5, 6, 7, 8]. Such models are made available for use on a variety of specialized downstream tasks [9, 10]. The core tenet is that large models trained on sufficiently large data sets will learn to produce useful representations of text regardless of what specialized task these representations will serve. Such a framework leverages large quantities of data for models to learn aspects of language that are thought to precede the specifics of the specialized task. While this assumption may hold for many tasks, pretraining data can also issue from different sources than the specialized data. As such, representations produced by general-purpose models might be inadequate. Recent work has pointed to the benefits of domain specificity in large pretrained models. Broadly, the term domain refers to the topics, mode or register of documents. Concerns for domain specificity can take the form of models pretrained entirely on domain-specific data, or of domain adaptation. In either case, gains in downstream task performance have been reported for several tasks and domains from the use of such domain-specific pretraining [11].</p>
      <p>The textual data analyzed for mental health care purposes issues from Internet fora and social media. These data can differ both in register and topics from the news or encyclopedia articles comprising significant parts of large corpora. Nonetheless, there is no established linguistic consensus on what constitutes a domain [see 12, Sec. 3.4.1]. Given this difficulty in defining the notion of domain, it is difficult to delineate given domains or to establish quantitative differences between them. Pragmatically, one might ask whether a more narrow concept of a given domain may provide more benefit to downstream task performance than a broader one. The present work seeks to study this issue in the context of mental health risk assessment. Models are pretrained on data from Internet fora revolving around three different mental health concerns: anorexia, depression and self-harm. The models are evaluated on detection tasks surrounding these concerns and compared to models trained on broader data [8]. Our results corroborate the benefits of such domain-specific models, but only show advantages to pretraining-data specificity in one case: anorexia.</p>
    </sec>
    <sec id="sec-1a">
      <title>2. Data</title>
      <sec id="sec-1a-1">
        <title>2.1. Retrieval</title>
        <p>The data were extracted from three Reddit (https://www.reddit.com) fora (known as subreddits): depression, selfharm and AnorexiaNervosa. This extraction was performed using Pushshift (https://pushshift.io) [13]. For all three subreddits, it was limited to posts published from the 1st of January 2019 to the 25th of November 2020; the last post from AnorexiaNervosa was published on the 3rd of December 2020. Further, posts struck off as “removed” were discarded. The fields associated with each post include the title and body of the post, as well as the timestamp, the score (aggregation of up- and down-votes), the number of replies and the identifier of the parent post. No additional filtering was applied.</p>
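        <p>The paper does not detail the retrieval script; the following is a minimal sketch of the kind of paginated Pushshift query such an extraction implies. The endpoint, parameters and field names are assumptions based on the public api.pushshift.io service, not on the released code.</p>
        <preformat>
# Hypothetical sketch of a Pushshift extraction over the stated date range.
import time
import requests

ENDPOINT = "https://api.pushshift.io/reddit/search/submission/"

def fetch_submissions(subreddit, after, before, page_size=100):
    """Yield submissions from one subreddit between two Unix timestamps."""
    while True:
        params = {
            "subreddit": subreddit,
            "after": after,
            "before": before,
            "size": page_size,
            "sort": "asc",
            "sort_type": "created_utc",
        }
        batch = requests.get(ENDPOINT, params=params, timeout=30).json()["data"]
        if not batch:
            return
        for post in batch:
            # Keep the fields described for the corpus: title, body, timestamp,
            # score, number of replies and post identifier.
            yield {
                "id": post["id"],
                "title": post.get("title", ""),
                "body": post.get("selftext", ""),
                "created_utc": post["created_utc"],
                "score": post.get("score"),
                "num_comments": post.get("num_comments"),
            }
        after = batch[-1]["created_utc"]  # paginate past the last retrieved post
        time.sleep(1)  # be conservative with the public rate limit

# Example: r/AnorexiaNervosa from 2019-01-01 to 2020-11-25 (UTC timestamps).
# posts = list(fetch_submissions("AnorexiaNervosa", 1546300800, 1606262400))
        </preformat>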
      </sec>
      <sec id="sec-1a-2">
        <title>2.2. Description</title>
        <p>All subreddits considered are described, as per their respective “About Community” sections, as communities that offer a safe place and peer support for people affected by the aforementioned issues. Summary statistics for the corpus are presented in Table 1.</p>
        <p>[Table 1. Summary statistics per subreddit (AnorexiaNervosa, depression, selfharm): tokens, vocabulary, posts and comments with their average number of tokens, unique authors and community size.]</p>
        <p>The depression forum is by far the biggest community of the three, with more than 736,000 members as of March 2nd 2021. Of those, about 45% authored at least one publication (i.e. a post or a comment) in the selected time frame. A similar proportion can be observed for AnorexiaNervosa. In turn, it jumps to almost two-thirds for selfharm. Across all three subreddits, approximately 40% of the authors published exactly once. Despite having the fewest overall publications, threads on AnorexiaNervosa seem to generate the most engagement, having a higher ratio of comments per thread and remaining active for longer periods. The smaller size of this community is a likely explanation for these observations.</p>
      </sec>
    </sec>
    <sec id="sec-1b">
      <title>3. Ethical Considerations</title>
      <p>All posts collected in the aforementioned subreddits are public, but our collection will not be publicly available. Further, resources discussed in this work will be released upon the signature of a User Agreement. The released model should only be used in combination with other screening tools for prevention purposes, under the supervision of trained mental health professionals. Hence, this system does not aim to diagnose mental health disorders and should not be used to do so.</p>
      <p>However, the misuse of this kind of work can have negative societal impacts. For example, an organization could use our pretrained language models to detect job applicants at risk of mental health disorders before hiring. This practice, violating the terms of the release agreement, would further spur discrimination in hiring processes, in addition to well-documented gender and racial unfairness [<xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>]. While this line of research could potentially advance early intervention and treatment processes, it does not directly address the stigma surrounding mental health issues and underlying the high rate of treatment avoidance and discontinuation [<xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>]. Further, widespread study and deployment of models in this direction could potentially lead to self-censorship, defeating its purpose.</p>
      <p>It is also important to note that demographic data on the authors is missing. As noted by Shatz [<xref ref-type="bibr" rid="ref18">18</xref>], most subreddits do not have data regarding their community demographics. Hence, it is impossible to ensure that the textual productions used to train the released model adequately represent content from diverse individuals. To the best of our knowledge, there is no readily available dataset containing information regarding the authors’ age, gender, ethnicity, or location. From inferred demographics, Amir et al. [<xref ref-type="bibr" rid="ref19">19</xref>] showed that such sensitive attributes affect depression prevalence across social media users. Aguirre et al. [<xref ref-type="bibr" rid="ref20">20</xref>] observed performance gaps related to gender and racial attributes. To address this gap, a data collection combining strict privacy policies and clinical supervision must be achieved. As noted by Aguirre et al. [<xref ref-type="bibr" rid="ref20">20</xref>], storing such sensitive data comes with serious potential harms. Therefore, it is critical to enforce protective measures such as data anonymization.</p>
    </sec>
    <sec id="sec-2">
      <title>4. Pretraining</title>
      <sec id="sec-2-1">
        <title>4.1. Preprocessing</title>
        <p>One key issue in modeling corpora from Internet fora, rather than an edited outlet such as a newspaper or encyclopedia, is the longer vocabulary tail caused by misspellings, neologisms and even usernames. Common practice would be to remove words having fewer than three occurrences [<xref ref-type="bibr" rid="ref21">21</xref>]. Keeping such words would increase the computational burden of the model while having little chance of learning because of the limited number of occurrences. However, this is not suitable for our purposes: important words might be misspelled or obfuscated, and their exclusion would hinder performance [<xref ref-type="bibr" rid="ref22">22</xref>]. Similarly, usernames and neologisms might be composed from familiar, significant words. As such, we preserve the entire vocabulary of each dataset, relying on subword-level tokenization to capture these variations.</p>
        <p>Before learning this tokenization, the data was split
into training and validation sets by stratifying across
length (word count) percentiles. This preserves the key
length statistics, such as the median and interquartile
range. In terms of vocabulary, words in the validation
set not present in the training set make up 0.50%, 0.20%
and 0.10% of occurrences in the anorexia, self-harm and
depression sets, respectively.</p>
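        <p>As a concrete illustration, a length-stratified split of this kind can be obtained by binning documents into word-count percentiles and splitting within bins. The decile binning and 90/10 ratio below are illustrative assumptions; the text specifies only that the split stratifies across length percentiles.</p>
        <preformat>
# Illustrative sketch of a word-count-stratified train/validation split.
import numpy as np
from sklearn.model_selection import train_test_split

def length_stratified_split(documents, valid_fraction=0.1, n_bins=10, seed=0):
    """Split documents while preserving their word-count percentile profile."""
    lengths = np.array([len(doc.split()) for doc in documents])
    # Assign every document to a percentile bin of its length.
    edges = np.percentile(lengths, np.linspace(0, 100, n_bins + 1))
    bins = np.clip(np.digitize(lengths, edges[1:-1]), 0, n_bins - 1)
    # Sample the validation set uniformly within each bin.
    return train_test_split(
        documents, test_size=valid_fraction, stratify=bins, random_state=seed
    )
        </preformat>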
        <p>The data was tokenized by Byte-Pair Encodings (BPEs) [<xref ref-type="bibr" rid="ref23">23</xref>] at the byte level [6], with the merges extracted from all three datasets. This consolidation was done to provide a more robust tokenization scheme, less skewed towards any particular forum, while still learning the words and spellings of online parlance. For comparison, each dataset was also tokenized using merges learned exclusively from itself.</p>
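        <p>The following sketch shows how such byte-level merges could be learned over the three collections; the Hugging Face tokenizers library, the file names and the 30,000-entry vocabulary are assumptions, as the text names neither an implementation nor a merge budget.</p>
        <preformat>
# Sketch: learning byte-level BPE merges from the combined collections.
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=[
        "anorexia_train.txt",     # hypothetical file names, one document per line
        "depression_train.txt",
        "selfharm_train.txt",
    ],
    vocab_size=30_000,
    min_frequency=1,              # keep the full vocabulary tail, as discussed above
    special_tokens=["&lt;s&gt;", "&lt;pad&gt;", "&lt;/s&gt;", "&lt;unk&gt;", "&lt;mask&gt;"],
)
tokenizer.save_model("mentalhealthbert-tokenizer")

# For the per-forum comparison, the same call is repeated with a single file.
        </preformat>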
      </sec>
      <sec id="sec-2-2">
        <title>4.2. Training</title>
        <p>Once tokenized, these datasets were used to train Transformers [<xref ref-type="bibr" rid="ref24">24</xref>] using the RoBERTa approach [6]. Models are trained by the Adam optimizer [<xref ref-type="bibr" rid="ref25">25</xref>] with a learning rate of 5E-4 on batches of 256 sequences of a maximum length of 256 tokens. Training takes place over a maximum of 300 epochs, applying early stopping based on validation set perplexity.</p>
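        <p>A minimal pretraining sketch under these settings is given below. It assumes the Hugging Face transformers and datasets libraries, a base-sized configuration, plain-text training files and an early-stopping patience of three evaluations; none of these are stated in the text.</p>
        <preformat>
# Illustrative masked-language-model pretraining following the stated settings.
from datasets import load_dataset
from transformers import (
    RobertaConfig, RobertaForMaskedLM, RobertaTokenizerFast,
    DataCollatorForLanguageModeling, Trainer, TrainingArguments,
    EarlyStoppingCallback,
)

tokenizer = RobertaTokenizerFast.from_pretrained("mentalhealthbert-tokenizer")
raw = load_dataset(
    "text",
    data_files={"train": "combined_train.txt", "validation": "combined_valid.txt"},
)
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True,
    remove_columns=["text"],
)

config = RobertaConfig(vocab_size=tokenizer.vocab_size, max_position_embeddings=258)
model = RobertaForMaskedLM(config)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="mentalhealthbert",
    per_device_train_batch_size=256,     # 256 sequences per batch
    learning_rate=5e-4,                  # Adam-family optimizer at 5E-4
    num_train_epochs=300,                # upper bound; early stopping cuts it short
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",   # lower MLM loss means lower perplexity
    greater_is_better=False,
)
trainer = Trainer(
    model=model,
    args=args,
    data_collator=collator,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
        </preformat>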
    <sec id="sec-3">
      <title>5.1. Experiments</title>
      <p>We evaluate the MentalHealthBERT models on the eRisk
datasets [26, 27, 28]. These datasets comprise Reddit
users (subjects) labeled as being at risk (positive) or not
(negative) for depression, self-harm or anorexia,
respectively. For each subject, a history of their writings is
included, spanning a variety of subreddits. The proportion
of positive subjects is fairly small and varies somewhat,
as does the size of the datasets, as shown in Table 2.</p>
      <p>The key issue is utilizing the document-level encoding afforded by MentalHealthBERT in predictions at the history level, which spans a variety of presumably independent writings. In order to make this subject-level prediction, information gathered across a set of writings needs to be aggregated. To achieve this, token embeddings are averaged together within posts and subsequently fed to a feed-forward network with a single hidden layer and hyperbolic tangent activation. The resulting document vectors are then aggregated by averaging into a single vector encoding a history of writings. This vector is then mapped to a binary prediction for the sequence of writings by a feed-forward network with a single hidden layer and hyperbolic tangent activation.</p>
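      <p>A sketch of this aggregation head is given below. The averaging steps, single hidden layers and tanh activations follow the description above; the hidden sizes and the masked mean over padded posts are illustrative assumptions.</p>
      <preformat>
# Sketch of the subject-level aggregation classifier described above.
import torch
import torch.nn as nn

class HistoryClassifier(nn.Module):
    def __init__(self, encoder, hidden_size=768, ffn_size=256):
        super().__init__()
        self.encoder = encoder                      # e.g. a pretrained RoBERTa encoder
        self.post_ffn = nn.Sequential(              # document-level network, one hidden layer
            nn.Linear(hidden_size, ffn_size), nn.Tanh(), nn.Linear(ffn_size, ffn_size)
        )
        self.history_ffn = nn.Sequential(           # history-level network, one hidden layer
            nn.Linear(ffn_size, ffn_size), nn.Tanh(), nn.Linear(ffn_size, 1)
        )

    def forward(self, input_ids, attention_mask):
        # input_ids, attention_mask: (posts, tokens) for one subject's history.
        token_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                         # (posts, tokens, hidden)
        mask = attention_mask.unsqueeze(-1).float()
        # 1) average token embeddings within each post;
        post_means = (token_states * mask).sum(1) / mask.sum(1).clamp(min=1)
        # 2) map each post to a document vector;
        doc_vectors = self.post_ffn(post_means)     # (posts, ffn_size)
        # 3) average document vectors into a single history vector;
        history = doc_vectors.mean(0)               # (ffn_size,)
        # 4) map the history vector to a binary logit.
        return self.history_ffn(history)            # (1,)
      </preformat>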
      <sec id="sec-3-1">
        <title>The experiments compare the performance of Mental</title>
        <p>
          HealthBERT to the generic RoBERTa Transformer as well
as the latter further pretrained on our data (domain
adap4.2. Training tation). For MentalHealthBERT, experiments were
carried out using BPEs learned from the combined dataset as
Once tokenized, these datasets were used to train Trans- well as from the individual collections. Additionally, we
formers [
          <xref ref-type="bibr" rid="ref22">24</xref>
          ] using the RoBERTa approach [6]. Models run experiments using MentalRoBERTa [8]. This model
are trained by the Adam optimizer [
          <xref ref-type="bibr" rid="ref23">25</xref>
          ] with a learning was pretrained on Reddit data from several diferent fora
rate of 5E-4 on batches of 256 sequences of a maximum touching on mental health topics5. It should be noted that
length of 256 tokens. Training takes place over a max- the results reported by the authors on the eRisk
depresimum of 300 epochs, applying early stopping based on sion detection task are not comparable to those reported
validation set perplexity. here, as they make use of a custom data split with some
resampling [29]. Models are evaluated per the area under
the precision-recall curve.
5. Mental Health Risk Detection One dificulty of detecting potential threats to mental
health is the small proportion of positive subjects that can
be found in datasets and, indeed, in a real-world setting.
        </p>
        <p>Additionally, for the selected datasets, these proportions
vary widely between the training and testing sets, as
shown in Table 2. Models were evaluated using the latest
set of data for each task: 2022 for Depression, 2021 for
Self-Harm and 2019 for Anorexia. Training and validation</p>
      </sec>
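      <p>For reference, the metric can be computed as below; using scikit-learn's average-precision estimate of the area under the precision-recall curve is an assumption, as no implementation is named.</p>
      <preformat>
# Sketch: area under the precision-recall curve for subject-level predictions.
from sklearn.metrics import average_precision_score

def auprc(labels, scores):
    """labels: 1 for at-risk subjects, 0 otherwise; scores: predicted risk."""
    return average_precision_score(labels, scores)
      </preformat>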
      <sec id="sec-3-2">
        <title>5An exhaustive list of the fora from which pretraining</title>
        <p>data were extracted is not available, but they include
depression, SuicideWatch, Anxiety, offmychest, bipolar,
mentalillness, and mentalhealth.</p>
        <sec id="sec-3-2-1">
          <title>Train</title>
        </sec>
        <sec id="sec-3-2-2">
          <title>Test</title>
          <p>dataset
positive
negative
positive
negative</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>Depression</title>
        </sec>
        <sec id="sec-3-2-4">
          <title>Self-Harm</title>
          <p>Anorexia
214
145
61
1493
618
411
1302
1296
742
sets for each task were obtained by combining the data be due to a deficiency in eating disorder content in
prefrom all previous sets and randomly selecting 80% of training MentalRoBERTa, though we cannot confirm this.
subjects for training and 20% for validation, preserving Tokenization seems to be inconsequential, with a more
equal proportions of positive and negative subjects. marked decrease in performance for the combined
tok</p>
      <p>To address this class imbalance in training, a number of strategies were deployed, including inverse class weighting, class weighting based on effective samples [30] and Focal Loss [31]. These proved to be ineffective in validation. The most effective mechanism proved to be sampling batches with even proportions of positive and negative subjects in training.</p>
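      <p>One way to realize such sampling is sketched below with a weighted sampler that draws positive and negative subjects with equal probability, so batches are balanced in expectation; the batch size and the sampler choice are illustrative assumptions.</p>
      <preformat>
# Sketch: class-balanced batch sampling over subjects.
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def balanced_loader(dataset, labels, batch_size=16):
    labels = torch.as_tensor(labels, dtype=torch.float)
    pos_mass = 0.5 / labels.sum()                   # half the probability mass per class
    neg_mass = 0.5 / (len(labels) - labels.sum())
    weights = torch.where(labels.bool(), pos_mass, neg_mass)
    sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
      </preformat>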
      <p>The number of writings used to arrive at a prediction for a subject was set to 50. In order to reduce overfitting, a contiguous sample of 50 writings was taken per subject at training time. In validation and testing, only the last 50 writings were taken. The classifiers are trained by the Adam optimizer [<xref ref-type="bibr" rid="ref25">25</xref>] over 10 epochs. Given the relatively modest size of the datasets in terms of positive subjects, only the top two layers of the Transformer encoder were trained, with a learning rate of 1E-5. The remainder of the model had a learning rate set to 1E-4.</p>
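      <p>This per-module treatment can be expressed with optimizer parameter groups, as sketched below; the encoder layout and the classifier head from the earlier sketch are assumptions.</p>
      <preformat>
# Sketch: freeze most of the encoder and use per-group learning rates.
import torch

def build_optimizer(model):
    # Freeze the whole encoder first...
    for param in model.encoder.parameters():
        param.requires_grad = False
    # ...then unfreeze only its top two Transformer layers.
    top_layers = model.encoder.encoder.layer[-2:]
    for param in top_layers.parameters():
        param.requires_grad = True
    return torch.optim.Adam([
        {"params": top_layers.parameters(), "lr": 1e-5},     # top encoder layers
        {"params": model.post_ffn.parameters(), "lr": 1e-4},
        {"params": model.history_ffn.parameters(), "lr": 1e-4},
    ])
      </preformat>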
      <p>Results on the eRisk datasets are presented in Table 3. Results for the base RoBERTa model indicate improvements with domain adaptation, in agreement with the literature [11]. Perhaps counterintuitively, these improvements appear to decrease with the amount of domain adaptation data available. MentalRoBERTa and MentalHealthBERT achieve comparable results in all but the anorexia task, for which MentalHealthBERT and domain-adapted RoBERTa outperform MentalRoBERTa. This may be due to a deficiency in eating disorder content in the pretraining of MentalRoBERTa, though we cannot confirm this. Tokenization seems to be inconsequential, with a more marked decrease in performance for the combined tokenizer in depression. Given the difficulty that specialized tokenization puts in transferring learning, it is difficult to support in light of these results. Finally, there appears to be little difference in performance between domain-adapted RoBERTa and the best MentalHealthBERT model, suggesting no real benefit to training blank models over adapting pretrained models. In light of these results, it is difficult to establish whether the specific domain pretraining of MentalHealthBERT helps downstream performance more so than the more general domain adaptation found in MentalRoBERTa. As mentioned, benefits are only observable for the anorexia task. Given what is known of the pretraining of MentalRoBERTa, it is difficult to establish whether this may be due to any material characteristics of discourse around anorexia or to its relatively smaller weight in pretraining.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>6. Conclusion</title>
      <p>There is increased research interest in the development of NLP approaches to assist in early risk assessment in mental health care. Gathering annotated data is a costly process, making pretraining a crucial step in the modeling process. Thus, pretrained language models can be a valuable resource. However, general-purpose language models, while trained on large amounts of data, may not be suited to specific domains, such as mental health discussions. As such, there is interest in adapting language models to particular domains. In the case of mental health risk assessment from text, domain-specific pretraining resources would contain discourse concerning mental health concerns. However, it is worth considering whether discourse issuing from outlets specific to a particular mental health concern is more adequate than discourse around mental health issues at large. Our experiments have thus made use of data extracted from fora dedicated to specific mental health concerns to pretrain models. These models are compared to general-purpose language models, as well as to language models pretrained on broader mental health content, in a mental health risk assessment task. Our results indicate that domain adaptation does improve classification performance. However, a difference in performance between more narrowly pretrained models is only manifest in anorexia risk detection.</p>
      <p>Further work is needed to understand how textual data from separate mental health topics interact in terms of benefits from pretraining: more experimentation is needed to find whether the detection of certain mental health concerns is improved by pooling pretraining data, and whether these gains in detection performance align with the comorbidity of the underlying disorders. Were this the case, those benefits might be explained by the mention of related concerns in discussions about a specific mental health concern. While the pretraining data for our experiments was extracted from dedicated fora, our experiments do not control for the mention of related disorders or threats to mental health.</p>
    </sec>
    <sec id="sec-5">
      <title>Release of Resources</title>
      <p>Given the sensitive nature of the resources introduced, the models and associated open-source code will be released upon signing of a User Agreement providing details on permitted uses.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <article-title>models, while trained on large amounts of data</article-title>
          , may not [3]
          <string-name>
            <given-names>H.-C.</given-names>
            <surname>Shing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Resnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Oard</surname>
          </string-name>
          ,
          <article-title>A prioritization be suited to specific domains, such as mental health dis- model for suicidality risk assessment</article-title>
          , in: Proceedcussions.
          <article-title>As such, there is interest in adapting language ings of the 58th Annual Meeting of the Association models to particular domains</article-title>
          .
          <source>for Computational Linguistics</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>8124</fpage>
          -
          <lpage>8137</lpage>
          .
          <article-title>In the case of mental health risk assessment from text</article-title>
          , [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Maupomé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Armstrong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rancourt</surname>
          </string-name>
          , M.
          <article-title>- domain-specific pretraining resources would contain dis- J. Meurs, Leveraging textual similarity to predict course concerning mental health concerns. However, it Beck Depression Inventory answers, Proceedings of is worth considering whether discourse issuing from out- the Canadian Conference on Artificial Intelligence lets specific to a particular mental health concern are (</article-title>
          <year>2021</year>
          ). doi:
          <volume>10</volume>
          .21428/594757db.5c753c3d.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>more adequate than discourse around mental health is-</article-title>
          [5]
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Neumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Iyyer</surname>
          </string-name>
          , M. Gardner, sues at large.
          <article-title>Our experiments have thus made use of data C. Clark</article-title>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <article-title>Deep contextualextracted from fora dedicated to specific mental health ized word representations</article-title>
          , arXiv:
          <year>1802</year>
          .
          <article-title>05365 [cs] concerns to pretrain models</article-title>
          .
          <source>These models are compared</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <article-title>to general-purpose language models as well as language [6</article-title>
          ]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          <article-title>Chen, models pretrained on broader mental health content in O</article-title>
          . Levy,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <surname>V.</surname>
          </string-name>
          <article-title>Stoya mental health risk assessment task. Our results indi- anov, Roberta: A robustly optimized BERT cate that domain adaptation does improve classification pretraining approach</article-title>
          , arXiv:
          <year>1907</year>
          .
          <volume>11692</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>performance. However,</surname>
          </string-name>
          <article-title>a diference in performance be</article-title>
          - arXiv:
          <year>1907</year>
          .11692.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <article-title>tween more narrowly pretrained models is only manifest [7</article-title>
          ]
          <string-name>
            <given-names>K.</given-names>
            <surname>Clark</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>T.</given-names>
            <surname>Luong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          , C. D.
          <article-title>Manning, in anorexia risk detection</article-title>
          . ELECTRA:
          <article-title>Pre-training text encoders as discrimiFurther work is needed to understand how textual nators rather than generators</article-title>
          , arXiv:
          <year>2003</year>
          .
          <article-title>10555 [cs] data from separate mental health topics interact in terms (</article-title>
          <year>2020</year>
          ). URL: http://arxiv.org/abs/
          <year>2003</year>
          .10555, arXiv:
          <article-title>of benefits from pretraining: more experimentation is</article-title>
          <year>2003</year>
          .
          <volume>10555</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <article-title>needed to find whether the detection of certain mental [8</article-title>
          ]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , L. Ansari,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tiwari</surname>
          </string-name>
          ,
          <string-name>
            <surname>E.</surname>
          </string-name>
          <article-title>Camhealth concerns is improved by pooling pretraining data bria, MentalBERT: Publicly available pretrained and whether these gains in detection performance align language models for mental healthcare</article-title>
          , in: N.
          <article-title>Calzowith the comorbidity of the underlying disorders</article-title>
          . Were lari,
          <string-name>
            <given-names>F.</given-names>
            <surname>Béchet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Blache</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Choukri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cieri</surname>
          </string-name>
          ,
          <string-name>
            <surname>T. Dethis</surname>
          </string-name>
          <article-title>the case, those benefits might be explained by the clerck</article-title>
          , S. Goggi,
          <string-name>
            <given-names>H.</given-names>
            <surname>Isahara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Maegaard</surname>
          </string-name>
          , J. Mariani,
          <article-title>mention of related concerns in discussions about a spe- H.</article-title>
          <string-name>
            <surname>Mazo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Odijk</surname>
          </string-name>
          , S. Piperidis (Eds.),
          <article-title>Proceedings cific mental health concern. While pretraining data for of the Thirteenth Language Resources and Evaluour experiments was extracted from dedicated fora, our ation Conference, European Language Resources experiments do not control for the mention of related Association</article-title>
          , Marseille, France,
          <year>2022</year>
          , pp.
          <fpage>7184</fpage>
          -
          <lpage>7190</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <article-title>disorders or threats to mental health</article-title>
          . URL: https://aclanthology.org/
          <year>2022</year>
          .lrec-
          <volume>1</volume>
          .
          <fpage>778</fpage>
          . [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Michael</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <surname>S. R.</surname>
          </string-name>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <article-title>Release of Resources Bowman, GLUE: A multi-task benchmark and analysis platform for natural language understanding, Given the sensitive nature of the resources introduced</article-title>
          , arXiv:
          <year>1804</year>
          .07461 [cs] (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <article-title>the models and associated open-source code will be re</article-title>
          - [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Pruksachatkun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nangia</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          <article-title>Singh, leased upon signing of a User Agreement providing de- J.</article-title>
          <string-name>
            <surname>Michael</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Hill</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>S. R.</given-names>
          </string-name>
          <string-name>
            <surname>Bowman</surname>
          </string-name>
          ,
          <article-title>Supertails on permitted uses. GLUE: A stickier Benchmark for general-purpose language understanding systems</article-title>
          , arXiv:
          <year>1905</year>
          .00537 References [cs] (
          <year>2020</year>
          ). [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gururangan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marasović</surname>
          </string-name>
          , S. Swayamdipta,
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schotanus-Dijkstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. H. C.</given-names>
            <surname>Drossaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E. K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Beltagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Downey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <surname>Don't Pieterse</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Boon</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          <string-name>
            <surname>Walburg</surname>
          </string-name>
          , E. T. Bohlmeijer, stop pretraining:
          <article-title>Adapt language models to doAn early intervention to promote well-being and mains and tasks</article-title>
          , arXiv:
          <year>2004</year>
          .10964 [cs] (
          <year>2020</year>
          ).
          <article-title>URL: lfourishing and reduce anxiety and depression: A http://arxiv</article-title>
          .org/abs/
          <year>2004</year>
          .10964. randomized controlled trial, Internet Interventions [12]
          <string-name>
            <given-names>B.</given-names>
            <surname>Plank</surname>
          </string-name>
          , Domain Adaptation for Parsing,
          <source>Ph.D. the9</source>
          (
          <year>2017</year>
          )
          <fpage>15</fpage>
          -
          <lpage>24</lpage>
          . URL: https://www.sciencedirect. sis, University of Groningen,
          <year>2011</year>
          . com/science/article/pii/S2214782916300288. doi:10. [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Baumgartner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zannettou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Keegan</surname>
          </string-name>
          , M. Squire,
          <volume>1016</volume>
          /j.invent.
          <year>2017</year>
          .
          <volume>04</volume>
          .002.
          <string-name>
            <surname>J. Blackburn</surname>
          </string-name>
          ,
          <article-title>The Pushshift Reddit dataset</article-title>
          , in:
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P. D.</given-names>
            <surname>McGorry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mei</surname>
          </string-name>
          ,
          <article-title>Early intervention in Proceedings of the international AAAI conference youth mental health: Progress and future directions, on web and social media</article-title>
          , volume
          <volume>14</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>830</fpage>
          -
          <string-name>
            <surname>Evidence-Based Mental</surname>
          </string-name>
          health
          <volume>21</volume>
          (
          <year>2018</year>
          )
          <fpage>182</fpage>
          -
          <lpage>184</lpage>
          .
          <fpage>839</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Sánchez-Monedero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dencik</surname>
          </string-name>
          , L. Edwards,
          <article-title>What does it mean to 'solve' the problem of discrimina-</article-title>
          [26]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          ,
          <article-title>Overview of tion in hiring? Social, technical and legal perspec- eRisk: Early risk prediction on the Internet, in: tives from the UK on automated hiring systems</article-title>
          ,
          <source>in: International Conference of the Cross-Language Proceedings of the 2020 Conference on Fairness, Evaluation Forum for European Languages</source>
          ,
          <year>2018</year>
          , Accountability, and
          <string-name>
            <surname>Transparency</surname>
          </string-name>
          ,
          <year>2020</year>
          , pp.
          <fpage>458</fpage>
          - pp.
          <fpage>343</fpage>
          -
          <lpage>361</lpage>
          .
          <fpage>468</fpage>
          . [27]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          , Overview of
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>L.</given-names>
            <surname>Quillian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pager</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hexel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Midtbøen</surname>
          </string-name>
          , eRisk
          <year>2019</year>
          :
          <article-title>Early risk prediction on the Internet, Meta-analysis of field experiments shows no change in: International Conference of the Cross-Language in racial discrimination in hiring over time</article-title>
          , Pro- Evaluation
          <source>Forum for European Languages</source>
          ,
          <year>2019</year>
          ,
          <source>ceedings of the National Academy of Sciences 114</source>
          pp.
          <fpage>340</fpage>
          -
          <lpage>357</lpage>
          . (
          <year>2017</year>
          )
          <fpage>10870</fpage>
          -
          <lpage>10875</lpage>
          . [28]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Parapar</surname>
          </string-name>
          , Overview of
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>O. F.</given-names>
            <surname>Wahl</surname>
          </string-name>
          ,
          <article-title>Stigma as a barrier to recovery from eRisk 2020: Early risk prediction on the Internet, in: mental illness</article-title>
          ,
          <source>Trends in Cognitive Sciences 16 Experimental IR Meets Multilinguality</source>
          ,
          <string-name>
            <surname>Multimodal</surname>
          </string-name>
          (
          <year>2012</year>
          )
          <fpage>9</fpage>
          -
          <lpage>10</lpage>
          . ity, and
          <source>Interaction Proceedings of the Eleventh</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>C.</given-names>
            <surname>Henderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Evans-Lacko</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Thornicroft, Men- International Conference of the CLEF Association tal illness stigma, help seeking, and public health (CLEF</article-title>
          <year>2020</year>
          ),
          <year>2020</year>
          . programs,
          <source>American Journal of Public Health</source>
          <volume>103</volume>
          [29]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ji</surname>
          </string-name>
          , Private Correspondence,
          <year>2022</year>
          . (
          <year>2013</year>
          )
          <fpage>777</fpage>
          -
          <lpage>780</lpage>
          . [30]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jia</surname>
          </string-name>
          , T.-
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Belongie</surname>
          </string-name>
          , Class-
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [18]
          <string-name>
            <surname>I. Shatz</surname>
          </string-name>
          , Fast, Free, and
          <article-title>Targeted: Reddit as a Source balanced loss based on efective number of samples, for Recruiting Participants Online</article-title>
          ,
          <source>Social Science in: Proceedings of the IEEE/CVF Conference on Computer Review</source>
          <volume>35</volume>
          (
          <year>2017</year>
          )
          <fpage>537</fpage>
          -
          <lpage>549</lpage>
          . Computer Vision and Pattern Recognition,
          <year>2019</year>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Amir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dredze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Ayers</surname>
          </string-name>
          , Mental health pp.
          <fpage>9268</fpage>
          -
          <lpage>9277</lpage>
          .
          <article-title>surveillance over social media with digital cohorts</article-title>
          , [31]
          <string-name>
            <surname>T.-Y. Lin</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>He</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Dollár</surname>
          </string-name>
          , Foin: Proceedings of the Sixth Workshop on Compu-
          <article-title>cal loss for dense object detection</article-title>
          ,
          <source>arXiv:1708.02002 tational Linguistics and Clinical Psychology</source>
          ,
          <year>2019</year>
          , [cs] (
          <year>2018</year>
          ).
          <source>arXiv:1708</source>
          .
          <year>02002</year>
          . pp.
          <fpage>114</fpage>
          -
          <lpage>120</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C.</given-names>
            <surname>Aguirre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Harrigian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dredze</surname>
          </string-name>
          ,
          <article-title>Gender and racial fairness in depression research using social media</article-title>
          ,
          <source>in: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>2932</fpage>
          -
          <lpage>2949</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>S.</given-names>
            <surname>Merity</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bradbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <article-title>Pointer sentinel mixture models</article-title>
          ,
          <source>arXiv:1609</source>
          .07843 [cs] (
          <year>2016</year>
          ). arXiv:
          <volume>1609</volume>
          .
          <fpage>07843</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>B.</given-names>
            <surname>Plank</surname>
          </string-name>
          ,
          <article-title>What to do about non-standard (or non-canonical) language in nlp (</article-title>
          <year>2016</year>
          ). arXiv:
          <volume>1608</volume>
          .
          <fpage>07836</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sennrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Haddow</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. Birch,</surname>
          </string-name>
          <article-title>Neural machine translation of rare words with subword unit</article-title>
          ,
          <source>in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1715</fpage>
          -
          <lpage>1725</lpage>
          . URL: https://www. aclweb.org/anthology/P16-1162. doi:
          <volume>10</volume>
          .18653/ v1/
          <fpage>P16</fpage>
          -1162.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , L. u. Kaiser,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          , in: I. Guyon,
          <string-name>
            <given-names>U. V.</given-names>
            <surname>Luxburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vishwanathan</surname>
          </string-name>
          , R. Garnett (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>30</volume>
          ,
          <year>2017</year>
          . URL: https://proceedings.neurips.cc/paper/2017/lfie/ 3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          ,
          <article-title>Adam: A method for stochastic optimization</article-title>
          ,
          <source>arXiv:1412.6980</source>
          (
          <year>2014</year>
          ). arXiv:
          <volume>1412</volume>
          .
          <fpage>6980</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>