<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Word Embeddings and Linguistic Metadata at the CLEF 2018 Tasks for Early Detection of Depression and Anorexia</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marcel Trotzek</string-name>
          <email>mtrotzek@stud.fh-dortmund.de</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sven Koitka</string-name>
          <email>sven.koitka@fh-dortmund.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christoph M. Friedrich</string-name>
          <email>christoph.friedrich@fh-dortmund.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), University Hospital Essen</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>TU Dortmund University, Department of Computer Science</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Applied Sciences and Arts Dortmund (FHDO), Department of Computer Science</institution>
          <addr-line>Emil-Figge-Str. 42, 44227 Dortmund</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Developing methods for the early detection of mental disorders like depression and anorexia based on written text has become increasingly important with the rise of social media platforms. The CLEF 2018 eRisk shared task consists of two subtasks focused on the detection of these two disorders, and FHDO Biomedical Computer Science Group (BCSG) has submitted results obtained from four machine learning models as well as from a final late fusion ensemble. This paper describes these models, which are based on user-level linguistic metadata, Bags of Words (BoW), neural word embeddings, and Convolutional Neural Networks (CNN). BCSG has achieved top performance according to ERDE50 and F1 score in both subtasks.</p>
      </abstract>
      <kwd-group>
        <kwd>depression</kwd>
        <kwd>early detection</kwd>
        <kwd>linguistic metadata</kwd>
        <kwd>convolutional neural networks</kwd>
        <kwd>word embeddings</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        This paper describes the participation of FHDO Biomedical Computer Science
Group (BCSG) at the Conference and Labs of the Evaluation Forum (CLEF)
2018 eRisk task for early detection of depression and anorexia [
        <xref ref-type="bibr" rid="ref11 ref13">11, 13</xref>
        ]. BCSG
submitted results obtained from four different models and a late fusion ensemble
of three of these models. These models as well as the findings concerning the
dataset are described in this paper and an outlook on possible improvements
and future research is given. The work described in this paper is based on this
team’s previous participation in the eRisk 2017 pilot task for early detection of
depression [27] and on further research based on the same dataset [28].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Studies concerning the effect of mental state on the language used by a person
have already shown various connections, beginning with observations of more
frequent use of first person singular pronouns in the spoken language of depression
patients [
        <xref ref-type="bibr" rid="ref4 ref29">4, 29</xref>
        ]. More recent studies found, for example, an elevated use of the
word “I” in particular and more negative emotion words in essays by depressed
college students [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], more verbs in past tense and pronouns in general spoken by
Russian depression patients [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], and a more frequent use of absolutist words (e.g.
absolutely, completely, every, nothing) in forums related to depression, anxiety,
or suicidal ideation than in unrelated forums or forums about asthma, diabetes,
and cancer [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Results like these have led to the development of tools that allow researchers
and therapists to evaluate written texts with a focus on the author’s mental state.
One such tool is the Linguistic Inquiry and Word Count (LIWC) software [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ],
which calculates a total of 93 features for any given text document based on a
dictionary. Similarly, Differential Language Analysis Toolkit (DLATK) [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] was
published as an open-source Python library for text analysis with a focus on
psychology, health, and social aspects.
      </p>
      <p>
        First results in the area of early detection of depression based on written
social media texts have been reported as part of the eRisk 2017 pilot task [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
Similar research without the early detection aspect has previously been done, for
example, at the CLPsych shared task for detection of depression and PTSD on
Twitter [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In the same domain as this task, data from reddit.com has recently
been utilized to successfully detect messages concerning anxiety [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Datasets and Tasks</title>
      <p>Similar to the task in 2017, the datasets of both subtasks consist of messages
obtained from the social media platform reddit.com. The training data of the
depression subtask is equivalent to the full training and test data of this previous
task, while the anorexia subtask is based on completely new messages. An
especially interesting aspect of reddit is that it allows users to create communities
with specific topics called subreddits. There exists a wide variety of these
communities, including some very active ones that are relevant from a depression detection perspective,
like /r/depression5, which is mainly used by people struggling with depression.</p>
      <p>The messages contained in both datasets can consist of a separate title and
text field depending on the type of message: Users can post content in the form of
links or images (only a title; the link or image itself is not included), text content (title and
optional text), or as a comment on another message (only text). Some messages
in both datasets include neither text nor title and can therefore be discarded.
The number of documents per user ranges between 10 and 2000. In every week
of the test phase, a chunk of 10% of each user’s messages is supplied to the
participants in chronological order, resulting in 1 to 200 documents per user
each week. In both subtasks there are exceptions to this general rule because
one anorexia training user (subject2167 of the control group), three depression
test users (subject5161, subject5301, and subject8719), and two anorexia test
users (subject4169 and subject7483) do not have any messages in the final week,
resulting in only 9 chunks of messages for these users.
5 http://www.reddit.com/r/depression, Accessed on 2018-04-02</p>
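The weekly release of 10% chunks described above can be sketched as follows; `weekly_chunks` is a hypothetical helper written for illustration, not part of the task's released tooling:

```python
def weekly_chunks(messages, n_chunks=10):
    """Split a user's chronologically ordered messages into n_chunks
    nearly equal parts, mirroring the weekly 10% chunk releases."""
    n = len(messages)
    base, extra = divmod(n, n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(messages[start:start + size])
        start += size
    return chunks

docs = [f"msg{i}" for i in range(95)]
chunks = weekly_chunks(docs)
sizes = [len(c) for c in chunks]
# 95 documents -> five chunks of 10 messages and five of 9
```

For a user with 95 documents, the participants therefore receive between 9 and 10 documents per week.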
      <p>
        Table 1 displays the main characteristics of the two datasets. The average
amount of characters and unigrams per document was calculated based on a
concatenation of the text and title field. To calculate the number of unigrams,
the same preprocessing and tokenization as described in sections 4.3 and 4.4 was
utilized, retaining only words that occur in the writings of at least two users.
      </p>
      <p>
        The participation of this team in the eRisk 2017 pilot task was based on a set
of user-level linguistic metadata features that were used as additional input for
every model. In this second eRisk shared task, only one submitted model (see
section 4.1) and the final late fusion ensemble (see section 4.5) use metadata
features. All text-based features have again been calculated based on a
concatenation of the text and title field of each message. This includes the same set
of features described in the previous working notes paper [27] and an additional
set of ten features obtained from the Linguistic Inquiry and Word Count (LIWC)
[
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] software. These LIWC features have been chosen based on their correlation
with the class label in the depression subtask training data. Another addition to
the original feature set is the average length of the title field that was also not
used in 2017.
      </p>
      <p>Figure 1 illustrates the correlation matrix of the complete metadata feature
set and includes the class label information to indicate the relevance of each
feature. Although some features—especially the pronoun counts—seem redundant
at first sight, all of the original features are preserved as they are based on a
Part of Speech (POS) tagging using the Python NLTK framework6 while LIWC
features are based on a lexicon that also includes abbreviations or common
misspellings. Most of the described features are averaged over all documents per
user to obtain the final metadata feature vector, except for the counts of
specific phrases like medication names or mentioned diagnoses which are summed.
Finally, all averaged features are standardized to have unit variance and a mean
of 0 and the summed features are converted to flags with a value of 1 for users
that have used such a phrase in any document and -1 otherwise.
      </p>
      <p>[Figure 1: correlation matrix of the complete metadata feature set and the class label. The strongest class correlations appear for the “My depression” phrase count (0.54), 1st person singular pronouns (0.44), and medication names (0.42).]</p>
      <p>6 http://www.nltk.org/book/ch05.html, accessed on 2018-04-02</p>
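The standardization of the averaged features and the flag conversion of the summed phrase counts described above can be illustrated with a small numpy sketch; the feature values are made up for illustration:

```python
import numpy as np

# Toy user-level metadata: three averaged features and two summed phrase
# counts (e.g. medication names, mentioned diagnoses). Values are invented.
averaged = np.array([[0.12, 3.4, 0.7],
                     [0.30, 2.1, 1.2],
                     [0.05, 4.0, 0.3]])
summed = np.array([[0, 3],
                   [2, 0],
                   [0, 0]])

# Averaged features: standardize each column to zero mean and unit variance.
standardized = (averaged - averaged.mean(axis=0)) / averaged.std(axis=0)

# Summed phrase counts: convert to +1/-1 flags (phrase used at least once or not).
flags = np.where(summed > 0, 1, -1)

features = np.hstack([standardized, flags])
```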
    </sec>
    <sec id="sec-4">
      <title>Chosen Models</title>
      <p>This section describes the five models that have been used to classify the test
users of both subtasks. The models for both tasks are completely identical, use
the same set of metadata features, and only vary slightly in their prediction
thresholds as described below. In comparison to this team’s participation in the
eRisk 2017 pilot task, the prediction thresholds were simplified: For each model,
only a single prediction threshold value was chosen based on cross-validation on
the training data to indicate whether a subject is classified as depressed. The
number of documents already processed for a user is not used anymore as the
new models are less prone to predict many false positives after processing only
few documents. In addition, non-depressed predictions are now only submitted
in the final week because early prediction of these cases has no effect on the
score and later writings might still identify them as depressed. Selecting viable
prediction thresholds is difficult as a balanced result according to both ERDEo
and F1 is often hard to achieve. The goal for this participation was to use rather
low thresholds to find depressed cases as early as possible without generating
too many false positives.</p>
      <p>
        In contrast to the previous participation of this team, only the first model and
the final ensemble utilize the updated set of user metadata features described in
section 3.1. The bag of words model, which achieved the best overall F1 as well as
second best ERDE5 and ERDE50 score in the previous task [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], is reused with
and without metadata features. The Recurrent Neural Network (RNN) using a
Long Short Term Memory (LSTM) layer was not evaluated again and instead
replaced with a Convolutional Neural Network (CNN). This decision was based
on further research using the eRisk 2017 dataset [28], which showed that the
CNN model was able to outperform the results of the LSTM models while also being easier
to configure and less prone to overfitting.
      </p>
      <p>
        4.1 Bag of Words Metadata Ensemble - BCSGA
      </p>
      <p>
The first model is mostly equivalent to the first model used in this team’s
participation in eRisk 2017, except for the extended set of metadata features. It utilizes
an ensemble of Bag of Words (BoW) classifiers with different term weightings
and n-grams that are calculated on a user basis by first concatenating all
documents (text and title) of a user. The term weighting for bags of words can
generally be split into three components: a term frequency component or local
weight, a document frequency component or global weight, and a normalization
component [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. A general term weighting scheme can therefore be given as [30]:
t_{t,d} = l_{t,d} · g_t · n_d ,
(1)
where t_{t,d} is the calculated weight for term t in document d, l_{t,d} is the local weight
of term t in document d, g_t is the global weight of term t for all documents, and n_d
is the normalization factor for document d. A common example would be using
the term frequency (tf) as local weight and the inverse document frequency (idf)
as global weight, resulting in tf-idf weighting [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
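As an illustration of the three-component scheme in Equation (1), the following sketch computes a tf-idf weighting with l2 normalization on a toy term-document matrix; the concrete idf variant is an assumption for illustration, while the weightings actually used in the submitted models are described below:

```python
import numpy as np

# Term-document count matrix: rows = documents (users), columns = terms.
counts = np.array([[3, 0, 1],
                   [0, 2, 2],
                   [1, 1, 0]], dtype=float)
n_docs = counts.shape[0]

local = counts                                 # l_{t,d}: raw term frequency
df = (counts > 0).sum(axis=0)                  # document frequency per term
global_w = np.log(n_docs / df)                 # g_t: inverse document frequency
weighted = local * global_w                    # l_{t,d} * g_t
norm = np.linalg.norm(weighted, axis=1, keepdims=True)  # n_d: l2 normalization
tfidf = weighted / norm
```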
      <p>All ensemble models use the l2-norm for n_d but varying local and global weights.
The first one uses a combination of uni-, bi-, tri-, and 4-grams obtained from
the training data. To build this first BoW, the 200,000 {1, 2, 3, 4}-grams with
the highest Information Gain (IG) are selected, given by [14, p. 272]:
I(U, C) = Σ_{e_t ∈ {0,1}} Σ_{e_c ∈ {0,1}} P(U = e_t, C = e_c) · log2( P(U = e_t, C = e_c) / (P(U = e_t) · P(C = e_c)) ) ,
(2)
with the random variable U taking values e_t = 1 (the document contains term
t) and e_t = 0 (the document does not contain term t), and the random variable
C taking values e_c = 1 (the document is in class c) and e_c = 0 (the document is
not in class c). The raw term frequency of the resulting n-grams is used as local
weight, while their IG-score is used as global weight. The second BoW utilizes a
modified version of tf, namely the augmented term frequency (atf) [30], multiplied
by idf:
atf-idf(t, d) = ( a + (1 − a) · tf_t / max(tf) ) · log( n_d / df(d, t) ) ,
(3)
      </p>
      <p>
        with max(tf) being the maximum frequency of any term in the document, n_d the
total number of documents, and the smoothing parameter a, which is set to 0.3
for this model. This BoW, as well as the third one, contains all unigrams of the
training corpus. The local weight of the third model consists of the logarithmic
term frequency (logtf) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and the global weight is given by the relevance frequency
(rf) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which can be combined as:
logtf-rf(t, d) = (1 + log(tf)) · log2( 2 + df_{t,+} / max(1, df_{t,−}) ) ,
      </p>
      <p>
        where df_{t,+} and df_{t,−} are the numbers of documents in the depressed/non-depressed
class that contain the term t. The final model of this ensemble uses the
handcrafted user features described in section 3.1.</p>
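A minimal sketch of the logtf-rf weight for a single term, assuming hypothetical term and document frequencies:

```python
import numpy as np

def logtf_rf(tf, df_pos, df_neg):
    """logtf-rf weight for one term in one document, as in the third BoW
    model: logarithmic term frequency times relevance frequency."""
    logtf = 1.0 + np.log(tf) if tf > 0 else 0.0
    rf = np.log2(2.0 + df_pos / max(1.0, df_neg))
    return logtf * rf

# A term that occurs mostly in depressed users' writings receives a higher
# weight than one distributed evenly across both classes (values invented).
w_discriminative = logtf_rf(5, 40, 8)
w_neutral = logtf_rf(5, 20, 20)
```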
      <p>
        All three bags of words and the hand-crafted features were each used as
input for a separate logistic regression classifier. Due to the imbalanced class
distribution, a modified class weight was used for these classifiers similar to the
original task paper [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] to increase the cost of false negatives. It was calculated
for the non-depressed class as 1/(1 + w) and for the depressed class as w/(1 + w),
with w = 2 for all four models. The final output probabilities were calculated as
unweighted mean of all four logistic regression probabilities. Each week and for
both tasks, this ensemble predicted any user with a probability above or equal
to 0.4 as depressed, while in the final week all users with a probability less than
0.4 were predicted as non-depressed.
      </p>
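The class weighting and the unweighted probability averaging of this ensemble can be sketched as follows; the per-model output probabilities are made-up numbers:

```python
import numpy as np

# Class weights as described above: the cost of false negatives is increased.
w = 2.0
class_weight = {0: 1.0 / (1.0 + w), 1: w / (1.0 + w)}  # non-depressed, depressed

# Late fusion of the four classifiers (three BoW + metadata): the final
# probability is the unweighted mean of the four output probabilities.
probs = np.array([
    [0.55, 0.30, 0.48, 0.62],   # user 1: outputs of the four models
    [0.10, 0.05, 0.20, 0.12],   # user 2
])
final = probs.mean(axis=1)

# Weekly decision rule of BCSGA: predict "depressed" at probability >= 0.4;
# non-depressed predictions are only emitted in the final week.
predictions = final >= 0.4
```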
      <p>4.2 Bag of Words Ensemble - BCSGB</p>
      <p>
The second model is similar to the first one, but it only includes the three
bags of words in the ensemble and disregards the metadata features. Again, for
the depression subtask any test subject with a probability of at least 0.4 was
predicted as depressed, while users with a probability below 0.4 were predicted
as non-depressed in the final week. The prediction threshold for the anorexia
subtask was set to 0.3 in this case.
      </p>
      <p>
        4.3 CNN with GloVe Embeddings - BCSGC
      </p>
      <p>
The third model consists of a Convolutional Neural Network (CNN) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], an architecture that has previously been utilized in many recent studies to achieve outstanding
results, especially in the area of image classification, and that is generally viable for
data with a grid-like structure [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The implementation has been done based on
Tensorflow [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and the input of this CNN is based on GloVe [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] word
embeddings: A 50-dimensional set of word embeddings pre-trained on Wikipedia and
News7 is used to produce a matrix of word vectors for the first 100 words of
each document in the dataset. Prior to this vectorization, the documents are
preprocessed and tokenized in a way that preserves, for example, emoticons,
punctuation, words including special characters, and generally all tokens that
occur in the documents of at least two users. Zero-padding is used for
documents with fewer than 100 words. Each document is therefore represented by a
100 × 50 matrix and is classified independently. Since the number of words per
document in the training data ranges between 1 (when ignoring the empty
documents) and 6,487 but has a mean of 34.58 according to the tokenization done for
this work, the limitation to 100 words (or even fewer to minimize the necessary
zero-padding) is viable.
      </p>
      <p>[Figure 2: CNN architecture: 100×50/300 document matrix, convolution with 100 filters of size 2×300, CReLU activation.]</p>
      <p>
        The text classification network architecture used for this work is displayed
in Figure 2, which shows the use of 300 dimensional word vectors (and therefore
100 × 300 documents) as used for the next model BCSGD. It is similar to the
one-layer CNN for sentence classification described by Zhang and Wallace [31]
and consists of only a single convolutional layer with 100 filters of height 2
and a width corresponding to the word embedding dimension, and uses
1-max pooling to extract a single value from each filter. Due to the usage of
Concatenated Rectified Linear Units (CReLU) [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] activation, this finally results
in a 200-dimensional vector per document that is propagated through four fully
connected layers, of which the first applies dropout to its output and the final
one applies softmax. The training steps of this and the following CNN model
utilized Adam [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to minimize the cross-entropy loss. Both models were trained
using a learning rate of 1e−4 and a batch size of 10,000 documents. BCSGC was
trained for 30 epochs.
7 http://nlp.stanford.edu/projects/glove, Accessed on 2018-03-30
      </p>
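The forward pass up to the pooled document vector can be sketched in numpy as follows; the weights are random stand-ins for trained parameters, since the actual models were implemented in Tensorflow:

```python
import numpy as np

rng = np.random.default_rng(0)

# One document: 100 words, each a 50-dimensional GloVe vector.
doc = rng.normal(size=(100, 50))
filters = rng.normal(size=(100, 2, 50))   # 100 filters of height 2, full width

# Valid convolution over the word axis: one activation map of length 99 per filter.
conv = np.array([[np.sum(doc[i:i + 2] * f) for i in range(99)] for f in filters])

# CReLU concatenates relu(x) and relu(-x), doubling the number of channels.
crelu = np.concatenate([np.maximum(conv, 0), np.maximum(-conv, 0)], axis=0)

# 1-max pooling extracts a single value per channel -> 200-dim document vector.
doc_vector = crelu.max(axis=1)
```

The 200-dimensional vector would then be fed into the fully connected layers described above.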
      <p>To obtain a final prediction per user, the 98th percentile of the outputs
from all the user’s documents is calculated. This ensures that even depressed
users who have only a few documents with a high probability can be correctly
predicted. For both subtasks, any subject with a final probability of at least
0.4 was predicted as depressed in each week, while probabilities below 0.4 again
resulted in a non-depressed prediction in the final week.
      </p>
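The percentile-based user-level aggregation can be sketched as follows; the per-document probabilities are illustrative:

```python
import numpy as np

# Per-document probabilities of one user: the user-level score is the 98th
# percentile, so a few confident documents dominate the decision even when
# most documents look inconspicuous.
doc_probs = np.array([0.05] * 95 + [0.9] * 5)
user_score = np.percentile(doc_probs, 98)

depressed = user_score >= 0.4   # BCSGC threshold
```

A plain mean over the same documents would stay far below the threshold, which is exactly the case the percentile aggregation is meant to handle.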
      <p>
        4.4 CNN with fastText Embeddings - BCSGD
      </p>
      <p>
The second CNN model is based on the same architecture as the previous one but
utilizes 300-dimensional fastText [
        <xref ref-type="bibr" rid="ref15 ref3 ref7">7, 3, 15</xref>
        ] word embeddings. To evaluate word
vectors that are more related to the domain of reddit messages or social media in
general, a new fastText model was trained specifically for this task. A dataset of
all 1.7 billion reddit comments written between October 2007 and May 20158 was
used as training corpus for this model and preprocessed similar to the description
in section 4.3 but without removing infrequent words yet. In addition to this,
any references to reddit users (in the form of /u/&lt;username&gt;) were replaced
by a generic phrase “ref_user” to prevent any connections to actual users in the
resulting word embeddings. Similarly, any reference to a subreddit (in the form
of /r/&lt;subreddit&gt;) was replaced by the phrase “ref_subreddit_&lt;subreddit&gt;”
in order to learn a vector representation for each subreddit as well, which can be
regarded as its topic. No stemming or stopword removal of any kind was done, and messages
in languages other than English were removed based on stopword counts. The
final corpus of 1.37 billion reddit comments was used to train 6 million word
vectors of words that occur at least five times in the corpus. Additional details
about this model and the utilized CNNs can be found in the corresponding paper
[28].
      </p>
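The reference replacement described above might look like the following; the exact patterns used for the corpus are not given in the paper, so these regular expressions are an assumption:

```python
import re

def anonymize_refs(text):
    """Replace reddit user references with a generic token and subreddit
    references with a subreddit-specific token (patterns are assumptions)."""
    text = re.sub(r"/u/\w+", "ref_user", text)
    text = re.sub(r"/r/(\w+)", lambda m: "ref_subreddit_" + m.group(1), text)
    return text

sample = "thanks /u/somebody, see /r/depression for support"
cleaned = anonymize_refs(sample)
# -> "thanks ref_user, see ref_subreddit_depression for support"
```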
      <p>Similar to the previous CNN model, the resulting 100 × 300 matrix of word
embeddings obtained for each document was classified separately and the 98th
percentile of the outputs was used as output for the corresponding user. This
model was trained for 25 epochs using the same parameters as BCSGC. The
prediction threshold for depressed predictions was set to 0.7 for both tasks,
leading to a non-depressed prediction for probabilities below 0.7 in the final
week.
8 https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/, Accessed on 2018-03-30</p>
      <p>
        4.5 CNN and Bag of Words Metadata Ensemble - BCSGE
      </p>
      <p>
The final model consists of a simple late fusion ensemble that has been calculated
as the unweighted mean of the outputs obtained from models BCSGA, BCSGC,
and BCSGD - the bag of words including metadata and the two CNN models.
Although these outputs have not been calibrated (e.g. by using Platt scaling [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ])
and can therefore not be seen as directly comparable probabilities, previous
experiments [28] have shown that such an ensemble was able to improve the
results of the separate models. Again, a prediction threshold of at least 0.4 was
used for the depression detection subtask, while a threshold of 0.5 was utilized
for the anorexia subtask.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>Before examining the results of the described models in the two subtasks, it is
necessary to analyze the utilized ERDEo metric for early detection systems.
Since this metric is based on the absolute number of documents read per user
before a true positive prediction, but these documents have to be read in ten
equally sized chunks, the score is highly dependent on the number of documents
available per user. Because at least 10% of each user’s documents have to be read
by all participants, it is impossible to predict some users correctly depending on
the parameter o that describes after how many documents the penalty for late
predictions grows. This fact has already been described in more detail in another
paper [28].</p>
      <p>Table 2 displays the best ERDE5 and ERDE50 scores that are possible for
the test data of the depression and anorexia subtask. These results are based
on a perfect prediction in the first week of the tasks. As described in the
abovementioned paper, only test users with less than 100 documents (less than 10 per
chunk) have any effect on the ERDE5 score. This means that only predicting 26
of the 79 depressed test users correctly in the first week and ignoring all others
still leads to an ERDE5 score of 7.78 (F1 = 0.50), while predicting only 12 of
the 41 anorexia users in the first week also leads to an ERDE5 score of 10.23
(F1 = 0.45). ERDE5 alone, without the additional F1 score, is therefore hard
to interpret.</p>
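For reference, the per-user cost follows the ERDE definition used in the eRisk tasks; the sketch below is an illustration, and the false positive cost value is made up rather than taken from the official evaluation:

```python
import math

def erde(decision, truth, k, o, c_fp=0.1296, c_fn=1.0, c_tp=1.0):
    """Per-user ERDE_o cost: k is the number of documents read before the
    positive decision, o the lateness parameter (cost values illustrative)."""
    if decision and not truth:
        return c_fp
    if not decision and truth:
        return c_fn
    if decision and truth:
        # late true positives are penalized by a sigmoid growing around k = o
        return (1.0 - 1.0 / (1.0 + math.exp(k - o))) * c_tp
    return 0.0

early = erde(True, True, k=5, o=50)    # well before o: almost no penalty
late = erde(True, True, k=300, o=50)   # far beyond o: close to the full cost
```

Because the chunk releases force at least 10% of each user's documents to be read, a user with many documents can never reach the low-penalty region for small o, which is exactly the issue discussed above.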
      <p>To examine the weekly predictions obtained from the described models,
Figures 3 and 4 show the cumulative number of positive predictions for the two
subtasks and also visualize the proportion of true positives. For the depression
subtask, this shows that the ensemble indeed lead to the most true positives
but also many false positives. BCSGD seems to perform worse at first sight
but indeed achieved a good balance between true and false positives because
of its higher prediction threshold. As the comparison of both figures shows, the
sn140
ito130
c
ide120
rp110
ive100
itso 90
)p 80
e
tru 70
(
fo 60
re 50
bm40
nu 30
itve 20
lua 10
um 0
C
anorexia subtask was much easier using the same models and that it was
possible to detect nearly all positive samples without too many false positives. Both
examinations show a steady progression over the ten weeks for all models.
1
2
3
4
5
6
7
8
9</p>
      <p>10</p>
      <sec id="sec-5-1">
        <title>Week</title>
        <p>
          Tables 3 and 4 show the official results [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] of BCSG’s models for both
subtasks and also include the alternative early detection scores Flatency [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] and
ERDEo% [28]. According to the suggestion in the paper, Flatency was calculated
using a value for the parameter p that fits the true positive cost function Platency
to return a cost of 0.5 for the median number of documents of the positive test
users. This results in a value of p = 0.0051 for the depression subtask (median of
216 documents per depressed test user) and p = 0.0042 for the anorexia subtask
(median of 260 documents per anorexia test user). In contrast to the standard
ERDEo score, ERDEo% is calculated based on the percentage of read documents
per user and is therefore easier to interpret in a chunk-based task. Additional
results by other teams have been added to these tables to include at least the
best two results obtained for each score.
        </p>
        <p>While the direct comparison of BCSGA (bags of words with linguistic
metadata) and BCSGB (bags of words only) shows that the metadata features result
in more positive predictions, the actual number of true positives was only
better for the depression subtask and resulted in a better ERDE5 score but worse
ERDE50 and F1. Similar to the task in 2017, the bag of words ensemble again
obtained the best results in the depression subtask, while the CNN based on
the self-trained fastText embeddings (BCSGD) and the ensemble using both the
bags of words as well as the CNNs (BCSGE) achieved the best scores in the
anorexia subtask. Overall, the models of BCSG achieved the second-best results
in ERDE5 and the best results in all other scores except for another second-best
result according to ERDE50 in the depression subtask.</p>
        <p>[Figure: Cumulative number of (future) positive predictions per week
(weeks 1-10) for the models LIIRA, LIIRB, PEIMEXB, RKMVERIA, UNSLA, UNSLB,
UNSLD, and UNSLE.]</p>
        <p>As already described, the ERDEo score and especially ERDE5 should be
discussed in more detail because optimizing it can often lead to simply
minimizing false positives by predicting only very few users at all. A detailed
look at the results and the achieved ERDE5 scores shows that, for example, in
the first week of the depression subtask both UNSLA and BCSGA predicted 45
users as depressed, of which 20 were indeed true positives. Still, the
resulting ERDE5 scores differ drastically because the predicted users vary in
the number of total documents: UNSLA only gained five more true positives in
the following nine weeks, while BCSGA already had ten more in the second week
and a total of 53 in the end. Similarly, the leading model in ERDE5 of the
anorexia subtask, UNSLB, had 19 true positives in the first week, while BCSGD
already had 22, BCSGB had 21, and BCSGA had 20. In summary, ERDE5 can produce
highly misleading results because of the varying number of documents per
user.</p>
      </sec>
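      <p>The ERDE5 and ERDE50 scores discussed above are instances of the ERDEo
measure introduced by Losada and Crestani [11]. A minimal sketch of the measure
(function and parameter names are illustrative, and the c_fp default below is
only a placeholder; the task organizers set it to the proportion of positive
users in the collection):</p>

```python
import math

def erde(decisions, o, c_fp=0.1296, c_fn=1.0, c_tp=1.0):
    """ERDEo as defined by Losada and Crestani [11].

    decisions: list of (true_label, predicted_label, delay) tuples, where
    delay is the number of writings seen before the decision was made.
    c_fp is normally set to the proportion of positive users in the
    collection; 0.1296 here is only an illustrative placeholder.
    """
    costs = []
    for true, pred, delay in decisions:
        if pred and not true:    # false positive: flat cost
            costs.append(c_fp)
        elif not pred and true:  # false negative: full cost
            costs.append(c_fn)
        elif pred and true:      # true positive: cost grows with the delay
            costs.append((1 - 1 / (1 + math.exp(delay - o))) * c_tp)
        else:                    # true negative: no cost
            costs.append(0.0)
    return sum(costs) / len(costs)

# Two true positives at different delays: once the delay exceeds o,
# the cost of a correct prediction quickly approaches that of a miss.
early = erde([(1, 1, 2)], o=5)   # delay 2, well before o=5
late = erde([(1, 1, 40)], o=5)   # delay 40, far beyond o=5
```

      <p>Because the sigmoid is applied to the absolute number of writings seen
before the decision, two models with identical confusion matrices can still
receive very different ERDE5 scores, which is exactly the effect observed in
the task results.</p>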
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>Again, the eRisk competition posed a challenging task concerning the early
detection of mental health issues based on sequences of social media texts. The
depression subtask yielded similar F1 scores but much better ERDEo scores than
last year, based on a test set that was nearly as large as last year's training
and test sets combined. The results of the anorexia subtask were surprisingly
good, which is probably due to the nature of this dataset. In general, the
promising results obtained on the eRisk 2017 test data using linguistic
metadata alone [28] could not yet be confirmed in this year's tasks. As
concluded in the same paper, finding a way to successfully integrate the
metadata features into the neural network models remains an interesting task
for future research.</p>
      <p>The examination of the task results again shows that a discussion about a
meaningful metric should be a priority in the future. Both Flatency and ERDEo%
include interesting ideas to improve the evaluation of early prediction models.
Flatency contains a cost function that grows less rapidly and already incorporates
the F1 score, which makes it more meaningful when viewed alone. ERDEo% is
more viable for chunk-based shared tasks because it is calculated based on the
proportion of read documents per user instead of the absolute number, which
leads to results that are more interpretable than the standard ERDEo. A
combination of these two ideas could be a promising basis for discussions about
future early detection tasks.</p>
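      <p>The two ideas can be sketched as follows. This is a hedged
illustration, not an official implementation: the helper names and parameter
defaults are my own, the penalty function follows the form given by Sadeque et
al. [20], and erde_percent_cost only illustrates the percentage-based idea
behind ERDEo% [28]:</p>

```python
import math
import statistics

def latency_penalty(k, p=0.0078):
    """Latency cost of the Flatency measure by Sadeque et al. [20]: zero for
    an immediate decision (k = 1) and approaching 1 as the decision is
    delayed. p controls how fast the penalty grows (the default here is
    illustrative)."""
    return -1 + 2 / (1 + math.exp(-p * (k - 1)))

def f_latency(f1, tp_delays, p=0.0078):
    """Flatency: the F1 score discounted by the median latency penalty
    over all true positives."""
    return f1 * (1 - statistics.median(latency_penalty(k, p) for k in tp_delays))

def erde_percent_cost(delay, total_writings, o_pct):
    """Hypothetical true-positive cost of an ERDEo% variant [28]: the sigmoid
    is applied to the percentage of the user's writings that were read
    instead of the absolute number."""
    read_pct = 100 * delay / total_writings
    return 1 - 1 / (1 + math.exp(read_pct - o_pct))
```

      <p>Since the penalty grows slowly and is anchored to the F1 score, and
the percentage-based cost no longer depends on how many documents a user wrote
in total, both measures avoid the distortion discussed for ERDE5 above.</p>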
    </sec>
    <sec id="sec-7">
      <title>Acknowledgment</title>
      <p>The work of Sven Koitka was partially funded by a PhD grant from the
University of Applied Sciences and Arts Dortmund, Germany.</p>
      <p>27. Trotzek, M., Koitka, S., Friedrich, C.M.: Linguistic Metadata Augmented
Classifiers at the CLEF 2017 Task for Early Detection of Depression. Working Notes
Conference and Labs of the Evaluation Forum CLEF 2017, Dublin, Ireland (2017).
Available from: http://ceur-ws.org/Vol-1866/paper_54.pdf - Accessed on
2018-03-29
28. Trotzek, M., Koitka, S., Friedrich, C.M.: Utilizing Neural Networks and Linguistic
Metadata for Early Detection of Depression Indications in Text Sequences. arXiv
preprint arXiv:1804.07000 [cs.CL] (2018)
29. Weintraub, W.: Verbal Behavior: Adaptation and Psychopathology. Springer
Publishing Company (1981)
30. Wu, H., Gu, X.: Reducing Over-Weighting in Supervised Term Weighting for
Sentiment Analysis. The 25th International Conference on Computational Linguistics
(COLING 2014), pp. 1322-1330, Dublin, Ireland (2014)
31. Zhang, Y., Wallace, B.: A Sensitivity Analysis of (and Practitioners' Guide to)
Convolutional Neural Networks for Sentence Classification. Proceedings of the Eighth
International Joint Conference on Natural Language Processing (Volume 1: Long
Papers). Asian Federation of Natural Language Processing, pp. 253-263, Taipei,
Taiwan (2017)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>Abadi</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Barham</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Chen</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Chen</surname>, <given-names>Z.</given-names></string-name>,
          <string-name><surname>Davis</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Dean</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Devin</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Ghemawat</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Irving</surname>, <given-names>G.</given-names></string-name>,
          <string-name><surname>Isard</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Kudlur</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Levenberg</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Monga</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Moore</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Murray</surname>, <given-names>D.G.</given-names></string-name>,
          <string-name><surname>Steiner</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Tucker</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Vasudevan</surname>, <given-names>V.</given-names></string-name>,
          <string-name><surname>Warden</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Wicke</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Yu</surname>, <given-names>Y.</given-names></string-name>,
          <string-name><surname>Zheng</surname>, <given-names>X.</given-names></string-name>:
          <article-title>TensorFlow: A System for Large-Scale Machine Learning</article-title>
          .
          <source>12th USENIX Symposium on Operating Systems Design and Implementation (OSDI'16)</source>
          , pp.
          <fpage>265</fpage>
          -
          <lpage>283</lpage>
          , Savannah, Georgia, USA (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Al-Mosaiwi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johnstone</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>In an Absolute State: Elevated Use of Absolutist Words is a Marker Specific to Anxiety, Depression, and Suicidal Ideation</article-title>
          .
          <source>Clinical Psychological Science, Prepublished January 5</source>
          ,
          <year>2018</year>
          , DOI: 10.1177/2167702617747074 (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Enriching Word Vectors with Subword Information</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          , Vol.
          <volume>5</volume>
          , pp.
          <fpage>135</fpage>
          -
          <lpage>146</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bucci</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freedman</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>The Language of Depression</article-title>
          .
          <source>Bulletin of the Menninger Clinic</source>
          , Vol.
          <volume>45</volume>
          (
          <issue>4</issue>
          ), pp.
          <fpage>334</fpage>
          -
          <lpage>358</lpage>
          (
          <year>1981</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Coppersmith</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dredze</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harman</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hollingshead</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitchell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>CLPsych 2015 Shared Task: Depression and PTSD on Twitter</article-title>
          .
          <source>Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality (CLPsych'15)</source>
          , pp.
          <fpage>31</fpage>
          -
          <lpage>39</lpage>
          , Denver, Colorado, USA (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Goodfellow</surname>
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Courville</surname>
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Deep Learning</article-title>
          . MIT Press (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Bag of Tricks for Efficient Text Classification</article-title>
          .
          <source>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics</source>
          , Vol.
          <volume>2</volume>
          , pp.
          <fpage>427</fpage>
          -
          <lpage>431</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ba</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Adam: A Method for Stochastic Optimization</article-title>
          .
          <source>Proceedings of the 3rd International Conference on Learning Representations (ICLR)</source>
          , San Diego, California, USA, arXiv preprint arXiv:
          <volume>1412</volume>
          .6980 (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Lan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Low</surname>
            ,
            <given-names>H.-B.</given-names>
          </string-name>
          :
          <article-title>Proposing a New Term Weighting Scheme for Text Categorization</article-title>
          .
          <source>Proceedings of the 21st National Conference on Artifical Intelligence (AAAI-06)</source>
          , Vol.
          <volume>6</volume>
          , pp.
          <fpage>763</fpage>
          -
          <lpage>768</lpage>
          , Boston, Massachusetts, USA (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Generalization and Network Design Strategies</article-title>
          .
          <source>Technical Report CRGTR-89-4</source>
          , University of Toronto (
          <year>1989</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A Test Collection for Research on Depression and Language Use</article-title>
          .
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction: 7th International Conference of the CLEF Association</source>
          , pp.
          <fpage>28</fpage>
          -
          <lpage>39</lpage>
          . CLEF 2016, Évora, Portugal (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parapar</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>eRisk 2017: CLEF Lab on Early Risk Prediction on the Internet: Experimental Foundations</article-title>
          .
          <source>Proceedings Conference and Labs of the Evaluation Forum CLEF</source>
          <year>2017</year>
          , Dublin, Ireland (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parapar</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Overview of eRisk - Early Risk Prediction on the Internet</article-title>
          .
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction: Proceedings of the Ninth International Conference of the CLEF Association (CLEF</source>
          <year>2018</year>
          ), Avignon, France (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raghavan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schütze</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>An Introduction to Information Retrieval</article-title>
          .
          <source>Online Edition</source>
          . Cambridge University Press (
          <year>2009</year>
          ) Available from: https://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf - Accessed on 2018-04-02
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Puhrsch</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Advances in Pre-Training Distributed Word Representations</article-title>
          .
          <source>Proceedings of the International Conference on Language Resources and Evaluation (LREC'18)</source>
          , Miyazaki, Japan (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Paltoglou</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thelwall</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>A Study of Information Retrieval Weighting Schemes for Sentiment Analysis</article-title>
          .
          <source>Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics</source>
          , pp.
          <fpage>1386</fpage>
          -
          <lpage>1395</lpage>
          . Association for Computational Linguistics (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Platt</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          :
          <article-title>Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods</article-title>
          .
          <source>Advances in Large Margin Classifiers</source>
          , Vol.
          <volume>10</volume>
          (
          <issue>3</issue>
          ), pp.
          <fpage>61</fpage>
          -
          <lpage>74</lpage>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global Vectors for Word Representation</article-title>
          .
          <source>Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP'14)</source>
          , ACL, pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          , Doha, Qatar (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Rude</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gortner</surname>
            ,
            <given-names>E.-M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pennebaker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Language Use of Depressed and Depression-Vulnerable College Students</article-title>
          .
          <source>Cognition &amp; Emotion</source>
          , Vol.
          <volume>18</volume>
          (
          <issue>8</issue>
          ), pp.
          <fpage>1121</fpage>
          -
          <lpage>1133</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Sadeque</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bethard</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Measuring the Latency of Depression Detection in Social Media</article-title>
          .
          <source>Proceedings of the 11th ACM International Conference on Web Search and Data Mining (WSDM'18)</source>
          , pp.
          <fpage>495</fpage>
          -
          <lpage>503</lpage>
          , Los Angeles, California, USA (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Salton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buckley</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Term-Weighting Approaches in Automatic Text Retrieval</article-title>
          .
          <source>Information Processing &amp; Management</source>
          , Vol.
          <volume>24</volume>
          (
          <issue>5</issue>
          ), pp.
          <fpage>513</fpage>
          -
          <lpage>523</lpage>
          (
          <year>1988</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Schwartz</surname>
            ,
            <given-names>H.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giorgi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sap</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crutchley</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ungar</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eichstaedt</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>DLATK: Differential Language Analysis Toolkit</article-title>
          .
          <source>Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (EMNLP'17)</source>
          , ACL, pp.
          <fpage>55</fpage>
          -
          <lpage>60</lpage>
          , Copenhagen, Denmark (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Shang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sohn</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Almeida</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units</article-title>
          .
          <source>Proceedings of The 33rd International Conference on Machine Learning</source>
          , Vol.
          <volume>48</volume>
          , pp.
          <fpage>2217</fpage>
          -
          <lpage>2225</lpage>
          , New York City, New York, USA (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rudzicz</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Detecting Anxiety through Reddit</article-title>
          .
          <source>Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality (CLPsych'17)</source>
          , pp.
          <fpage>58</fpage>
          -
          <lpage>65</lpage>
          , Vancouver, Canada (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Smirnova</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sloeva</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuvshinova</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krasnov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romanov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nosachev</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Language Changes as an Important Psychopathological Phenomenon of Mild Depression</article-title>
          .
          <source>European Psychiatry</source>
          , Vol.
          <volume>28</volume>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Tausczik</surname>
            ,
            <given-names>Y.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pennebaker</surname>
            ,
            <given-names>J.W.:</given-names>
          </string-name>
          <article-title>The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods</article-title>
          .
          <source>Journal of Language and Social Psychology</source>
          , Vol.
          <volume>29</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>24</fpage>
          -
          <lpage>54</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>