<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Quick and (maybe not so) Easy Detection of Anorexia in Social Media Posts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elham Mohammadi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hessam Amini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leila Kosseim</string-name>
          <email>leila.kosseimg@concordia.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computational Linguistics at Concordia (CLaC) Laboratory, Department of Computer Science and Software Engineering, Concordia University</institution>
          ,
          <addr-line>Montreal, QC H3G 2W1</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents an ensemble approach for the early detection of anorexia in social media posts. The approach utilizes several attention-based neural sub-models to extract features and predict class probabilities, which are later used as input features to a Support Vector Machine (SVM) that makes the final classification. The model was evaluated on the first task of eRisk 2019, whose aim was the early detection of anorexia in Reddit posts. Our submission, named CLaC, achieved F1 and latency-weighted F1 scores of 0.7073 and 0.6908 respectively, allowing it to rank first in terms of these metrics, and achieved competitive results based on other evaluation metrics.</p>
      </abstract>
      <kwd-group>
        <kwd>Anorexia</kwd>
        <kwd>Early detection</kwd>
        <kwd>Social media</kwd>
        <kwd>Ensemble classifier</kwd>
        <kwd>Neural networks</kwd>
        <kwd>Support vector machine</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In the last decade, the use of social media to express personal thoughts, emotions,
and ideas has become more and more prevalent. The analysis of online data can
be useful for many purposes, such as business and marketing, political planning,
prediction of stock market [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], as well as enhancing awareness of emergencies
[
        <xref ref-type="bibr" rid="ref34">34</xref>
        ]. Another noteworthy line of research has focused on the detection of toxicity,
hate speech, aggression and cyber bullying on online platforms, an effort that
could facilitate timely interventions in violent situations [
        <xref ref-type="bibr" rid="ref31 ref8">8,31</xref>
        ].
      </p>
      <p>
        In healthcare applications, online posts have been used for detecting disease
outbreaks [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], finding smoking patterns [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ], and the identification of adverse
drug reactions [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ]. Another useful application is the automatic detection of
mental health issues, a relatively recent field which has attracted the attention of
many researchers in Natural Language Processing (NLP). Corpora from Twitter,
Facebook, blogs and online forums, and Reddit are used as resources to detect
various mental health problems, such as anxiety, depression, suicide ideation,
and eating disorders [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>Although automatically monitoring online forums to detect cases of mental
health issues is beneficial, the elapsed time between the first signs of a mental
health issue and the actual detection of a potential victim can play a crucial role. Earlier
detection of a harmful behavior can help moderators better handle the
situation. However, to the best of our knowledge, not much research has specifically
addressed the task of early detection of mental health issues.</p>
      <p>
        The eRisk shared task [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] was created with the goal of addressing issues
related to early risk detection of mental health problems. According to [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], early
detection can be useful in many applications, from the identification of
potential sexual offenders to the detection of victims of suicidal tendencies, making
intervention possible before it is too late. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] argues that while current risk
assessment approaches often aim at detecting harmful behavior after the fact, it
is very important to consider the timing of risk detection and to minimize the
time between the observation of the first evidence of destructive behavior and
triggering an alarm. To that end, the organizers of the eRisk shared task have
encouraged the development of approaches which model the process rather than
the outcome, as well as developing reliable evaluation metrics and test collections
tailored to early risk detection.
      </p>
      <p>
        The aim of this work is to propose a model for the early detection of
anorexia and to evaluate it using the eRisk 2019 data and evaluation metrics
[
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
      <p>The rest of the paper is organized as follows: Section 2 provides an overview
of the related literature. Section 3 consists of a brief summary of the task and
the data set used. Section 4 presents the general model architecture that has
been developed. Section 5 is dedicated to a more detailed description of model
variants that were employed for the experiments. Section 6 includes a summary
and discussion of the results. Section 7 concludes the paper and presents some
interesting future directions.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Many researchers have used corpora from Twitter, Facebook, Reddit, blogs and
online forums as resources to experiment with classification tasks pertaining to
mental health issues [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Pestian et al. [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] experimented with different machine learning methods for
suicide note classification. The features used in the study included words, part
of speech tags, readability scores, and emotions. The best accuracy of 74% was
achieved by a logistic regression model.
      </p>
      <p>
        DeVault et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] studied the symptoms of psychological distress in dialogues
with a virtual agent. The use of a Naïve Bayes classifier for the detection of
post-traumatic stress disorder (PTSD) and distress yielded a 20% improvement
over the baseline accuracy of 53.5%, and showed that the automatic assessment
of psychological distress is indeed possible.
      </p>
      <p>
        More recently, Jackson et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] used clinical texts obtained through the
Clinical Record Interactive Search<sup>1</sup> to extract symptoms of severe mental
illness. The authors made use of TextHunter [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] (a natural language processing
information extraction tool) and an SVM classifier, and were able to classify 38
symptoms with an F1-score of 85%.
      </p>
      <p>
        Shen and Rudzicz [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] used different feature sets, including word2vec
embedding, latent Dirichlet allocation topic modelling, lexico-syntactic features, and
n-grams (unigrams and bigrams) to detect anxiety in Reddit posts. Initially,
the authors compared the results achieved by an SVM and a 2-layer neural
network. Though both classifiers performed well, the SVM yielded marginally
better results. However, they achieved their best result of 98% accuracy using
the neural network with n-gram probabilities and word embeddings combined
with Linguistic Inquiry and Word Count (LIWC) features.
      </p>
      <p>
        Coppersmith et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] explored the automatic detection of post-traumatic
stress disorder (PTSD), depression, bipolar disorder, and seasonal affective
disorder (SAD) in Twitter data, using LIWC features and character and word
n-grams, and found that the latter resulted in superior performance.
      </p>
      <p>
        Benton et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] used multi-task learning to predict suicide risk and a variety
of mental health conditions from Twitter data, including anxiety, depression,
PTSD, and schizophrenia. It was found that a multi-task framework can be
effectively used in cases with limited data.
      </p>
      <p>
        Apart from individual efforts, shared tasks (e.g. [
        <xref ref-type="bibr" rid="ref20 ref21 ref7">7,21,20,35</xref>
        ]) have also been
organized to encourage the development of common benchmarks (datasets and
metrics) and the comparison of approaches for the detection of distress in online
textual data.
      </p>
      <p>
        All of the previous work described above used a classic classification approach
that does not measure how early the detection is performed. On the other hand,
the eRisk shared tasks [
        <xref ref-type="bibr" rid="ref17 ref18 ref19">17,18,19</xref>
        ] focus on the early detection of mental health
issues. In the first edition of eRisk [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], the data set used was a collection of social
media posts and comments from depressed and non-depressed authors, recorded
chronologically. As evaluation metric, Early Risk Detection Error (ERDE) was
used, an error measure which assigns a penalty to late decisions and rewards early
ones [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. As it was the first edition of this shared task, many teams focused on
making accurate rather than early decisions, with the highest F1-score being
64% and the lowest ERDE50 score<sup>2</sup> being 9.68% [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>
        The second eRisk shared task [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] included two tasks: Early risk detection of
depression and early risk detection of anorexia. Like the year before, the ERDE
evaluation metric was used as the main metric alongside F1, precision, and recall
[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The best performing systems, in both tasks, were designed by Trotzek et al.
[
        <xref ref-type="bibr" rid="ref32">32</xref>
        ]. Their team experimented with different variations of bag of words features
and a Convolutional Neural Network (CNN) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] as well as ensemble models. In
the depression task, their system achieved an F1-score of 64% and an ERDE50 of
6.44%. In the anorexia task, they achieved an F1-score of 85% and an ERDE50
of 5.96%.
1 https://crisnetwork.co
2 A detailed description of ERDEo, where o is either 5 or 50, can be found in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>
        In this work, we present an ensemble approach that can be used for the
detection of different types of distress in textual data. We investigate the effectiveness
of the model by presenting and analyzing our results in the first task of eRisk
2019 [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Task and Dataset</title>
      <p>
        Following the success of the eRisk 2018 task 2 [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], the eRisk 2019 task 1 [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]
focuses on the early detection of anorexia in online posts. The data used for
the task is a collection of Reddit users labelled as anorexic or non-anorexic [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ],
along with a collection of their Reddit posts, recorded chronologically.
      </p>
      <p>For the training phase, the data from the previous year (eRisk 2018 task 2),
including both training and test sets, was made available. For the testing phase,
posts were released on an item-by-item basis in chronological order for a new
collection of Reddit users. The goal was to detect users suffering from anorexia,
having observed as few posts from them as possible. As a result, in addition
to precision, recall, and F1-score, two other metrics were used: the Early Risk Detection
Error (ERDE) measure, which penalizes late decisions, and latency-weighted F1,
a modified version of the F1 score that takes into account the delay of the decision<sup>3</sup>.</p>
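      <p>To make the idea of penalizing late decisions concrete, the following is a minimal sketch of a per-user ERDE computation. It assumes the commonly cited form of the measure, in which a true positive issued after k writings is charged a sigmoid latency cost centered at the cut-off o. The function names, the default cost values, and the overflow guard are assumptions made for illustration and are not taken from this paper; the official definition is the one given in the eRisk overview papers.</p>
      <preformat>
```python
import math


def latency_cost(k, o):
    """Assumed latency cost: 1 - 1 / (1 + e^(k - o)), growing from 0 to 1 around k = o."""
    if k - o > 700:
        # Guard against overflow in math.exp; the cost is 1.0 to float precision here.
        return 1.0
    return 1.0 - 1.0 / (1.0 + math.exp(k - o))


def erde(decision, truth, k, o, c_fp, c_fn=1.0, c_tp=1.0):
    """Error for one user, given a binary decision issued after seeing k writings.

    c_fp, c_fn and c_tp are the costs of a false positive, a false negative and
    a (late) true positive; the defaults are illustrative assumptions.
    """
    if decision and truth:
        return latency_cost(k, o) * c_tp  # correct alarm, penalized when late
    if decision and not truth:
        return c_fp  # false alarm
    if truth:
        return c_fn  # missed case
    return 0.0  # correct non-alarm


# A correct alarm raised exactly at the cut-off (k = o = 50) costs half of c_tp.
print(erde(True, True, k=50, o=50, c_fp=0.1))
```
      </preformat>
      <p>Under this sketch, the score of a system would be the mean of these per-user errors, so lower values are better.</p>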
      <p>
        Table 1 shows some statistics of the datasets. As shown in the table, the
datasets are highly imbalanced, with about 90% of the users not suffering from
anorexia.
3 The details of the evaluation metrics for eRisk 2019 task 1 are explained in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
As shown in Fig. 1, each sub-model includes an input layer that receives
the posts by a user as input and vectorizes their tokens using an embedding layer.
The output of the input layer is then fed to a hidden layer, which is followed by a
post-level attention/pooling layer that creates a representation of the post from
its constituent tokens. The user-level attention layer is responsible for calculating
the vector representation of the user from her/his online posts. Finally, the
output (classification) layer predicts the probability distribution of the positive
and negative classes (i.e. anorexic versus non-anorexic).
      </p>
      <p>Our main focus during the development of the sub-models was to include
diverse information sources, so that the final ensemble model can incorporate
different points of view when performing the final classification.
Input Layer. The inputs to the model are the online posts of each user. Each
post is first tokenized, and the tokens are sent to the word embedder to be
converted into dense vectors. As shown in Figure 1, these token vectors
are then fed to the hidden layer.</p>
      <p>
        Two different pretrained word embeddings were experimented with. The first
word embedder was the 300d version of GloVe [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] that was pretrained on 840B
tokens of web data from Common Crawl. The second word embedder was the
original 1024d version of ELMo, which was pretrained on the 1 Billion Word
Language Model Benchmark [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. These two word embeddings were used in order
to provide our ensemble model with sub-models that utilize both contextual
(ELMo) and non-contextual (GloVe) word embedders in their input layer.
Hidden Layer. The hidden layer is responsible for processing the token vectors,
generated by the input layer. As shown in Fig. 1, we have experimented with
four hidden architectures in our sub-models: a CNN [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] that processes token
n-grams separately, and a Bidirectional Vanilla RNN (BiRNN), a Bidirectional
Long Short-Term Memory (BiLSTM) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and a Bidirectional Gated Recurrent
Unit (BiGRU) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], all of which process token vectors sequentially, from first to
last and vice-versa, by taking into account the preceding and following tokens,
respectively.
      </p>
      <p>
        Post-level Attention/Pooling Layer. Following [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ], for the sub-models that
use CNN in the hidden layer, a max pooling is applied to the outputs of the
hidden layer after being passed through a Concatenated Rectified Linear Unit
(CReLU, i.e. ReLU applied on the concatenation of each vector and its negative).
      </p>
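      <p>The CReLU activation and max pooling described above can be sketched in a few lines of plain Python. This is only an illustration of the two operations on toy values, not the implementation used for the sub-models; the function names and the example numbers are assumptions.</p>
      <preformat>
```python
def crelu(vector):
    """ReLU applied to the concatenation of a vector and its negation."""
    doubled = list(vector) + [-x for x in vector]
    return [max(0.0, x) for x in doubled]


def max_pool(vectors):
    """Element-wise maximum over a list of equal-length vectors."""
    return [max(column) for column in zip(*vectors)]


def pool_post(conv_outputs):
    """Post representation: max pooling over CReLU-activated n-gram features."""
    return max_pool([crelu(v) for v in conv_outputs])


# Toy convolution outputs: 2 n-gram positions, 2 filters (values are made up).
conv_outputs = [[0.5, -1.0], [-0.2, 0.3]]
print(pool_post(conv_outputs))  # [0.5, 0.3, 0.2, 1.0]
```
      </preformat>
      <p>Note that CReLU doubles the feature dimension, so strongly negative filter responses (such as -1.0 above) are kept rather than being zeroed out.</p>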
      <p>In the models that use BiRNN, BiLSTM, or BiGRU in their hidden layer, an
attention mechanism is responsible for computing the representation of a post
($P$) as a weighted average of the outputs of the hidden layer for each token in
the post, where the weight assigned to each token is calculated automatically.
The function used by the attention mechanism is shown in Equation 1:
\[ P = \sum_{t=1}^{n} y_t \, \omega_t \qquad (1) \]
where $y_t$ represents the output of the recurrent hidden layer at time-step $t$,
and $\omega_t$ is the weight assigned to the output in that time-step.</p>
      <p>In our model, the attention mechanism uses an $N$-to-1 feed-forward layer
(with the weights $w$, where $N$ is equal to the size of the output vectors of the
recurrent hidden layer) to map the output of the hidden layer at each time-step
(e.g. $y_t$) to a scalar (e.g. $\nu_t$):
\[ \nu_t = y_t \cdot w \qquad (2) \]</p>
      <p>These scalars are then concatenated, and softmax is applied to the resulting
vector. The resulting vector from the softmax will include the weights that are
used by the attention mechanism:
\[ \omega = \mathrm{Softmax}([\nu_1; \nu_2; \nu_3; \ldots; \nu_n]) \qquad (3) \]</p>
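      <p>The three steps of the post-level attention (scoring each time-step, normalizing the scores with softmax, and taking the weighted sum) can be sketched as follows. This is a plain-Python illustration under the definitions given above, not the PyTorch code used for the sub-models; the function names and the toy inputs are assumptions.</p>
      <preformat>
```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(outputs, w):
    """Pool n hidden-layer outputs (each of size N) into one vector of size N.

    outputs: list of n vectors, one per time-step.
    w: the N weights of the N-to-1 feed-forward scoring layer.
    """
    # Equation 2: one scalar score per time-step (dot product with w).
    scores = [sum(y_i * w_i for y_i, w_i in zip(y, w)) for y in outputs]
    # Equation 3: softmax turns the scores into attention weights.
    weights = softmax(scores)
    # Equation 1: weighted sum of the hidden-layer outputs.
    size = len(outputs[0])
    pooled = [
        sum(weights[t] * outputs[t][i] for t in range(len(outputs)))
        for i in range(size)
    ]
    return pooled, weights


# With all-zero scoring weights every time-step gets the same attention.
pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], w=[0.0, 0.0])
print(pooled, weights)  # [0.5, 0.5] [0.5, 0.5]
```
      </preformat>
      <p>Since the user-level attention is described as following the same mechanism, the same function could in principle be applied to a list of post representations instead of token outputs.</p>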
      <sec id="sec-3-1">
        <title>User-level Attention Mechanism</title>
        <p>
          Knowing that the posts by a user do not contribute equally to detecting her/his mental state [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ], a user-level
attention mechanism is used to make the system learn to automatically detect the
contribution of each post to the final classification of the user.
        </p>
        <p>The mechanism of the user-level attention is similar to the post-level
attention mechanism, but computes a vector representation of a user from the
representations of her/his posts (resulting from the post-level attention/pooling).
Output (Classification) Layer. The final layer in the sub-models is a
feed-forward fully-connected layer that maps the output of the user-level attention to
a vector of size 2 (corresponding to the negative and positive classes). At the
end of this layer, a softmax activation function outputs the predicted
probability distribution over the negative and positive classes.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Ensemble Model</title>
        <p>As shown in Fig. 1, the ensemble model is composed of several neural sub-models,
a fusion component, and a final SVM classifier. The fusion component
concatenates the outputs of the user-level attention units (which will subsequently be
referred to as neural features) and the predicted probability distributions of
the two classes, resulting from the softmax activation functions, from all its
constituent sub-models. The output of the fusion component is taken as the final
representation of a user. This representation is finally fed to an SVM classifier
to perform the ensemble classification.</p>
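        <p>As an illustration, the fusion step can be sketched as a simple concatenation. The function below is a hypothetical helper, not the code used in the experiments; in particular, the order in which features and probabilities are concatenated and the two flags for selecting them are assumptions.</p>
        <preformat>
```python
def fuse(sub_model_outputs, use_features=True, use_probs=True):
    """Build the user representation that would be fed to the SVM.

    sub_model_outputs: one (neural_features, class_probabilities) pair per
    sub-model, both given as lists of floats.
    """
    fused = []
    if use_features:
        for features, _ in sub_model_outputs:
            fused.extend(features)  # user-level attention output
    if use_probs:
        for _, probs in sub_model_outputs:
            fused.extend(probs)  # softmax output over the two classes
    return fused


# Two toy sub-models with made-up outputs.
outputs = [([0.1, 0.2], [0.9, 0.1]), ([0.3], [0.4, 0.6])]
print(fuse(outputs))  # [0.1, 0.2, 0.3, 0.9, 0.1, 0.4, 0.6]
```
        </preformat>
        <p>Switching off one of the two flags yields a representation based only on neural features or only on class probabilities.</p>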
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental Setup</title>
      <p>
        This section describes our experiments with the above model for our
participation in the eRisk 2019 shared task [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
      <sec id="sec-4-1">
        <title>Sub-models Implementation</title>
        <p>
          PyTorch [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] was used to implement and train the sub-models. The Adam
optimizer [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] was used, and the learning rate was set to $5 \times 10^{-4}$. Cross-entropy
was used as the loss function. In order to handle the imbalanced distribution
of the positive and negative classes in the training set (see Table 1), weights
proportional to the inverse of the number of samples of each class were assigned
to that class. Due to a lack of computational resources, mini-batches with a
maximum size of 128 were used at the post level for each user, and only the first
100 tokens of each post were used<sup>4</sup>. In order to minimize the amount of padding
in the batches, posts with a similar number of tokens were assigned to the same
batch.
        </p>
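        <p>The class weighting described above can be illustrated with a short sketch. It computes weights proportional to the inverse of each class count and applies them to the cross-entropy of a single prediction. The normalization of the weights to sum to 1 and the function names are assumptions made here; in PyTorch, such weights would typically be passed to the loss through its weight argument.</p>
        <preformat>
```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Class weights proportional to 1 / (number of samples in the class)."""
    counts = Counter(labels)
    raw = {c: 1.0 / n for c, n in counts.items()}
    total = sum(raw.values())
    # Normalizing to sum to 1 is a choice made for this sketch.
    return {c: r / total for c, r in raw.items()}


def weighted_cross_entropy(probs, label, weights):
    """Weighted loss for one user: -weight[label] * log(p[label])."""
    return -weights[label] * math.log(probs[label])


# Toy training labels with the roughly 90/10 imbalance mentioned in the text.
labels = [0] * 9 + [1]
weights = inverse_frequency_weights(labels)
print(weights)  # the minority class receives about 9 times the weight
print(weighted_cross_entropy([0.5, 0.5], 1, weights))
```
        </preformat>
        <p>With these weights, an error on a positive (minority) user contributes about nine times as much to the loss as the same error on a negative user.</p>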
        <p>In order to fine-tune the other hyperparameters of the sub-models (including
the number and size of convolutional filters, number of recurrent units, and
number of training epochs), each sub-model was individually trained on the training
set and optimized on the validation set (see Table 1), based on F1 score. The
specifics of the 8 different sub-models are shown in Table 2. Since each sub-model
is composed of a unique pair of hidden layer and word embedding type, they will
later be referred to as &lt;hidden-type&gt;-&lt;embedding-type&gt; (see the second column
of Table 2).</p>
        <table-wrap>
          <label>Table 2</label>
          <table>
            <thead>
              <tr><th>#</th><th>Name</th><th>Hyperparameters</th></tr>
            </thead>
            <tbody>
              <tr><td>1</td><td>CNN-GloVe</td><td>100 bigram convolution filters, trained for 10 epochs</td></tr>
              <tr><td>2</td><td>CNN-ELMo</td><td>200 unigram filters and 50 bigram convolution filters, trained for 6 epochs</td></tr>
              <tr><td>3</td><td>BiRNN-GloVe</td><td>one layer of 64 vanilla RNN units, trained for 14 epochs</td></tr>
              <tr><td>4</td><td>BiRNN-ELMo</td><td>one layer of 50 vanilla RNN units, trained for 13 epochs</td></tr>
              <tr><td>5</td><td>BiLSTM-GloVe</td><td>one layer of 32 bidirectional LSTM units, trained for 31 epochs</td></tr>
              <tr><td>6</td><td>BiLSTM-ELMo</td><td>one layer of 64 bidirectional LSTM units, trained for 14 epochs</td></tr>
              <tr><td>7</td><td>BiGRU-GloVe</td><td>one layer of 64 bidirectional GRUs, trained for 14 epochs</td></tr>
              <tr><td>8</td><td>BiGRU-ELMo</td><td>one layer of 64 bidirectional GRUs, trained for 8 epochs</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-4-2">
        <title>Ensemble Classifiers</title>
        <p>
          Scikit-learn [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] was used to develop the SVM classifier used in the ensemble
model. Three different versions of ensemble classifiers were developed:
1. Ens-Feat is the version of the ensemble model that only utilizes the neural
features. The SVM classifier in this version uses a sigmoid kernel. The γ and
C parameters in the SVM were set to auto (i.e. 1/&lt;number-of-features&gt;)
and 4, respectively.
2. Ens-Prob uses only the predicted class probabilities from the softmax
activation function at the end of the neural sub-models. It utilizes a polynomial
kernel of degree 1. The γ and C parameters in the SVM were
set to scale (i.e. 1/[&lt;number-of-features&gt; × &lt;variance-of-features&gt;]) and 1,
respectively.
3. Ens-All utilizes both neural features and predicted class probabilities in its
SVM classifier, which uses a sigmoid kernel and has its values of γ and C set
to auto and 2, respectively.
Based on the results on the validation set, 5 runs were submitted to the shared
task server. For the 1st and 2nd runs, CNN-GloVe and CNN-ELMo were used,
respectively, as stand-alone models<sup>5</sup>, and Ens-Feat, Ens-Prob, and Ens-All
comprised the 3rd, 4th and 5th runs.
4 This limit only truncated a small number of posts, as the average length was 37.47
tokens in the eRisk 2018 task 2 data.
        </p>
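        <p>The three SVM configurations above can be summarized as keyword-argument dictionaries, together with a small helper that shows what the auto and scale settings of the kernel coefficient (gamma) evaluate to for a given feature matrix. The dictionary keys follow the parameter names of scikit-learn's SVC class, to which they could be passed; the helper itself is a hypothetical plain-Python illustration and is not part of scikit-learn or of the original experiments.</p>
        <preformat>
```python
# Settings as reported in the text, keyed by ensemble variant.
ENSEMBLE_SVM_SETTINGS = {
    "Ens-Feat": {"kernel": "sigmoid", "gamma": "auto", "C": 4},
    "Ens-Prob": {"kernel": "poly", "degree": 1, "gamma": "scale", "C": 1},
    "Ens-All": {"kernel": "sigmoid", "gamma": "auto", "C": 2},
}


def resolve_gamma(setting, rows):
    """Numeric gamma for a feature matrix given as a list of equal-length rows."""
    n_features = len(rows[0])
    if setting == "auto":
        return 1.0 / n_features
    if setting == "scale":
        # Variance over all entries of the feature matrix.
        values = [v for row in rows for v in row]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        return 1.0 / (n_features * variance)
    raise ValueError("unknown gamma setting: " + setting)


# Toy feature matrix: 2 users, 2 features (values are made up).
rows = [[0.0, 4.0], [4.0, 0.0]]
print(resolve_gamma("auto", rows))   # 0.5
print(resolve_gamma("scale", rows))  # 0.125
```
        </preformat>
        <p>Because Ens-Feat and Ens-All operate on much wider feature vectors than Ens-Prob, the auto setting gives them a correspondingly smaller gamma.</p>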
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results and Discussion</title>
      <p>As shown in Table 3, the model Ens-All achieved the highest F1 (0.7073) and
latency-weighted F1 (0.6908) scores of all participants' runs. This was in line with
our intuition that an ensemble model that makes use of both neural features
and predicted class probabilities from the 8 sub-models has a higher capability of
detecting the correct class after observing a small number of writings. The results
also show that the CNN-ELMo model can achieve F1 and latency-weighted F1
scores that are competitive with Ens-All, and outperforms Ens-Feat and
Ens-Prob in these two metrics. The CNN-ELMo model also resulted in the best
recall, ERDE5 and ERDE50 among our runs, showing the potential of this model to be used
independently for the task of early risk detection of anorexia.</p>
      <p>Table 3 also shows that all our models, except CNN-GloVe (run 0), achieved
significantly superior performance in terms of F1 score and latency-weighted
F1 (teams lirmm and INAOE-CIMAT achieved the next best F1 and
latency-weighted F1 scores). Run 1 of team lirmm achieved the highest precision. The
best recall was achieved by run 2 of the team Fazl. Runs 0 and 4 of the team
UNSL achieved the best ERDE5 and ERDE50, respectively, where we could
achieve competitive results using CNN-ELMo.
5 These two sub-models achieved the most promising results among all the sub-models
during the training phase.</p>
      <p>The number of writings processed by the models submitted by each team
shows that our models used a significantly lower number of writings in
comparison to the other teams<sup>6</sup>. This shows that our systems have a great potential
of making early and correct decisions. This is supported by an even larger gap
between the latency-weighted F1 scores of our team and the runs submitted by
other teams, in comparison to the gap in F1 score.</p>
      <p>Although our systems achieved the best or competitive results according to
different evaluation metrics, we suffered from a lack of computational resources
when running the models that use the ELMo embedder for around 2000
iterations. The models had to be run approximately 2000 times due to the
item-by-item release of the test data which was chosen for the eRisk 2019 shared
task (in the previous eRisk shared tasks, the test data was released in 10 chunks,
making the number of iterations equal to 10). Despite this technical drawback,
the advantages of using ELMo to extract context-sensitive embeddings greatly
outweigh its disadvantages. This can also be observed by comparing the results
achieved by CNN-GloVe and CNN-ELMo.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion and Future Work</title>
      <p>
        This paper presents an ensemble approach which can be used to detect distress
in the social media posts of a user. The ensemble model utilizes neural features
alongside predicted class probabilities which are output by 8 different neural
sub-models. Using this model and under the team name CLaC, we participated
in the first task of eRisk 2019 [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], which was aimed at the early detection of
anorexia in online posts, and ranked first in terms of F1 and latency-weighted
F1 scores.
      </p>
      <p>
        Using a similar architecture, we also participated in the CLPsych 2019 shared
task [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], whose aim was to assess suicide risk based on online posts. Considering
that our ensemble model ranked first in tasks A and C of this shared task, the
same model architecture seems applicable to other similar tasks, where the goal
is to detect different types of mental health issues using social media posts.
      </p>
      <p>We believe that the user-level attention mechanism has played an important
role in the good results achieved on these shared tasks. It would be interesting
to qualitatively analyze the results of the attention mechanism, to see how they
correlate with human perception, i.e. whether the posts to which the attention
mechanism assigns more weights are actually the same posts that seem more
informative to a health specialist for detecting anorexia.</p>
      <p>Also, during the development phase, it was found that removing any of the 8
sub-models (even the sub-models with low individual performance) negatively
affected the result of the final ensemble classifier. It would be interesting to
measure quantitatively the contribution of each of the 8 neural sub-models to
the result of the final classifier. This could then be leveraged to improve the
performance of the system.
6 The average number of writings processed by the participating teams was 1273.</p>
      <p>
        An additional research direction is the use of linguistic features and metadata.
The current model does not explicitly use such features; however, Trotzek et al.
[
        <xref ref-type="bibr" rid="ref32">32</xref>
        ] showed that they can significantly improve early detection of anorexia.
      </p>
      <p>
        Lastly, it would be interesting to experiment with more diverse architectures
in the neural sub-models (e.g. by using other hidden layer architectures, such as
recursive neural networks [
        <xref ref-type="bibr" rid="ref11 ref29">11,29</xref>
        ]) as a way of improving the performance of the
current ensemble classifier.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgment</title>
      <p>We would like to thank the reviewers for their comments on an earlier version
of this paper.</p>
      <p>This work was financially supported by the Natural Sciences and Engineering
Research Council of Canada (NSERC).</p>
      <p>national Joint Conference on Artificial Intelligence (IJCAI 2015). Buenos Aires,
Argentina (July 2015)
35. Zirikly, A., Resnik, P., Uzuner, O., Hollingshead, K.: CLPsych 2019 shared task:
Predicting the degree of suicide risk in Reddit posts. In: Proceedings of the Sixth
Workshop on Computational Linguistics and Clinical Psychology: From Keyboard
to Clinic (CLPsych 2019). Minneapolis, Minnesota, USA (June 2019)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ball</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hayes</surname>
            ,
            <given-names>R.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dobson</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stewart</surname>
          </string-name>
          , R.:
          <article-title>TextHunter – a user friendly tool for extracting generic concepts from free text in clinical research</article-title>
          .
          <source>In: AMIA Annual Symposium Proceedings</source>
          . vol.
          <year>2014</year>
          , p.
          <fpage>729</fpage>
          . American Medical Informatics Association (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Benton</surname>
            , A., Mitchell,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Multitask learning for mental health conditions with limited social media data</article-title>
          .
          <source>In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL</source>
          <year>2017</year>
          ). pp.
          <volume>152</volume>
          –
          <fpage>162</fpage>
          . Association for Computational Linguistics, Valencia,
          <source>Spain (April</source>
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Calvo</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Milne</surname>
            ,
            <given-names>D.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hussain</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Christensen</surname>
          </string-name>
          , H.:
          <article-title>Natural language processing in mental health applications using non-clinical texts</article-title>
          .
          <source>Natural Language Engineering</source>
          <volume>23</volume>
          (
          <issue>5</issue>
          ),
          <fpage>649</fpage>
          –
          <lpage>685</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chelba</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schuster</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ge</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brants</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koehn</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robinson</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>One billion word benchmark for measuring progress in statistical language modeling</article-title>
          .
          <source>In: 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014)</source>
          . Singapore (September
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Merrienboer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gulcehre</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bougares</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwenk</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Learning phrase representations using RNN encoder–decoder for statistical machine translation</article-title>
          .
          <source>In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014)</source>
          . pp.
          <fpage>1724</fpage>
          –
          <lpage>1734</lpage>
          . Doha, Qatar (October
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Coppersmith</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dredze</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harman</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Quantifying mental health signals in Twitter</article-title>
          .
          <source>In: Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality (CLPsych 2014)</source>
          . pp.
          <fpage>51</fpage>
          –
          <lpage>60</lpage>
          . Baltimore, Maryland, USA (June
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Coppersmith</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dredze</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harman</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hollingshead</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitchell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>CLPsych 2015 shared task: Depression and PTSD on Twitter</article-title>
          .
          <source>In: Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality (CLPsych 2015)</source>
          . pp.
          <fpage>31</fpage>
          –
          <lpage>39</lpage>
          . Association for Computational Linguistics, Denver, Colorado (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Davidson</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Warmsley</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Macy</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weber</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Automated hate speech detection and the problem of offensive language</article-title>
          .
          <source>In: Proceedings of the Eleventh International Conference on Web and Social Media</source>
          . pp.
          <fpage>512</fpage>
          –
          <lpage>515</lpage>
          . Montreal, Canada (May
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>DeVault</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Georgila</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Artstein</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morbini</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Traum</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scherer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morency</surname>
            ,
            <given-names>L.P.</given-names>
          </string-name>
          , et al.:
          <article-title>Verbal indicators of psychological distress in interactive dialogue with a virtual human</article-title>
          .
          <source>In: Proceedings of the Special Interest Group on Discourse and Dialogue Conference (SIGDIAL 2013)</source>
          . pp.
          <fpage>193</fpage>
          –
          <lpage>202</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Giatsoglou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vozalis</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diamantaras</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vakali</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sarigiannidis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chatzisavvas</surname>
            ,
            <given-names>K.C.</given-names>
          </string-name>
          :
          <article-title>Sentiment analysis leveraging emotions and word embeddings</article-title>
          .
          <source>Expert Systems with Applications</source>
          <volume>69</volume>
          ,
          <fpage>214</fpage>
          –
          <lpage>224</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Goller</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuchler</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Learning task-dependent distributed representations by backpropagation through structure</article-title>
          .
          <source>Neural Networks</source>
          <volume>1</volume>
          ,
          <fpage>347</fpage>
          –
          <lpage>352</lpage>
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural Computation</source>
          <volume>9</volume>
          (
          <issue>8</issue>
          ),
          <fpage>1735</fpage>
          –
          <lpage>1780</lpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Jackson</surname>
            ,
            <given-names>R.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jayatilleke</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolliakou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ball</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gorrell</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roberts</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dobson</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stewart</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Natural language processing to extract symptoms of severe mental illness from clinical text: the clinical record interactive search comprehensive data extraction (CRIS-CODE) project</article-title>
          .
          <source>British Medical Journal (BMJ open)</source>
          <volume>7</volume>
          (
          <issue>1</issue>
          ) (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ba</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>In: The 3rd International Conference for Learning Representations (ICLR 2015)</source>
          . San Diego, California, USA (May
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haffner</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bottou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Object recognition with gradient-based learning</article-title>
          .
          <source>In: Shape, Contour and Grouping in Computer Vision</source>
          , pp.
          <fpage>319</fpage>
          –
          <lpage>345</lpage>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A test collection for research on depression and language use</article-title>
          .
          <source>In: International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          . pp.
          <fpage>28</fpage>
          –
          <lpage>39</lpage>
          . Evora, Portugal (September
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parapar</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>eRISK 2017: CLEF lab on early risk prediction on the internet: Experimental foundations</article-title>
          .
          <source>In: Proceedings of the International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          . pp.
          <fpage>346</fpage>
          –
          <lpage>360</lpage>
          . Dublin, Ireland (September
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parapar</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Overview of eRisk: Early Risk Prediction on the Internet</article-title>
          .
          <source>In: CLEF 2018: Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          . pp.
          <fpage>343</fpage>
          –
          <lpage>361</lpage>
          . Avignon, France (September
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parapar</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Overview of eRisk 2019: Early Risk Prediction on the Internet</article-title>
          .
          <source>In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. 10th International Conference of the CLEF Association, CLEF 2019</source>
          . Lugano, Switzerland (September
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Lynn</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goodman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Niederhoffer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loveys</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Resnik</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwartz</surname>
            ,
            <given-names>H.A.</given-names>
          </string-name>
          :
          <article-title>CLPsych 2018 shared task: Predicting current and future psychological health from childhood essays</article-title>
          .
          <source>In: Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic (CLPsych 2018)</source>
          . pp.
          <fpage>37</fpage>
          –
          <lpage>46</lpage>
          . Association for Computational Linguistics, New Orleans, LA (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Milne</surname>
            ,
            <given-names>D.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pink</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hachey</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvo</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          :
          <article-title>CLPsych 2016 shared task: Triaging content in online peer-support forums</article-title>
          .
          <source>In: Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2016)</source>
          . pp.
          <fpage>118</fpage>
          –
          <lpage>127</lpage>
          . Association for Computational Linguistics, San Diego, CA, USA (June
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Mohammadi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amini</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kosseim</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>CLaC at CLPsych 2019: Fusion of neural features and predicted class probabilities for suicide risk assessment based on online posts</article-title>
          .
          <source>In: Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic (CLPsych 2019)</source>
          . Minneapolis, Minnesota, USA (June
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Ofoghi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verspoor</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Towards early discovery of salient health threats: A social media emotion classification technique</article-title>
          .
          <source>In: Biocomputing 2016: Proceedings of the Pacific Symposium</source>
          . pp.
          <fpage>504</fpage>
          –
          <lpage>515</lpage>
          . Kohala Coast, Hawaii (January
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Paszke</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gross</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chintala</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chanan</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>DeVito</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Desmaison</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antiga</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lerer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Automatic differentiation in PyTorch</article-title>
          .
          <source>In: NIPS 2017 Autodiff Workshop</source>
          . Long Beach, California, USA (January
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brucher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perrot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duchesnay</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (Oct),
          <fpage>2825</fpage>
          –
          <lpage>2830</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014)</source>
          . pp.
          <fpage>1532</fpage>
          –
          <lpage>1543</lpage>
          . Doha, Qatar (October
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Pestian</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nasrallah</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matykiewicz</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bennett</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leenaars</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Suicide note classification using natural language processing: A content analysis</article-title>
          .
          <source>Biomedical informatics insights 3</source>
          , BII{S4706 (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rudzicz</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Detecting anxiety through Reddit</article-title>
          .
          In:
          <source>Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology – From Linguistic Signal to Clinical Reality (CLPsych 2017)</source>
          . pp.
          <fpage>58</fpage>
          –
          <lpage>65</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.C.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Parsing natural scenes and natural language with recursive neural networks</article-title>
          .
          In:
          <source>Proceedings of the 28th International Conference on Machine Learning (ICML 2011)</source>
          . pp.
          <fpage>129</fpage>
          –
          <lpage>136</lpage>
          . Bellevue, Washington, USA (June
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Struik</surname>
            ,
            <given-names>L.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baskerville</surname>
            ,
            <given-names>N.B.</given-names>
          </string-name>
          :
          <article-title>The role of Facebook in Crush the Crave, a mobile- and social media-based smoking cessation intervention: Qualitative framework analysis of posts</article-title>
          .
          <source>Journal of Medical Internet Research</source>
          <volume>16</volume>
          (
          <issue>7</issue>
          ) (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Thompson</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leung</surname>
            ,
            <given-names>B.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blair</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taboada</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Sentiment analysis of player chat messaging in the video game StarCraft 2: Extending a lexicon-based model</article-title>
          .
          <source>Knowledge-Based Systems</source>
          <volume>137</volume>
          ,
          <fpage>149</fpage>
          –
          <lpage>162</lpage>
          (December
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Trotzek</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koitka</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedrich</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          :
          <article-title>Word embeddings and linguistic metadata at the CLEF 2018 tasks for early detection of depression and anorexia</article-title>
          . In:
          <source>Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum</source>
          . Avignon, France (September
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Detecting signals of adverse drug reactions from health consumer contributed content in social media</article-title>
          .
          In:
          <source>Proceedings of ACM SIGKDD Workshop on Health Informatics (HI-KDD 2012)</source>
          . Beijing, China (August
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karimi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lampert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cameron</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robinson</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Power</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Using social media to enhance emergency situation awareness</article-title>
          . In: Twenty-Fourth Inter-
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>