<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UArizona at the CLEF eRisk 2017 Pilot Task: Linear and Recurrent Models for Early Depression Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Farig Sadeque</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dongfang Xu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Steven Bethard</string-name>
          <email>bethardg@email.arizona.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Information, University of Arizona 1103 E 2nd St</institution>
          ,
          <addr-line>Tucson, AZ 85721</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The 2017 CLEF eRisk pilot task focuses on automatically detecting depression as early as possible from a user's posts to Reddit. In this paper we present the techniques employed for the University of Arizona team's participation in this early risk detection shared task. We leveraged external information beyond the small training set, including a preexisting depression lexicon and concepts from the Unified Medical Language System as features. For prediction, we used both sequential (recurrent neural network) and non-sequential (support vector machine) models. Our models performed reasonably well on the test data, and the recurrent neural models performed better than the non-sequential support vector machines while using the same feature sets.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Depression is responsible for almost 12% of all Years Lived with Disability (YLDs), with nearly 350 million people suffering from it worldwide [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. As of 2000, depression also comes with an annual economic burden of 83 billion US dollars [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and a 1990 study by Goodwin and Jamison [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] suggested that depression is also the leading cause of suicide, as 15-20% of all major depressive disorder patients take their own lives. Early detection of depression can help mitigate these threats, but most studies on early detection rely on diagnoses from patients' self-reported surveys and experiences [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which are extremely costly in terms of both time and money, and many countries' primary health care services lack the support for these diagnoses. Fortunately, social media may provide a solution to this problem, as many studies have successfully leveraged the contents of social media to analyze and predict users' mental well-being [
        <xref ref-type="bibr" rid="ref13 ref15 ref16 ref19 ref6">6, 13, 15, 16, 19</xref>
        ]. Unfortunately, none of these studies focuses on the importance of the temporal aspect of these detection tasks; hence the introduction of the pilot task on early risk detection of depression at the CLEF eRisk 2017 workshop on Early risk prediction on the Internet: experimental foundations [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        In this paper we present the five early risk detection models we submitted for the pilot task. We tried to leverage external knowledge sources beyond the provided training data: we incorporated the depression lexicon created by [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and used Metamap [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to obtain Unified Medical Language System (UMLS) concept unique identifiers related to mental and behavioral dysfunction from user texts. We used these features with both sequential and non-sequential learning models: support vector machines as the non-sequential linear models, and recurrent neural networks with multiple layers of gated recurrent units as the sequential models. Our results demonstrate the superiority of sequential over non-sequential models on this task.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Task Description</title>
      <p>
        The pilot task on early risk detection of depression focused on sequential processing of contents posted by users on Reddit (http://www.reddit.com) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The data was a collection of writings posted by users, divided into two cohorts: depressed and non-depressed. The text collection for each user is sorted in chronological order and then divided into 10 chunks, where the first 10% of a user's writings is in chunk 1, the second 10% is in chunk 2, and so on. Details of this data are in section 3.
      </p>
      <p>The task was divided into two stages: a training stage and a testing stage. The training stage started on November 30, 2016, when the entire text collection of 486 users was released, with their user-level annotations of depression. Participants then had a little over two months to develop their systems. The testing stage started on February 6, 2017, when the first 10% of texts written by 401 previously unobserved users was released. For the next 9 weeks, new chunks were released, with each chunk including the next 10% of each user's text. After each release, and before the next release, systems had to make a three-way decision for each user: tag the user as depressed, tag the user as non-depressed, or wait to see the next chunk of data. If a user was tagged as either depressed or non-depressed, this decision was final and could not be changed for future chunks of data. After the release of the 10th (and last) chunk of the data, the decision was two-way: tag the user as depressed, or tag the user as non-depressed. The prediction model was then evaluated based on its correctness and how early in the series of chunks it was able to make its predictions.</p>
    </sec>
    <sec id="sec-3">
      <title>Data</title>
      <p>A number of social media websites were considered as potential data sources for this shared task. Twitter (http://www.twitter.com) was discarded because it provides little to no context about the user, is highly dynamic, and did not allow the organizers to collect more than 3200 tweets per user, which, in 140-character microblogs, represents only a small amount of text. MTV's A Thin Line (ATL, http://www.athinline.org), a platform designed to empower distressed teens to respond to issues ranging from sexting to cyberbullying, was also considered, but discarded as there were concerns about redistribution and problems regarding obtaining user history. Eventually, Reddit, a social media and news aggregation website, was selected because of its organization of contents among specific subreddits, and the ease of collecting data using the API provided by Reddit itself.</p>
      <p>For each user, the organizers collected the maximum number of submissions they could find and were allowed to download through the API (at most 2000 posts and comments per user). Users with fewer than 10 submissions were discarded. Original redditor IDs were replaced with pseudonymous user IDs for anonymization, and published along with the title, time, and text of the posts.</p>
      <p>
        After the data collection, the users were divided into two cohorts: an experimental depressed group and a control (non-depressed) group. For the depressed group, the organizers searched for phrases associated with self-declaration of depression, such as "diagnosed with depression", and then manually examined the posts to filter down to just those redditors who explicitly said they were diagnosed with depression by a physician. These self-declaration posts were omitted from the dataset to avoid making the detection trivial. For the non-depressed group, the organizers collected redditors who had participated in depression forums but had made no declaration of depression, as well as redditors from other random subreddits. The final collection contained 531,453 submissions from 892 unique users, of which 486 users were used as training data and 401 were used as test data. Statistics for that dataset are shown in Table 1.
      </p>
      <sec id="sec-3-0">
        <title>Depression Lexicon Features</title>
        <p>
          This is a set of unigrams that have a high probability of appearing in depression-related posts. The list was collected from [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], where the authors compiled a list of words most associated with the stem "depress" in the Yahoo! Answers Mental Health forum using pointwise mutual information and log-likelihood ratio, and kept the top words based on their TF-IDF in Wikipedia articles. We used the top 110 words presented in the paper. For each post, we generated 110 features: the counts of how many times each word occurred in the post.
        </p>
      </sec>
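        <p>As an illustration of this featurization, a minimal sketch in Python; the three-word lexicon below is a hypothetical stand-in for the 110-word list, not the actual lexicon:</p>
        <preformat>
```python
from collections import Counter
import re

# Hypothetical 3-word stand-in for the 110-word depression lexicon.
LEXICON = ["depression", "anxiety", "lonely"]

def lexicon_features(post):
    """Count how many times each lexicon word occurs in one post."""
    tokens = Counter(re.findall(r"[a-z']+", post.lower()))
    return [tokens[w] for w in LEXICON]

print(lexicon_features("Feeling lonely again. My depression is back, so lonely."))
# -> [1, 0, 2]: one count per lexicon word, in lexicon order
```
        </preformat>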
      <sec id="sec-3-1">
        <title>Metamap Features</title>
        <p>
          We used Metamap [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], a highly configurable tool for discovering concepts from the Unified Medical Language System (UMLS) Metathesaurus (https://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/). In our preliminary experiments we found that Metamap produced many incorrect concept matches in social media texts (it was mainly built to run on clinical texts), but with some tuning it was possible to use it effectively on social media. We restricted Metamap to a single source (SNOMEDCT-US) and to only two semantic types (Mental or Behavioral Dysfunction, and Clinical Drugs). We passed each post through the restricted Metamap and collected all the predicted concept unique identifiers (CUIs), ending up with a set of 404 CUIs. We generated 404 features for each post: the counts of how many times each CUI occurred in the post.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Classifiers</title>
      <p>We considered both sequential and non-sequential learning models for the
prediction task. For the non-sequential model, we used a support vector machine
that observes the user's entire post history at once. For the sequential model, we
used a recurrent neural network model that observes each of a user's posts, one
at a time. We also used an ensemble model to combine both the sequential and
non-sequential models.</p>
      <p>As per the shared task definition, classifiers were given the user's history in chunks (the first 10% of the user history, then the first 20%, etc.), and after each chunk the classifiers were asked to predict "depressed", "not depressed", or "wait". All our classifiers were trained to make two-way predictions, "depressed" vs. "wait", and if a classifier predicted a user as depressed after seeing the first n% of the history, that prediction was considered final and the remaining (100 - n)% of the history was ignored. On the final (10th) chunk, all users who had only ever been predicted as "wait" were classified as "not depressed". Note that our models never made post-by-post decisions; they always observed the entirety of the n% of the history they were given and then made a single prediction for the entire n%.</p>
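      <p>The chunk-by-chunk decision protocol described above can be sketched as a small simulation; predict_chunk below is a hypothetical stand-in for any of our classifiers:</p>
      <preformat>
```python
def run_early_detection(chunks, predict_chunk):
    """Simulate the shared task protocol: after each chunk a model may tag a
    user 'depressed' (a final decision) or 'wait'; after the last chunk every
    still-waiting user defaults to 'not depressed'.

    chunks: list of dicts mapping user id -> that user's next slice of text.
    predict_chunk: function(accumulated_history) -> True for 'depressed'.
    """
    history = {}   # user -> all text observed so far
    decided = {}   # user -> (label, chunk number of the decision)
    for k, chunk in enumerate(chunks, start=1):
        for user, text in chunk.items():
            history[user] = history.get(user, "") + " " + text
            if user not in decided and predict_chunk(history[user]):
                decided[user] = ("depressed", k)  # final, never revisited
    for user in history:
        decided.setdefault(user, ("not depressed", len(chunks)))
    return decided
```
      </preformat>
      <p>For example, with a toy classifier that fires on the word "diagnosed", a user whose second chunk contains it is tagged depressed at chunk 2, while all other users become "not depressed" after the final chunk.</p>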
      <sec id="sec-4-1">
        <title>Support Vector Machine</title>
        <p>
          A support vector machine [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], or SVM, is a machine learning technique used for binary classification tasks. Input vectors are non-linearly mapped to a high-dimensional feature space, where a linear decision surface is constructed to separate the classes. It is one of the most popular non-sequential machine learning techniques because of its high generalization ability.
        </p>
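        <p>Not the toolkits we actually used (Weka's SMO and LibSVM), but an equivalent sketch with scikit-learn, whose SVC is itself backed by LibSVM: an RBF-kernel SVM configured to output probability estimates. The synthetic data is illustrative only:</p>
        <preformat>
```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(10, 5)),    # non-depressed users
               rng.normal(3.0, 1.0, size=(10, 5))])   # depressed users
y = np.array([0] * 10 + [1] * 10)

# probability=True enables probability estimates, as in our models
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)
proba = clf.predict_proba(X[:2])  # per user: [P(not depressed), P(depressed)]
```
        </preformat>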
        <p>For the support vector machine (SVM) models we used, the feature vectors needed to summarize the entire history of the user. We therefore converted the post-level raw count features to user-level proportion features (e.g., converting the number of times depression was used in each post to the proportion of all words across all of a user's posts that were depression).</p>
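        <p>This count-to-proportion conversion can be sketched as follows, with hypothetical numbers:</p>
        <preformat>
```python
def user_proportions(post_counts, post_lengths):
    """Convert per-post raw counts of each lexicon word into user-level
    proportions: total occurrences over total words across all posts.

    post_counts: list of per-post count vectors (one int per lexicon word).
    post_lengths: total word count of each post.
    """
    total_words = sum(post_lengths)
    totals = [sum(col) for col in zip(*post_counts)]
    return [t / total_words for t in totals]

# e.g. a word seen 2 + 1 times across two 10-word posts -> 3 / 20 = 0.15
print(user_proportions([[2, 0], [1, 1]], [10, 10]))  # -> [0.15, 0.05]
```
        </preformat>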
        <p>
          We used two out-of-the-box implementations of support vector machines:
          - Weka's implementation of the sequential minimal optimization algorithm [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] for training support vector machine classifiers [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. The model was set to output probability estimates, and it normalizes all attributes by default. Other parameters were set to their defaults. We used a degree-1 polynomial kernel and a cache size of 250007, as this performed better in preliminary experiments on the training data.
          - LibSVM's implementation of support vector machines [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] using C-support vector classification [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Apart from tuning the model for probability estimate outputs, we used the default parameter settings. We used the radial basis function kernel, as it performed better in preliminary experiments on the training data.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Recurrent Neural Network</title>
        <p>[Figure: Architecture of our recurrent neural network. Input feature vectors are fed through two layers of gated recurrent units (GRUs), followed by a sigmoid output layer, 1/(1 + e^(-x)), that predicts depressed or not-depressed.]</p>
        <p>
          Due to the sequential property of the data, we opted for machine learning techniques that take advantage of it. We used Recurrent Neural Networks (RNNs), which have been successful in other natural language modeling problems [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Recurrent neural networks are a form of artificial neural network in which neurons are connected to form a directed cycle, allowing the network to exhibit temporal behavior and thus be used as a sequential learning model. We trained recurrent neural networks that take a sequence of feature vectors, each representing a single post, and predict whether the user is depressed or not.
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>Ensemble</title>
        <p>An ensemble learning model takes the outputs of a set of other machine learning algorithms and uses them as inputs for classification. Ensembles are typically used to improve on individual machine learning techniques by leveraging the different strengths of multiple approaches. For this task, we implemented an ensemble learning technique using the probability outputs of the nine individual models (3 from Weka, 3 from LibSVM, and 3 from RNNs; the models used as features either the depression lexicon, the Metamap outputs, or both). We used 5-fold cross-validation for each model to calculate the probability of each user being depressed, and then fed these probabilities to a naive Bayes classifier, which served as the ensemble classifier. We used Weka's naive Bayes implementation with the default parameter settings.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Evaluation and Analysis</title>
      <p>We submitted five different models for the task:
      - UArizonaA: An SVM model trained using LibSVM with the depression lexicon and Metamap outputs as features.
      - UArizonaB: An SVM model trained using Weka with the depression lexicon as features.
      - UArizonaC: An RNN model with both the depression lexicon and Metamap outputs as features.
      - UArizonaD: The ensemble model.
      - UArizonaE: An RNN model with the same structure as UArizonaC, but that always predicts "wait" until 60% of the test data is released.</p>
      <p>All of these models were selected for their individual properties. UArizonaA was our most restrictive model, as it vigorously tried not to tag anyone as depressed, whereas UArizonaC was the most liberal, as it tagged more users as depressed than any other model. The other three sat in between these two. To make UArizonaA a little more open towards depression tagging, we combined its 10th chunk output with UArizonaE's 10th chunk output.</p>
      <table-wrap id="tbl2">
        <label>Table 2.</label>
        <caption>
          <p>Performance of the models. E5 and E50 are the shared-task-defined Early Risk Detection Error (ERDE) percentages, P is precision, R is recall, and F1 is the harmonic mean of precision and recall.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Model</th><th>Brief description</th><th>E5</th><th>E50</th><th>F1</th><th>P</th><th>R</th></tr>
          </thead>
          <tbody>
            <tr><td>FHDO-BCSGA</td><td>Ranked #1 for F1, #2 for E5, #2 for E50</td><td>12.82</td><td>9.69</td><td>0.64</td><td>0.61</td><td>0.67</td></tr>
            <tr><td>UArizonaA</td><td>LibSVM + lexicon + UMLS</td><td>14.62</td><td>12.68</td><td>0.40</td><td>0.31</td><td>0.58</td></tr>
            <tr><td>UArizonaB</td><td>WekaSVM + lexicon</td><td>13.07</td><td>11.63</td><td>0.30</td><td>0.33</td><td>0.27</td></tr>
            <tr><td>UArizonaC</td><td>RNN + lexicon + UMLS</td><td>17.93</td><td>12.74</td><td>0.34</td><td>0.21</td><td>0.92</td></tr>
            <tr><td>UArizonaD</td><td>Ensemble</td><td>14.73</td><td>10.23</td><td>0.45</td><td>0.32</td><td>0.79</td></tr>
            <tr><td>UArizonaE</td><td>RNN + lexicon + UMLS + 60%-wait</td><td>14.93</td><td>12.01</td><td>0.45</td><td>0.34</td><td>0.63</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        The performance of our models is given in Table 2, along with the performance of the model that achieved the highest F1 in the shared task, FHDO-BCSGA. The models were evaluated with 5 performance measures: precision, recall, and F1, plus 2 Early Risk Detection Error (ERDE) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] variants, with the cutoff parameter o set to 5 and 50 posts. ERDE penalizes systems that take many posts to predict depression. For precision, recall, and F1, a high score is good, while for ERDE, a low score is good.
      </p>
      <p>Our models were competitive with others in the shared task. UArizonaC
ranked 1st out of 30 for recall, UArizonaD ranked 3rd for ERDE50, and UArizonaB
ranked 4th for ERDE5. For precision and F1, our models were less impressive;
both UArizonaD and UArizonaE ranked 11th for F1 and UArizonaE ranked 14th
for precision. Overall, UArizonaD is the best of our models: it has the highest
F1, the lowest ERDE50, and the second-best recall.
</p>
    </sec>
    <sec id="sec-6">
      <title>Limitations</title>
      <p>Our models fell short of the best system in the task for two main reasons. First, we attempted to predict depressed users from the beginning, even though the number of posts varies dramatically from user to user (from only 1 post per chunk to over 200 per chunk). A better strategy would have been to start making predictions only after observing some threshold of n posts, allowing us to predict early for users with many posts while waiting until we have more information for users with few posts. Second, we did not sufficiently explore the broad range of possible features. For example, we could have built a domain-specific depression lexicon and used it instead of a previously collected one, or we could have used more sophisticated techniques to represent posts as post-level feature vectors.</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusion and Future Work</title>
      <p>In this paper, we described the techniques behind the University of Arizona submissions to the 2017 CLEF eRisk early risk detection of depression pilot task. We used features based on a depression lexicon and on the Unified Medical Language System (UMLS). We implemented sequential and non-sequential models and used ensemble methods to leverage the best of each model. We found that the ensemble model works better than the individual models, and that waiting for more data before making a decision improves the traditional performance measures like precision, recall, and F1. Whether it is acceptable to wait that long in exchange for better performance is still an open question that we would like to work on. We would like to establish a timeframe that can be deemed acceptable before making a decision, so that the tradeoff between correctness and speed of the decision is balanced.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>We would like to express our gratitude to the organizers of the pilot task for their effort towards building the first annotated user-level depression dataset.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aronson</surname>
            ,
            <given-names>A.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lang</surname>
            ,
            <given-names>F.M.:</given-names>
          </string-name>
          <article-title>An overview of metamap: historical perspective and recent advances</article-title>
          .
          <source>In: Journal of the American Medical Informatics Association</source>
          <volume>17</volume>
          (
          <issue>3</issue>
          ). pp.
          <fpage>229</fpage>
          -
          <lpage>236</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Boser</surname>
            ,
            <given-names>B.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guyon</surname>
            ,
            <given-names>I.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vapnik</surname>
            ,
            <given-names>V.N.:</given-names>
          </string-name>
          <article-title>A training algorithm for optimal margin classifiers</article-title>
          .
          <source>In: Proceedings of the fth annual workshop on Computational learning theory</source>
          . pp.
          <fpage>144</fpage>
          -
          <lpage>152</lpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <issue>3</issue>
          .
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.J.:</given-names>
          </string-name>
          <article-title>Libsvm: A library for support vector machines</article-title>
          .
          <source>ACM Transactions on Intelligent Systems and Technology (TIST) 2</source>
          (
          <issue>3</issue>
          ),
          <volume>27</volume>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cho</surname>
          </string-name>
          , K.,
          <string-name>
            <surname>van Merrienboer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>On the properties of neural machine translation: Encoder-decoder approaches</article-title>
          . In: Eighth Workshop on Syntax,
          <source>Semantics and Structure in Statistical Translation (SSST-8)</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cortes</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vapnik</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Support-vector networks</article-title>
          .
          <source>Machine learning 20(3)</source>
          ,
          <fpage>273</fpage>
          -
          <lpage>297</lpage>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>De Choudhury</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gamon</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Counts</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horvitz</surname>
          </string-name>
          , E.:
          <article-title>Predicting depression via social media</article-title>
          .
          <source>In: ICWSM</source>
          . p.
          <volume>2</volume>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Goodwin</surname>
            ,
            <given-names>F.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jamison</surname>
            ,
            <given-names>K.R.</given-names>
          </string-name>
          :
          <source>Manic-Depressive Illness: Bipolar Disorder and Recurring Depression</source>
          . Oxford University Press Inc., New York (
          <year>1990</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Greenberg</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kessler</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Birnbaum</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leong</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berglund</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , CoreyLisle, P.:
          <article-title>The economic burden of depression in the united states: how did it change between 1990 and 2000?</article-title>
          <source>In: J Clin Psychiatry</source>
          <volume>64</volume>
          (
          <issue>12</issue>
          ). pp.
          <fpage>1465</fpage>
          -
          <lpage>1475</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Halfin</surname>
          </string-name>
          , A.:
          <article-title>Depression: the benefits of early and appropriate treatment</article-title>
          .
          <source>The American journal of managed care 13</source>
          (
          <issue>4 Suppl)</issue>
          ,
          <source>S927 (November</source>
          <year>2007</year>
          ), http://europepmc.org/abstract/MED/18041868
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A test collection for research on depression and language use</article-title>
          .
          <source>In: International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          . pp.
          <fpage>28</fpage>
          -
          <lpage>39</lpage>
          . Springer International Publishing (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parapar</surname>
            ,
            <given-names>J.: eRISK</given-names>
          </string-name>
          <year>2017</year>
          :
          <article-title>CLEF Lab on Early Risk Prediction on the Internet: Experimental foundations</article-title>
          .
          <source>In: Proceedings Conference and Labs of the Evaluation Forum CLEF</source>
          <year>2017</year>
          . Dublin, Ireland (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , Karafiat, M.,
          <string-name>
            <surname>Burget</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cernocky</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khudanpur</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Recurrent neural network based language model</article-title>
          .
          <source>In: Interspeech</source>
          . vol.
          <volume>2</volume>
          , p.
          <volume>3</volume>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Moreno</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jelenchick</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Egan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cox</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Young</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gannon</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Becker</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Feeling bad on facebook: depression disclosures by college students on a social networking site</article-title>
          .
          <source>Depression and Anxiety</source>
          <volume>28</volume>
          (
          <issue>6</issue>
          ), pp.
          <fpage>447</fpage>
          -
          <lpage>455</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Platt</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Fast training of support vector machines using sequential minimal optimization</article-title>
          . In:
          <string-name>
            <surname>Schoelkopf</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burges</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smola</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (eds.)
          <source>Advances in Kernel Methods - Support Vector Learning</source>
          . MIT Press (
          <year>1998</year>
          ), http://research.microsoft.com/~jplatt/smo.html
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Sadeque</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pedersen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solorio</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shrestha</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rey-Villamizar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bethard</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Why do they leave: Modeling participation in online depression forums</article-title>
          .
          <source>In: Proceedings of the 4th Workshop on Natural Language Processing and Social Media</source>
          . pp.
          <fpage>14</fpage>
          -
          <lpage>19</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Schwartz</surname>
            ,
            <given-names>H.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sap</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kern</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eichstaedt</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kapelner</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blanco</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dziurzynski</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stillwell</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , et al.:
          <article-title>Predicting individual wellbeing through the language of social media</article-title>
          .
          <source>In: Biocomputing 2016: Proceedings of the Pacific Symposium</source>
          . pp.
          <fpage>516</fpage>
          -
          <lpage>527</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Srivastava</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krizhevsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Dropout: a simple way to prevent neural networks from overfitting</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>15</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1929</fpage>
          -
          <lpage>1958</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Tieleman</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude</article-title>
          .
          <source>COURSERA: Neural networks for machine learning 4(2)</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wan</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mellina</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Hare</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Understanding and discovering deliberate self-harm content in social media</article-title>
          .
          <source>In: Proceedings of the 26th International Conference on World Wide Web</source>
          . pp.
          <fpage>93</fpage>
          -
          <lpage>102</lpage>
          . International World Wide Web Conferences Steering Committee
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20. WHO:
          <article-title>The world health report 2001 - mental health: New understanding, new hope</article-title>
          . http://www.who.int/whr/2001/en/whr01_en.pdf?ua=1 (
          <year>2001</year>
          ), last accessed: 2016-04-02
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Witten</surname>
            ,
            <given-names>I.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <source>Data mining: practical machine learning tools and techniques with Java implementations</source>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>