<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Detecting Early Risk of Depression from Social Media User-generated Content</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hayda Almeida</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antoine Briand</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marie-Jean Meurs</string-name>
          <email>meurs.marie-jean@uqam.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Quebec in Montreal (UQAM)</institution>
          ,
          <addr-line>Montreal, QC</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents the systems developed by the UQAM team for the CLEF eRisk Pilot Task 2017. The goal was to predict as early as possible the risk of mental health issues from user-generated content in social media. Several approaches based on supervised learning and information retrieval methods were used to estimate the risk of depression for a user given the content of their reddit posts. Among the five systems evaluated, the experiments show that combining information retrieval and machine learning approaches gives the best results.</p>
      </abstract>
      <kwd-group>
        <kwd>Information Retrieval</kwd>
        <kwd>Mental Health</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Supervised Learning</kwd>
        <kwd>Text Mining</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The Early Detection of Depression Pilot Task was part of the CLEF eRisk 2017
workshop [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. The pilot task consists of performing early risk detection of
depression by analyzing user-generated content from reddit<sup>1</sup>. Towards this goal, a system
receives user-generated content as input, and should output a prediction regarding the
user’s susceptibility to depression.
      </p>
      <p>The pilot task dataset contains user-generated content, which is organized and processed
chronologically. This allows for monitoring user progress, and detecting risk as
early as possible. Users are categorized as risk or non-risk (of depression). Each user
produced a sequence of reddit posts, written within a given period of time.</p>
      <p>The pilot task was organized in two stages: training and test, each having a different
dataset divided into 10 chunks. During the training stage, a dataset containing a sequential
set of posts per user was provided along with the user’s category. All training chunks
were made available, containing the complete user post sequence.</p>
      <p>During the test stage, the dataset of test users was released sequentially (one release each
week). Each release contained part of the user post sequence, corresponding to one
chunk (from the oldest to the newest posts). Participant systems had to output
predictions for users based on all current test chunks before the release of a new chunk. The
predictions could be either the category of a user or no decision, up to the last week of
the test stage where all the users had to be given a category.</p>
      <sec id="sec-1-1">
        <title>1 https://www.reddit.com/</title>
        <p>We describe hereafter our prediction system based on an ensemble classification
approach, which combines supervised learning, information retrieval, and feature
selection methods. This report is organized as follows: the system resources are described
in Section 3; the system modules, and the decision algorithm merging the module
predictions are described in Section 4. Experiments and results are described in Section 5
while conclusions and future work are discussed in Section 6.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Social media content has been commonly utilized to develop approaches that support
mental health care. The latest CLPsych Shared Tasks [
        <xref ref-type="bibr" rid="ref5">5,18</xref>
        ] have asked participants
to identify users at imminent risk of depression or Post-Traumatic Stress Disorder (PTSD).
These tasks made use of tweets or mental health forum posts.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], a sentiment analysis model was built with focus on user-generated social media
content. It uses highly relevant sentiment lexicons and sentiment intensity
measurements. The authors demonstrated that the approach outperforms other commonly used
lexicons, as well as machine learning-based tools.
      </p>
      <p>The authors of [19] evaluated different features to analyze user posts from
LiveJournal<sup>2</sup>, and compared posts from depression-related
online communities with posts from control (non-depression) communities.</p>
      <p>
        Another approach was proposed by [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], relying on a statistical model based on the
analysis of over 176 million tweets to identify communication patterns related to
mental illness in Twitter, and to attempt predicting user behavioral patterns related to
depression. We describe hereafter studies conducted mainly based on two research fields:
supervised learning, and information retrieval.
      </p>
      <sec id="sec-2-1">
        <title>Supervised Learning for Mental Health</title>
        <p>Several studies were conducted towards identifying mental health issues in social
media by using supervised learning methods. The choice of supervised algorithms varies
according to the tasks and data at hand. However, the previous studies presented here
generally rely on highly discriminative features to achieve state-of-the-art performance.
This demonstrates the importance of attribute choice for such tasks.</p>
        <p>
          In [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], the authors presented a study on predicting depression from tweets by
analyzing over 2 million posts of 476 users. The best performance was obtained with an SVM
classifier and a set of behavioral features, such as occurrence of pronouns, usage of
swearing and depression terms, tweet replies, as well as posting time and frequency.
The work presented in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] identifies user psychological stress in tweets. Features such
as emotion words, smileys, tweet mentions, replies, and posting frequency were
obtained from single tweets, and from all user’s tweets. The best performance was
obtained by a four-layer Deep Neural Network (DNN).
        </p>
        <p>
          Previous works have also used Twitter data to identify language differences between
users potentially presenting PTSD [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], or who attempted suicide [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. In both these
studies, the authors evaluated user-generated content using word and character language
models. The findings point to characteristics of tweets associated with mental health
issues, such as heavier use of emotions, usage of third-person pronouns, anxiety terms, as
well as high posting frequency.
        </p>
        <sec id="sec-2-1-1">
          <title>2 http://www.livejournal.com</title>
          <p>The authors in [23] analyzed Facebook<sup>3</sup> status updates to predict user satisfaction with
life. Their approach used feature selection of n-grams and topic extraction, and built
regression models at the message level and at the user level. The results indicate
that a cascade model, using message-level predictions to inform user-level predictions,
performed best.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Information Retrieval for Mental Health</title>
        <p>
          Information retrieval techniques are widely used to support knowledge discovery in the
biomedical field. Most of the approaches are designed to help researchers and
practitioners looking for relevant documents to support experiments or diagnoses.
In the field of mental health, [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] reports an interesting study to support mental health
maintenance of U.S. Army soldiers. The goal is to help health practitioners perform
efficient follow-ups on soldiers, since the suicide attempt rate among them is known to
be high.
        </p>
        <p>The approach made use of the Veterans Informatics and Computing Infrastructure (VINCI)
resource to process mostly unstructured health information, such as clinical notes. The
authors built a search engine based on Apache Solr<sup>4</sup>, indexing these textual data to
predict the risk of suicide attempt among soldiers. Even though only a few pre-processing
steps were utilized in this system, it provides promising performance, and covers a
larger population than systems based on structured data.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Resources</title>
      <sec id="sec-3-1">
        <title>Dictionaries</title>
        <p>The following sections describe the resources utilized to build our systems.
The supervised learning-based systems rely on a set of depression-related dictionaries.
The dictionary keywords are used to provide discriminative attributes for automatic
classification. The dictionaries we utilized are lists of relevant feelings, medicine, drugs,
and diseases, which are assumed to be related to depression.</p>
        <p>
          The feeling dictionary is composed of feeling words used in mental status exams<sup>5</sup>, and
a conceptual feature map obtained from SenticNet [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The medicine dictionary lists
antidepressant names or depression-related medicine, obtained from Wikipedia<sup>6</sup>. The
disease dictionary is composed of depression-related disease names, from Wikipedia<sup>7</sup>.
        </p>
        <sec id="sec-3-1-1">
          <title>3 https://www.facebook.com</title>
        </sec>
        <sec id="sec-3-1-2">
          <title>4 http://lucene.apache.org/solr/</title>
        </sec>
        <sec id="sec-3-1-3">
          <title>5 http://psychpage.com/learning/library/assess/feelings.html</title>
        </sec>
        <sec id="sec-3-1-4">
          <title>6 https://en.wikipedia.org/wiki/List_of_antidepressants</title>
        </sec>
        <sec id="sec-3-1-5">
          <title>7 https://en.wikipedia.org/wiki/Depression_(mood)</title>
          <p>The drug dictionary contains a list of psychoactive drug names, such as hallucinogens,
psychedelics, anxiolytics, and sedatives, also obtained from Wikipedia<sup>8</sup>.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Open Source Software</title>
        <p>Classification. To support developing the supervised learning method in our system,
we have utilized the open-source machine learning framework Weka [24]<sup>9</sup>. The Weka
framework provides standard implementations of several classification algorithms. It
also provides modules to handle and process Attribute-Relation File Format (ARFF)
files, which contain a matrix representation of the dataset in terms of instances versus
features, making it easy to perform feature selection.</p>
        <p>Indexing. The information retrieval method in our system relies on the open-source
search platform Apache Solr. The Solr platform allows for building a search engine
to perform full-text search in a document index. Both Solr search and index modules
are built on the Apache Lucene<sup>10</sup> library. A Solr index is designed based on a
schema, which is composed of a set of fields that represent a document object. Several
pre-processing steps are also available in Solr, which can be applied at indexing time
and also at query time.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Methodology</title>
      <p>To detect users at risk of developing depression, we have designed a multipronged
approach that combines results obtained from both Information Retrieval (IR) and
Supervised Learning (SL) based systems. The combination is performed by a decision
algorithm.</p>
      <p>In Section 4.1, we explain how we utilized the CLEF eRisk training and test datasets in
our experiments. The IR-based systems are described in Section 4.2 while the SL-based
systems are presented in Section 4.3. Details on the decision algorithm are presented in
Section 4.4. Finally, we briefly describe how we performed experiments to determine
the best configuration for our approach in Section 4.5.</p>
      <sec id="sec-4-1">
        <title>Dataset</title>
        <p>The CLEF eRisk training and test datasets are composed of user posts extracted from
reddit. Both datasets are divided into a total of 10 chunks each, chronologically
organized. Each chunk represents a sequence of writings for a given user in a period of
time. Table 1 shows statistics on the eRisk 2017 pilot task datasets.</p>
        <p>We have utilized the chronological aspect of the user writings when processing both
training and test data. When processing the training data, we have computed the user
posting frequency, which is further described in Section 4.3. When processing the test
data, we have considered single chunk and multiple chunk predictions, as further
explained below.</p>
        <p>8 https://en.wikipedia.org/wiki/Psychoactive_drug</p>
        <p>9 http://www.cs.waikato.ac.nz/ml/weka/citing.html</p>
        <p>10 https://lucene.apache.org/core/</p>
        <table-wrap id="tab-1">
          <label>Table 1</label>
          <caption>
            <p>Statistics on the eRisk 2017 pilot task datasets.</p>
          </caption>
          <table>
            <thead>
              <tr><th /><th>Training dataset</th><th>Test dataset</th></tr>
            </thead>
            <tbody>
              <tr><td># users</td><td>486</td><td>401</td></tr>
              <tr><td># writings</td><td>294,817</td><td>236,371</td></tr>
              <tr><td># no-risk users</td><td>403</td><td>349</td></tr>
              <tr><td># risk users</td><td>83</td><td>52</td></tr>
              <tr><td># no-risk writings</td><td>263,966</td><td>217,665</td></tr>
              <tr><td># risk writings</td><td>30,851</td><td>18,706</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Training. The training set was provided in its completeness at the beginning of the
task. It has been manually annotated by experts. Users are categorized as either risk
(depressed) or non-risk (non-depressed).</p>
        <p>To identify the most suitable models for both IR and SL methods, we performed several
experiments using the training data. We utilized the training data in two different ways:
first, using cross-validation on the training chunks 1 to 10; second, using the training
chunks 1 to 9 as training set, and the training chunk 10 as validation set.</p>
        <p>Test. The test set was provided gradually, with each test chunk released one week after
the previous one. Predictions on the test set were therefore provided weekly
by our systems.</p>
        <p>In order to output predictions in a given week, we have utilized the test data in two
different ways: first, to obtain a list of predictions only considering the current test
chunk; second, to obtain a list of predictions considering all test chunks released so
far. Both lists of predictions are taken into account when merging outputs from different
models and systems.</p>
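        <p>The two ways of using the test data can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of the submitted systems: the function <monospace>weekly_predictions</monospace>, the <monospace>classify</monospace> callback, and the one-dictionary-per-chunk layout are hypothetical names introduced for the example.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Callable, Dict, List, Tuple

Chunk = Dict[str, List[str]]  # hypothetical layout: user id -> posts in one chunk


def weekly_predictions(
    chunks: List[Chunk], classify: Callable[[List[str]], bool]
) -> Tuple[Dict[str, bool], Dict[str, bool]]:
    """Return (single-chunk, cumulative) predictions for every user seen so far.

    `chunks` holds the test chunks released so far, oldest first.
    `classify` is any stand-in model mapping a list of posts to risk / non-risk.
    """
    users = sorted({user for chunk in chunks for user in chunk})
    current = chunks[-1]
    # First list: only the most recent chunk is considered.
    single = {u: classify(current.get(u, [])) for u in users}
    # Second list: all chunks released so far are concatenated per user.
    cumulative = {
        u: classify([post for chunk in chunks for post in chunk.get(u, [])])
        for u in users
    }
    return single, cumulative


# Toy stand-in classifier, for illustration only.
def toy_classify(posts: List[str]) -> bool:
    return any("sad" in post.lower() for post in posts)


released = [
    {"u1": ["I feel sad today"], "u2": ["hello"]},
    {"u1": ["nice weather"], "u2": ["ok"]},
]
single, cumulative = weekly_predictions(released, toy_classify)
```
]]></preformat>
        <p>In this toy run the two lists disagree for user <monospace>u1</monospace>, which is the situation the merging step has to handle.</p>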
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Information Retrieval Based Systems</title>
        <p>We used an approach based on IR to retrieve similar documents from a test document
used as a query. The intuition is that using the full content of a user post as a query
should allow a search engine to retrieve semantically similar documents (posts). In our
context, the similar posts are retrieved from the training corpus where they are already
labeled according to the risk/no-risk state of the user who wrote them. We built two
search engines relying on two different indexes created from the eRisk training corpus
with and without indexing stop-words. We then considered the eRisk test documents as
queries, which were submitted to both search engines.</p>
        <p>For each test document d submitted to the search engines, we used the class (risk or
non-risk) of the top n retrieved documents to compute a score S<sub>IR</sub>(d) reflecting how
likely d is to have been produced by a depressed user. This can be compared to a k-nearest
neighbors approach since we want to get the closest documents (neighbors) to a given
document. The number of retrieved documents taken into account has been set
experimentally to n = 20. S<sub>IR</sub>(d) is computed as follows:</p>
        <p>
          <disp-formula>
            <tex-math><![CDATA[S_{IR}(d) = \frac{1}{n} \sum_{i=1}^{n} \delta(d_i)]]></tex-math>
          </disp-formula>
          where d<sub>i</sub> is the document retrieved by query d in position i, and
          <disp-formula>
            <tex-math><![CDATA[\delta(d_i) = \begin{cases} 1, & \text{if } d_i \text{ is labeled as risk} \\ 0, & \text{otherwise} \end{cases}]]></tex-math>
          </disp-formula>
        </p>
        <p>
          The test documents are then ordered according to their S<sub>IR</sub> score, and considered as
risk candidates if their score is above a given threshold, which was experimentally set.
The search engines created in this approach rely on Apache Solr, and the BM25
probabilistic ranking algorithm [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. We first indexed all the fields in the training set. Two
indexes, I and II, were generated based on the same schema but applying different
pre-processing steps, which are described in Tables 2 and 3.
        </p>
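        <p>The IR score can be illustrated with a short sketch. It assumes the labels of the retrieved training documents are already available as a ranked list of strings; the names <monospace>s_ir</monospace> and <monospace>is_risk_candidate</monospace>, the label strings, and the inclusive comparison with the threshold are assumptions made for the example, and the retrieval step itself (Solr with BM25) is not reproduced.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Sequence


def s_ir(retrieved_labels: Sequence[str], n: int = 20) -> float:
    """Fraction of the top-n retrieved training documents labeled as risk."""
    top = retrieved_labels[:n]
    return sum(1 for label in top if label == "risk") / n


def is_risk_candidate(score: float, threshold: float = 0.7) -> bool:
    # The comparison with the threshold is assumed to be inclusive.
    return score >= threshold


# Labels of the 20 highest-ranked training posts for one hypothetical query.
labels = ["risk"] * 14 + ["non-risk"] * 6
score = s_ir(labels)
```
]]></preformat>
        <p>With 14 of the 20 neighbors labeled as risk, the score is 14/20, which meets the 0.7 threshold used in the decision step.</p>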
        <p>For Index I, we indexed all the data with little pre-processing. Index II uses the same
schema along with more pre-processing steps: stop-words removal, stemming (using
the Solr built-in Porter Stemmer algorithm), and punctuation filtering.</p>
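        <table-wrap id="tab-2">
          <label>Table 2</label>
          <caption>
            <p>Pre-processing steps per index.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Index name</th><th>Pre-processing</th></tr>
            </thead>
            <tbody>
              <tr><td>Index I</td><td>Tokenization, Lowercasing</td></tr>
              <tr><td>Index II</td><td>As Index I + Stemming, Stopwords, Punctuation</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <table-wrap id="tab-3">
          <label>Table 3</label>
          <caption>
            <p>Indexed fields.</p>
          </caption>
          <table>
            <thead>
              <tr><th>#</th><th>Indexed fields</th></tr>
            </thead>
            <tbody>
              <tr><td>1</td><td>Writing title</td></tr>
              <tr><td>2</td><td>Writing content</td></tr>
              <tr><td>3</td><td>Writing date</td></tr>
              <tr><td>4</td><td>User label</td></tr>
              <tr><td>5</td><td>Text (fields 1 + 2)</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>11 https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-MoreLikeThisQueryParser</p>
        <p>The SL-based approach is based on the combined predictions of several classification
models with different configurations. The SL models are designed using three
classification algorithms and various feature types described below.</p>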
        <p>Features. To design models for the SL-based systems, we have extracted discriminative
features from the pilot task training dataset. Before extracting features, pre-processing
steps were performed. These include word stemming, and normalization of URLs,
smiley characters, as well as punctuation. The URL and smiley normalization steps are relevant
to better process the user-generated content, and help portray the sentiment
associated with a post. URLs can contain picture names, or words that refer to specific
subjects. Smiley symbols are often used to represent an emotion, and during pre-processing
they are replaced by actual words (e.g., :) or :-) are replaced by happy). All these
cues are important since, if present, they might help represent a user’s state of mind.</p>
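        <p>A possible shape for the URL and smiley normalization is sketched below. Only the mapping of :) and :-) to <italic>happy</italic> comes from the text; the sad-face entries, the choice to keep the alphabetic words of the URL path, and the name <monospace>normalize</monospace> are assumptions made for the example, and stemming is left out.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re

URL_RE = re.compile(r"https?://\S+")

# ":)" and ":-)" follow the text; the sad faces are assumed additions.
SMILEYS = {":)": "happy", ":-)": "happy", ":(": "sad", ":-(": "sad"}


def _url_to_words(match: "re.Match[str]") -> str:
    """Keep only the alphabetic words found in the URL path (assumed policy)."""
    path = re.sub(r"^https?://[^/]+", "", match.group(0))  # drop scheme and host
    return " ".join(word.lower() for word in re.findall(r"[A-Za-z]+", path))


def normalize(text: str) -> str:
    # URLs are handled first so that "://" is never mistaken for a smiley.
    text = URL_RE.sub(_url_to_words, text)
    for smiley, word in SMILEYS.items():
        text = text.replace(smiley, f" {word} ")
    return " ".join(text.split())  # collapse repeated whitespace


example = normalize("great day :) see http://example.com/pics/sad_cat.jpg")
```
]]></preformat>
        <p>Replacing symbols with words before tokenization lets the later n-gram and dictionary features treat them like any other token.</p>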
        <p>After pre-processing, four different feature types were extracted: n-grams,
dictionary words, selected Part-Of-Speech (POS), and user posting frequency. N-gram
features were extracted as Bag-Of-Words (BOW), bigrams, and trigrams. Dictionary
words were extracted based on the depression-related dictionaries described in
Section 3.1. POS features were extracted by selecting the words annotated by the Stanford
POS Tagger<sup>12</sup> as either adjective (JJ), noun (NN), predeterminer (PDT), particle (RP),
or verb (VB).</p>
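        <p>The n-gram and dictionary features can be illustrated on a tokenized post. This is a minimal sketch assuming whitespace tokenization; the function names and the two dictionary entries are hypothetical, and POS selection is omitted because it depends on an external tagger.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import List, Set


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Contiguous n-grams; n=1 gives the bag-of-words tokens."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def dictionary_hits(tokens: List[str], dictionary: Set[str]) -> List[str]:
    """Tokens of the post that appear in a depression-related dictionary."""
    return [token for token in tokens if token in dictionary]


tokens = "i feel so tired".split()
feelings = {"tired", "hopeless"}  # hypothetical dictionary entries

bow = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
hits = dictionary_hits(tokens, feelings)
```
]]></preformat>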
        <p>
          As an attempt to account for the temporal evolution of the psychological state of a
given user, we computed the user posting frequency, which represents the user activity
pattern. The posting frequency of a user is computed as the time lapse between the
oldest and the most recent writings, divided by the number of writings the user has generated
in total. Statistics on features extracted from the training set are presented in Table 4.
        </p>
        <p>
          Classifiers. To build the SL models we have used three classification algorithms:
Logistic Model Tree (LMT) [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], an Ensemble of Sequential Minimal Optimization (SMO) [20]
(ens SMO) classifiers, and an Ensemble of Random Forests [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] (ens RF) classifiers.
        </p>
        <p>12 https://nlp.stanford.edu/software/tagger.shtml</p>
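        <p>The user posting frequency defined above (time lapse between the oldest and the most recent writings, divided by the number of writings) can be computed as in the sketch below. The unit (seconds per writing), the handling of an empty history, and the name <monospace>posting_frequency</monospace> are assumptions; the text does not specify them.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from datetime import datetime
from typing import List


def posting_frequency(timestamps: List[datetime]) -> float:
    """Time lapse between oldest and newest writing, divided by the number
    of writings. Returned in seconds per writing (assumed unit)."""
    if not timestamps:
        return 0.0  # assumed convention for users without writings
    lapse = max(timestamps) - min(timestamps)
    return lapse.total_seconds() / len(timestamps)


writings = [
    datetime(2017, 1, 1, 0, 0),
    datetime(2017, 1, 1, 12, 0),
    datetime(2017, 1, 2, 0, 0),
]
freq = posting_frequency(writings)
```
]]></preformat>
        <p>Here three writings spread over one day give 86,400 / 3 = 28,800 seconds per writing.</p>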
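        <table-wrap id="tab-4">
          <label>Table 4</label>
          <caption>
            <p>Statistics on features extracted from the training set.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Feature type</th><th># Features</th></tr>
            </thead>
            <tbody>
              <tr><td>BOW</td><td>105,161</td></tr>
              <tr><td>Bigrams</td><td>1,544,714</td></tr>
              <tr><td>Trigrams</td><td>3,397,459</td></tr>
              <tr><td>Selected POS</td><td>118,139</td></tr>
              <tr><td>Feelings dic.</td><td>205</td></tr>
              <tr><td>Medicine dic.</td><td>30</td></tr>
              <tr><td>Drugs dic.</td><td>57</td></tr>
              <tr><td>Diseases dic.</td><td>43</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The ensembles are composed of 30 different classifiers each.</p>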
        <p>The 30 Random Forest classifiers composing the ens RF were designed with iteration
values from 10 to 50 (with increments of 10), and tree depth values from 2 to 10 (with
increments of 2), as well as unlimited.</p>
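        <p>The Random Forest grid described above yields 5 iteration values times 6 depth settings, that is 30 configurations. The sketch below only enumerates that grid; the dictionary keys and the use of <monospace>None</monospace> for an unlimited depth are conventions chosen for the example, not Weka option names.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from itertools import product
from typing import Dict, List, Optional


def rf_ensemble_configs() -> List[Dict[str, Optional[int]]]:
    """Enumerate the Random Forest settings of the ensemble described in the text."""
    iterations = range(10, 51, 10)      # 10, 20, 30, 40, 50
    depths = [2, 4, 6, 8, 10, None]     # None stands for unlimited depth
    return [
        {"iterations": n_iter, "max_depth": depth}
        for n_iter, depth in product(iterations, depths)
    ]


configs = rf_ensemble_configs()
```
]]></preformat>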
        <p>The 30 SMO classifiers composing the ens SMO were designed with tolerance
parameter values from 0.001 to 0.005 (with increments of 0.001), and epsilon for round-off
error values from 1 to 5 (with increments of 1).</p>
        <p>The decision algorithm merges the predictions from both IR and SL based systems.
The IR-based candidates are ranked based on similarity, and each candidate is
associated with an S<sub>IR</sub> score, as described in Section 4.2. Documents with the highest scores are
considered as candidates for the risk class. For the eRisk task, the high score threshold
has been experimentally set to 0.7, i.e. all the candidates are documents d with a score
S<sub>IR</sub>(d) such that S<sub>IR</sub>(d) ≥ 0.7.</p>
        <p>The SL-based approaches are used to refine the list of candidates proposed by the
IR-based systems. To be selected, a document from the IR-based list must be classified
as risk by at least one of the SL-based systems. Candidates proposed by the SL-based
system are also ordered according to the confidence of the prediction, and first ranked
candidates are selected regardless of their presence in the IR list. The decision function
can be formalized as follows:</p>
        <p>
          <disp-formula>
            <tex-math><![CDATA[\Delta(d) = \mathbf{1}_{IR}(d) + \mathbf{1}_{SL}(d) + \mathbf{1}_{SL_f}(d)]]></tex-math>
          </disp-formula>
          where d is a test document, and <bold>1</bold><sub>IR</sub>, <bold>1</bold><sub>SL</sub>, <bold>1</bold><sub>SLf</sub> are the indicator functions respectively
associated with the IR-based, the SL-based, and the SL-ranked-first lists of candidates. If
Δ(d) ≥ 2, the document d is assigned the risk class, i.e. the user who generated this
content is susceptible to depression.
        </p>
        <p>In order to determine the most suitable configuration for the IR and SL based systems,
as well as the threshold for the decision algorithm, we have performed several
experiments utilizing the pilot task training data.</p>
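        <p>The decision function that merges the IR-based, SL-based, and SL-ranked-first candidate lists can be sketched as a sum of three membership tests compared with 2. The function name <monospace>decide</monospace> and the boolean inputs are assumptions made for the example; how each list is produced is described in the preceding sections and is not reproduced here.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def decide(in_ir: bool, in_sl: bool, in_sl_first: bool) -> bool:
    """Assign the risk class when at least two of the three indicators fire.

    in_ir:       the document is in the IR-based candidate list
    in_sl:       the document is classified as risk by an SL-based system
    in_sl_first: the document is among the first-ranked SL candidates
    """
    score = int(in_ir) + int(in_sl) + int(in_sl_first)
    return score >= 2


# An IR candidate confirmed by one SL system is selected...
confirmed = decide(in_ir=True, in_sl=True, in_sl_first=False)
# ...and a first-ranked SL candidate is selected even without IR support.
sl_only = decide(in_ir=False, in_sl=True, in_sl_first=True)
```
]]></preformat>
        <p>Because a first-ranked SL candidate also belongs to the SL list, it reaches a score of 2 on its own, which matches the statement that such candidates are selected regardless of the IR list.</p>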
        <p>The classification models were selected after performing experiments with all three
classifiers using all feature types, or several feature types combined. Only the
best-performing combinations of feature sets and classifiers were kept for the SL-based systems.
For the experimental evaluation, the pilot task training dataset was utilized as described
in Section 4.1.</p>
        <p>The IR-based systems presented in Section 4.2 rank the users (writings) based on the
S<sub>IR</sub>(d) score. This score is based on the categories of the top 20 most similar documents
retrieved. The number of documents in the top list was set through experiments on
the training set. We ran several tests with different values (from 5 to 50, in increments
of 5), and we chose 20 since it maximized the F-measure.</p>
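        <p>The selection of the number of retrieved documents can be sketched as a sweep over candidate values that keeps the one with the highest F-measure. The sketch assumes each validation document comes with the ranked 0/1 risk labels of its retrieved neighbors; <monospace>f_measure</monospace>, <monospace>best_n</monospace>, the fixed 0.7 threshold during the sweep, and the toy data are assumptions for illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import List, Sequence, Tuple


def f_measure(predicted: Sequence[bool], gold: Sequence[bool]) -> float:
    """Balanced F-measure (F1) for the risk class."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(not p and g for p, g in zip(predicted, gold))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_n(
    ranked_labels: List[List[int]],
    gold: List[bool],
    values: Sequence[int] = range(5, 51, 5),
    threshold: float = 0.7,
) -> Tuple[int, float]:
    """Return the (n, F-measure) pair with the highest F-measure; ties keep the smaller n."""
    best = (values[0], -1.0)
    for n in values:
        predicted = [sum(labels[:n]) / n >= threshold for labels in ranked_labels]
        score = f_measure(predicted, gold)
        if score > best[1]:
            best = (n, score)
    return best


# Toy validation data: one risk document, one non-risk document.
ranked = [[1] * 5 + [0] * 45, [0] * 50]
truth = [True, False]
chosen = best_n(ranked, truth)
```
]]></preformat>
        <p>On the toy data only n = 5 keeps the risk document above the threshold, so the sweep returns 5; on the real training set the text reports that 20 was the best value.</p>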
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results and Discussion</title>
      <p>We submitted predictions on the test dataset obtained by five different systems. Four
of these systems rely on a different ensemble configuration. The ensembles are either
a merge of results obtained from the SL and IR based systems, or from a group of SL
classifiers or IR-based systems. The five presented systems are described here:</p>
      <list list-type="bullet">
        <list-item><p>UQAMA is based on an ensemble approach, merging the output candidates from all
SL-based systems (considering three classifiers and all features), with the output
candidates from the IR-based systems.</p></list-item>
        <list-item><p>UQAMB is based on candidates proposed by both IR-based systems only. We
considered UQAMB as our baseline system.</p></list-item>
        <list-item><p>UQAMC is based on SL models built with an LMT classifier, and using as features
either BOW or bigrams separately, and BOW or bigrams together with all the
dictionary features.</p></list-item>
        <list-item><p>UQAMD is based on SL models built with an ens RF classifier, using as features
either BOW or bigrams together with all the dictionary features.</p></list-item>
        <list-item><p>Lastly, UQAME is based on SL models built with an ens SMO classifier, using
bigrams separately and together with all the dictionary features.</p></list-item>
      </list>
      <p>The user posting frequency was a feature used by all five systems.</p>
      <p>
        Table 5 presents the results obtained by the five systems in terms of the metrics utilized
by the CLEF eRisk pilot task. Besides F1, Precision, and Recall, the pilot task also
evaluated systems using the early risk detection error (ERDE) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The ERDE metric
accounts for the imbalance problem on automatic classification, which could bias some
classifiers. Additionally it penalizes late risk detection using a specific cost function,
considering only the true positive scores, which are related to only the relevant (risk)
documents.
      </p>
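      <p>For readers unfamiliar with ERDE, the sketch below follows our understanding of its usual definition: a false positive costs c_fp, a false negative costs c_fn, a true negative costs nothing, and a true positive costs c_tp scaled by a latency factor that grows with the number k of writings seen before the decision, relative to the cut-off o (5 or 50). The function names and the default cost values are assumptions made for the example; the task sets the actual costs.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def latency_cost(k: int, o: int) -> float:
    """Sigmoid latency factor 1 - 1 / (1 + exp(k - o)), written in a
    numerically stable form so that large k does not overflow."""
    x = k - o
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def erde(
    decision: bool,
    truth: bool,
    k: int,
    o: int,
    c_fp: float,
    c_fn: float = 1.0,   # assumed default
    c_tp: float = 1.0,   # assumed default
) -> float:
    """Error for one user: `decision` is the system output (risk or not),
    `truth` the gold label, `k` the number of writings seen before deciding."""
    if decision and truth:
        return latency_cost(k, o) * c_tp   # late true positives are penalized
    if decision and not truth:
        return c_fp
    if truth:
        return c_fn
    return 0.0


# A correct risk decision taken exactly at the cut-off costs half of c_tp.
at_cutoff = erde(decision=True, truth=True, k=5, o=5, c_fp=0.1)
```
]]></preformat>
      <p>The overall score would then be the mean of this error over all test users, so lower values are better.</p>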
      <p>
        In total, 8 teams participated in the CLEF eRisk 2017 pilot task, submitting a total of
30 different systems [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. In Table 5, we highlight in bold the most interesting results
obtained by our systems. Among our five presented systems, the best overall
performance was achieved by UQAMA with the best F1 score and Recall. The best Precision
was achieved by UQAMD, which is designed based on an ens RF classifier. The
contribution of each method to the performance of UQAMA needs to be further evaluated, as
well as the impact of the various experimental settings.
      </p>
      <p>[Table 5: ERDE<sub>5</sub> and ERDE<sub>50</sub>, alongside F1, Precision, and Recall, for UQAMA, UQAMB, UQAMC, UQAMD, and UQAME.]</p>
      <p>
        Finally, an interesting observation was drawn from analyzing the user posts of
candidates predicted as risk by our systems. The post content of such candidates often
presented two major topics: “video games”, and “sexuality or relationship issues”. The
relationship between “depression” and these two topics has been studied from a clinical
perspective in several recent works [
        <xref ref-type="bibr" rid="ref3 ref9">21,3,9,22</xref>
        ]. Interestingly, the co-occurrence of these
topics with risk of depression was also spotted by our systems.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>
        This report describes the early risk prediction systems submitted to the CLEF eRisk
2017 pilot task. The system that performed best is based on a multipronged approach,
which combines predictions from SL and IR based systems. SL-based systems made
use of four major feature types, and three classification algorithms, LMT, ensemble
SMO and ensemble RF. IR-based systems utilize two indexes, and users are ranked
according to a similarity score based on the BM25 ranking algorithm [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
The predictions obtained from both SL and IR based systems are merged by a decision
algorithm. The results demonstrate that combining SL and IR approaches outperforms
the results obtained by each approach applied separately.
      </p>
      <p>
        Future work. During our experimental phase, we have performed preliminary tests to
evaluate the usage of three other methods: (1) simple rule-based classification using a
sentiment analysis library, (2) deep learning-based classification using a Recurrent
Neural Network (RNN), and (3) topic extraction using Latent Dirichlet Allocation [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Improving the system performance will involve further investigation of these approaches,
as well as enhancement of the IR-based resources of the system.
      </p>
      <p>Reproducibility. Our system is publicly released as open source software, and can be
accessed at: https://github.com/BigMiners/eRisk2017</p>
      <p>18. Milne, D.N., Pink, G., Hachey, B., Calvo, R.A.: CLPsych 2016 Shared Task: Triaging
content in online peer-support forums. In: Proceedings of the 3rd Workshop on Computational
Linguistics and Clinical Psychology (CLPsych). pp. 118–127 (2016)</p>
      <p>19. Nguyen, T., Phung, D., Dao, B., Venkatesh, S., Berk, M.: Affective and content analysis of
online depression communities. IEEE Transactions on Affective Computing 5(3), 217–226
(2014)</p>
      <p>20. Platt, J.: Sequential minimal optimization: A fast algorithm for training support vector
machines. Tech. Rep. MSR-TR-98-14, Microsoft (April 1998)</p>
      <p>21. Ramrakha, S., Paul, C., Bell, M.L., Dickson, N., Moffitt, T.E., Caspi, A.: The relationship
between multiple sex partners and anxiety, depression, and substance dependence disorders:
a cohort study. Archives of Sexual Behavior 42(5), 863–872 (2013)</p>
      <p>22. Schou Andreassen, C., Billieux, J., Griffiths, M.D., Kuss, D.J., Demetrovics, Z., Mazzoni,
E., Pallesen, S.: The relationship between addictive use of social media and video games
and symptoms of psychiatric disorders: A large-scale cross-sectional study. Psychology of
Addictive Behaviors 30(2), 252 (2016)</p>
      <p>23. Schwartz, H.A., Sap, M., Kern, M.L., Eichstaedt, J.C., Kapelner, A., Agrawal, M., Blanco,
E., Dziurzynski, L., Park, G., Stillwell, D., et al.: Predicting individual well-being through
the language of social media. In: Pacific Symposium on Biocomputing (PSB). vol. 21, pp.
516–527 (January 2016)</p>
      <p>24. Witten, I.H., Frank, E., Hall, M.A., Pal, C.J.: The WEKA Workbench. Online Appendix for
“Data Mining: Practical machine learning tools and techniques”. Morgan Kaufmann, 4 edn.
(2016)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          :
          <article-title>Latent dirichlet allocation</article-title>
          .
          <source>Journal of Machine Learning Research 3(Jan)</source>
          ,
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Breiman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Random forests</article-title>
          .
          <source>Machine Learning 45(1)</source>
          ,
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Brunborg</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mentzoni</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frøyland</surname>
            ,
            <given-names>L.R.</given-names>
          </string-name>
          :
          <article-title>Is video gaming, or video game addiction, associated with depression, academic achievement, heavy episodic drinking, or conduct problems?</article-title>
          <source>Journal of Behavioral Addictions</source>
          <volume>3</volume>
          (
          <issue>1</issue>
          ),
          <fpage>27</fpage>
          -
          <lpage>32</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cambria</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Olsher</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rajagopal</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>SenticNet 3: a common and common-sense knowledge base for cognition-driven sentiment analysis</article-title>
          .
          <source>In: Proceedings of the 28th AAAI Conference on Artificial Intelligence</source>
          . pp.
          <fpage>1515</fpage>
          -
          <lpage>1521</lpage>
          . AAAI Press (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Coppersmith</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dredze</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harman</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hollingshead</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitchell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>CLPsych 2015 shared task: Depression and PTSD on Twitter</article-title>
          .
          <source>In: Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology (CLPsych): From Linguistic Signal to Clinical Reality</source>
          . pp.
          <fpage>31</fpage>
          -
          <lpage>39</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Coppersmith</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harman</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dredze</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Measuring Post Traumatic Stress Disorder in Twitter</article-title>
          .
          <source>In: Proceedings of the 8th International AAAI Conference on Weblogs and Social Media (ICWSM)</source>
          (
          <year>June 2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Coppersmith</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngo</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leary</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wood</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Exploratory analysis of social media prior to a suicide attempt</article-title>
          .
          <source>In: Proceedings of the 3rd Workshop on Computational Linguistics and Clinical Psychology (CLPsych)</source>
          . pp.
          <fpage>106</fpage>
          -
          <lpage>117</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>De Choudhury</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gamon</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Counts</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horvitz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Predicting Depression via Social Media</article-title>
          .
          <source>In: Proceedings of the 7th International AAAI Conference on Weblogs and Social Media (ICWSM)</source>
          . p.
          <fpage>2</fpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Granic</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lobel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Engels</surname>
            ,
            <given-names>R.C.</given-names>
          </string-name>
          :
          <article-title>The benefits of playing video games</article-title>
          .
          <source>American Psychologist</source>
          <volume>69</volume>
          (
          <issue>1</issue>
          ),
          <fpage>66</fpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Hammond</surname>
            ,
            <given-names>K.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laundry</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Leary</surname>
            ,
            <given-names>T.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>W.P.</given-names>
          </string-name>
          :
          <article-title>Use of text search to effectively identify lifetime prevalence of suicide attempts among veterans</article-title>
          .
          <source>In: Proceedings of the 46th Hawaii International Conference on System Sciences (HICSS)</source>
          . pp.
          <fpage>2676</fpage>
          -
          <lpage>2683</lpage>
          . IEEE (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Hutto</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gilbert</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>VADER: A parsimonious rule-based model for sentiment analysis of social media text</article-title>
          .
          <source>In: Proceedings of the 8th International AAAI Conference on Weblogs and Social Media (ICWSM)</source>
          (
          <year>June 2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>K.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          :
          <article-title>A probabilistic model of information retrieval: development and comparative experiments: Part 2</article-title>
          .
          <source>Information Processing &amp; Management</source>
          <volume>36</volume>
          (
          <issue>6</issue>
          ),
          <fpage>809</fpage>
          -
          <lpage>840</lpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Landwehr</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Logistic model trees</article-title>
          .
          <source>Machine Learning</source>
          <volume>59</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>161</fpage>
          -
          <lpage>205</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xue</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>User-level psychological stress detection from social media using deep neural network</article-title>
          .
          <source>In: Proceedings of the 22nd ACM International Conference on Multimedia</source>
          . pp.
          <fpage>507</fpage>
          -
          <lpage>516</lpage>
          . ACM (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A Test Collection for Research on Depression and Language Use</article-title>
          .
          <source>In: International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          . pp.
          <fpage>28</fpage>
          -
          <lpage>39</lpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parapar</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>eRISK 2017: CLEF Lab on Early Risk Prediction on the Internet: Experimental foundations</article-title>
          .
          <source>In: Proceedings Conference and Labs of the Evaluation Forum CLEF 2017</source>
          . Dublin, Ireland (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>McClellan</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mutter</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kroutil</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Landwehr</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Using social media to monitor mental health discussions - evidence from Twitter</article-title>
          .
          <source>Journal of the American Medical Informatics Association (JAMIA)</source>
          p.
          <fpage>ocw133</fpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>