Early Detection of Depression and Anorexia from Social Media: A Machine Learning Approach

Faneva Ramiandrisoa, faneva.ramiandrisoa@irit.fr
IRIT, Univ. de Toulouse, Toulouse, France
Univ. d'Antananarivo, Antananarivo, Madagascar

Josiane Mothe, Josiane.Mothe@irit.fr
IRIT, UMR5505 CNRS, INSPE, Université de Toulouse, Toulouse, France
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT
In this paper, we present an approach based on social media mining to help the early detection of two mental illnesses: depression and anorexia. We aim at detecting users that are likely to be ill by learning from annotated examples. We mine texts to extract features for text representation and also use a word embedding representation. The machine learning based model we propose uses these two types of text representation to predict the likelihood of each user to be ill. We use 58 features from the state of the art and 198 features that are new in this domain and are part of our contribution. We evaluate our model on the CLEF eRisk 2018 reference collections. For depression detection, our model based on word embedding achieves the best performance according to the ERDE50 measure, and the model based on features only achieves the best performance according to precision. For anorexia detection, the model based on word embedding achieves the second-best results on ERDE50 and recall. We also observed that many of the new features we added contribute to improving the results.

CCS CONCEPTS
• Computing methodologies → Supervised learning; • Information systems → Information retrieval.

KEYWORDS
Social Media Analysis, Text Mining, Depression Detection, Anorexia Detection, Early Risk Detection, Weak Signal Detection

1 INTRODUCTION
Mental illness diagnosis has improved over decades [9]; however, it is acknowledged that early detection for early treatment is fundamental. Detection implies a medical consultation that sometimes takes time. While our aim is not to replace a medical diagnosis, we aim at studying whether social media analysis could help warn about persons possibly suffering from a mental illness.

Indeed, in the last ten years, the use of social media platforms like Reddit, Facebook, or Twitter has increased and is still expected to grow in the coming years [25]. Their users generate a lot of data that can be used to extract insights on users, on their communication practices [24], on location information [11], and on what they say. This information can also be used in medical-related applications, for example to help understand consumer health information-seeking behavior [5], to detect mood [21] or sentiment about some diseases [22], for pharmacovigilance applications [14], or even to detect depression [2] or suicidal ideation [3].

Our work is related to the latter applications. We aim at studying whether social media analysis and mining can help in mental illness detection. More specifically, we consider the depression and anorexia detection tasks. We developed a machine learning model based on (a) a set of features that are extracted from users' writings and (b) vectors computed from users' writings (posts and comments). This model aims at predicting the likelihood for a user to be ill. While the principles we use for both depression and anorexia detection are the same, the main features used to detect one or the other illness are likely to differ. We thus analyze the differences between the two resulting models, specifically considering the important features in the users' writing representations. Results are based on two benchmark collections from the CLEF international forum¹.

¹ Conference and Labs of the Evaluation Forum (CLEF), which promotes research, innovation, and development of information access systems. www.clef-initiative.eu/

With regard to automatic detection, these tasks can be considered as either a classification problem or a ranking problem. When considering depression detection, for example, it can be treated as a binary classification problem: either the user is considered as (possibly) depressed or as non depressed. Alternatively, we can consider depression detection as a ranking or a regression problem if the output is the likelihood for a user to be ill.

Supervised machine learning is the most common approach used in related work. The principle is that a model is trained on a set of annotated examples (training cases); the trained model is then used on cases for which it has to make a decision (test cases). Evaluation considers the ground truth on the test cases. Moreover, related work mainly considers a set of natural language processing (NLP) features extracted from texts [19] to represent items. While we re-use some features from the state of the art, in this paper we also develop new features. In total, we use 256 features, of which 58 are from the state of the art and 198 are new for these tasks and part of our contribution. From the 198 features, 194 are obtained from textual analysis across lexical categories and make use of the Python library Empath [7]; the 4 remaining features are related to the text publication dates. Moreover, we combine these features with a word embedding content representation. We compare the resulting models, either combining representations or not, on two tasks in order to study which features are the most important and how much they differ from one task to the other. In this paper we also investigate several machine learning algorithms and compare the results obtained.
This paper is organized as follows: Section 2 describes the tasks, data sets, and evaluation measures; Section 3 overviews related work; Section 4 presents the model we propose; Section 5 reports the results; and finally Section 6 concludes this paper.

2 TASKS AND DATA SETS
The task and data used in this study are based on the CLEF eRisk Lab task² [16]. The main goal for both the depression and anorexia detection tasks is to detect as early as possible some signs of depression/anorexia in texts. The detection is done on data collections composed of texts sorted in chronological order and divided into 10 chunks. Chunk 1 contains the first 10% of each user's writings (the oldest), chunk 2 contains the second 10%, and so forth.

² https://early.irlab.org/2018/index.html, accessed on 2019-12-05

Predictions have to be given for each user and each chunk, processed sequentially. The user has to be predicted as depressed/anorexic, as non depressed/non anorexic, or the system can postpone its decision while waiting for the next data chunks. When a user receives a prediction, it is final and cannot be reversed later. On the 10th and last chunk, the system has to make a decision for each user, and the user has to be predicted either as depressed/anorexic or not. More details about the tasks can be found in [16].

As the problem is to detect the signs of mental illnesses as early as possible, a new measure named ERDE was defined in [16]. It takes into account the correctness of the system decision and the delay it took to emit this decision. ERDE is defined as follows:

    ERDE_o(d, k) = \begin{cases}
        c_{fp}               & \text{if } d \text{ is a false positive (FP)} \\
        c_{fn}               & \text{if } d \text{ is a false negative (FN)} \\
        lc_o(k) \cdot c_{tp} & \text{if } d \text{ is a true positive (TP)} \\
        0                    & \text{if } d \text{ is a true negative (TN)}
    \end{cases}    (1)

where d is the binary decision taken by the system with delay k for the user; a false (resp. true) positive means d is positive and the ground truth is negative (resp. positive); a false (resp. true) negative means d is negative and the ground truth is positive (resp. negative); c_{fn} = c_{tp} = 1; c_{fp} is the proportion of positive cases in the test collection; lc_o(k) = 1 - \frac{1}{1 + e^{k-o}}; and o is a parameter equal to 5 for ERDE5 and to 50 for ERDE50. The ERDE value of a model is the mean of the ERDE values obtained for each user, computed with Equation 1. For the ERDE measure, the smaller the value, the better. We also consider standard classification measures: precision, recall, and F-measure.

The depression detection data set is composed of chronological sequences of Reddit (www.reddit.com/) users' posts and comments. The CLEF eRisk data set was built by collecting submissions from any subreddit³ for each user; users who had fewer than 10 submissions were excluded. Users were annotated as depressed (214 users) or non depressed (1,493 users). The training data set contains 135 depressed users and 752 non depressed, while the test data set contains 79 depressed users and 741 non depressed, for a total of 531,394 (resp. 545,188) posts/comments in the training (resp. test) set. The anorexia detection data set was built in the same way as the depression one, but instead of searching for self-expressions of depression, self-expressions of anorexia were used. The training set contains 20 anorexic users and 132 non anorexic, while the test set contains 41 anorexic users and 279 non anorexic, for a total of 84,966 (resp. 168,786) posts or comments in the training (resp. test) set.

³ Contents in the Reddit platform are organized by areas of interest called "subreddits".
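For concreteness, Equation 1 and its average over users can be computed as in the following minimal Python sketch; it assumes that, for each user, the ground-truth label, the emitted decision, and the delay k (number of writings processed before the decision) are available, and that c_fp has been set to the proportion of positive users in the test collection as stated above.

import math

def erde(decision, truth, k, c_fp, o=50, c_fn=1.0, c_tp=1.0):
    # Equation 1: cost of one binary decision d emitted with delay k.
    if decision and not truth:        # false positive
        return c_fp
    if not decision and truth:        # false negative
        return c_fn
    if decision and truth:            # true positive, penalised by the delay
        lc_o = 1.0 / (1.0 + math.exp(o - k))   # equals 1 - 1/(1 + e^(k - o))
        return lc_o * c_tp
    return 0.0                        # true negative

def mean_erde(decisions, truths, delays, c_fp, o=50):
    # ERDE_o of a model: mean of the per-user costs.
    costs = [erde(d, t, k, c_fp, o) for d, t, k in zip(decisions, truths, delays)]
    return sum(costs) / len(costs)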
3 RELATED WORK
Many studies have investigated mental illness surveillance on social media, such as depression detection [2], anxiety and OCD detection [10], or eating disorder detection [1]. There are also several evaluation frameworks related to social media analysis for mental illness detection, such as eRisk [16] and CLPsych [17].

Mainly, the techniques used to detect illness on social media are supervised methods based on features extracted from texts. Many features have been defined in the literature; with regard to depression detection, for example, we can quote: n-grams [2], key-phrases [15], the frequency of punctuation [19], word generalization/topic models [20], URL mentions [2], capitalized words [19], word/paragraph embeddings [19], sentiment or emotion [2, 20], lexical resources such as antidepressant drug names [2], linguistic features [19], activity or user behavior on the platform [2], Part-Of-Speech analysis [19], text readability [23], emoticons [19], meta-information [4], and specific words [6]. In this section, we focus on related work that considers the same task and data as us. In their participation in the eRisk 2018 challenge, Trotzek et al. used four machine learning models [23]. While two of their machine learning models are based on CNNs, the two others are based on features computed from users' texts: a model based on user-level linguistic meta-data and Bags of Words (BoW), and a model based only on BoW. They also used a late fusion ensemble of three of these models: the one based on user-level linguistic meta-data and BoW, and the two based on CNNs. The model based only on BoW achieved the top performance according to the ERDE50 measure and F-measure in both tasks (depression and anorexia) at eRisk 2018. On the same task, Funez et al. [8] implemented two models: a model that uses Sequential Incremental Classification, which classifies a user as risky based on the accumulated evidence, and a model that uses a semantic representation of documents, which considers the partial information available at a given time. The model based on the semantic representation achieved the best results according to the ERDE5 measure for depression and anorexia detection at eRisk 2018. The other model achieved the best precision for anorexia detection.

In this paper, we extend the work of Ramiandrisoa et al. [19], who built two machine learning models, one based on a set of features and the other based on a text representation using word embedding. Indeed, the models developed in [19] are simpler than the ones from Funez et al. [8] or Trotzek et al. [23] and are still very effective, since the model based on a set of features achieved the second best precision at eRisk 2018. The model of Ramiandrisoa et al. uses several features from the literature of the domain, including some features from Trotzek et al. [23]. We made the hypothesis that the best model of Ramiandrisoa et al. (LIIRB) could have achieved better performance according to the ERDE50 measure for depression detection if the prediction had started from chunk 1, while it started at chunk 3. We also made the hypothesis that having a richer text representation by adding more features could help the training and improve the results. For the study presented in this paper, we defined new features obtained from textual analysis across lexical categories. Researchers found that users' mental health is correlated with the words they use [18]. Our hypothesis is the following: the writings of a user who suffers from depression or anorexia contain specific categories of words. For example, texts are more likely to contain words belonging to sadness or fatigue related topics when written by depressed people; similarly, food or weight related topics are more likely to be found in writings from anorexic people. While some of the features we used were specially designed for depression, we used them anyway for anorexia detection in order to study whether they could be useful for detecting other illnesses. Our results are compared to several baselines: Ramiandrisoa et al. [19], Trotzek et al. [23], and Funez et al. [8].

4 PROPOSED METHOD
We consider three models that we combine to detect depression/anorexia: (a) a model based on features extracted from users' writings (posts and comments), (b) a model based on vectors computed from users' writings, and (c) a combination of the two previous models.

To build the three models, we tested four classifiers that are often used in NLP and have produced good results in the literature: SMO (Sequential Minimal Optimization), Random Forest, Logistic Regression, and Naive Bayes. We report only the classifiers that gave the best results. We found that, on both the depression and anorexia training data sets, Random Forest applied to the set of features achieves the best results. When using the word embedding text representation, Logistic Regression achieves the best results. We report these models as ModRF and ModLR in this paper. For the combined model, we combine the output probabilities of the two models ModRF and ModLR. We report this latter model as ModComb.
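The paper does not state which toolkit was used to train these classifiers; purely as an illustration, the following scikit-learn sketch shows one way the two retained classifiers could be trained and their output probabilities (later combined in ModComb and thresholded in Section 5) obtained. The function names, variable names, and hyper-parameters are assumptions.

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

def train_models(X_feat, X_vec, y):
    """Train ModRF on the 256 hand-crafted features and ModLR on the 200-dim
    doc2vec vectors; y holds the binary (ill / not ill) training labels."""
    mod_rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_feat, y)
    mod_lr = LogisticRegression(max_iter=1000).fit(X_vec, y)
    return mod_rf, mod_lr

def predict_probabilities(mod_rf, mod_lr, X_feat, X_vec):
    """Probability of the positive (ill) class for each user, for both models."""
    return (mod_rf.predict_proba(X_feat)[:, 1],
            mod_lr.predict_proba(X_vec)[:, 1])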
Feature-based text representation. In total, we extracted 256 features. We used the 58 features defined by the authors in [19] for their participation in eRisk, among which some features are specially designed for depression. Four features are related to the text publication dates: we count the number of writings that a user has submitted in each season of the year (one season⁴ corresponds to 3 months). The remaining 194 features are extracted using the Empath tool⁵ [7] and have never been used for this task in the past. These new features are very general and can be used for any text analysis; our contribution in this paper is to analyse their use for mental illness detection.

⁴ Season 1: December, January, and February; season 2: March, April, and May; etc.
⁵ https://github.com/Ejhfast/empath-client, accessed on 2019-12-10
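As an illustration of how such a representation can be computed, the sketch below uses the empath package's analyze function and counts writings per season; the (date, text) input format and the season_<n> feature names are hypothetical, not taken from the paper.

from collections import Counter
from datetime import datetime
from empath import Empath

lexicon = Empath()  # provides the lexical categories used as features

def feature_vector(writings):
    """writings: list of (iso_date, text) pairs for one user (hypothetical format)."""
    text = " ".join(t for _, t in writings)
    feats = lexicon.analyze(text, normalize=True)  # category name -> normalized frequency
    # Four date features: number of writings per 3-month season (season 1 = Dec-Feb, ...).
    seasons = Counter((datetime.fromisoformat(d).month % 12) // 3 + 1 for d, _ in writings)
    for s in range(1, 5):
        feats["season_%d" % s] = seasons[s]
    return feats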
Even if we use the same features in both tasks, that does not mean that they are equally important. In order to see which features are important for each task, we used χ² ranking⁶ on the corresponding training data set. This method evaluates the importance of a feature by computing its χ² statistic value with respect to the target class (depressed/anorexic or non depressed/non anorexic).

⁶ We calculate the χ² ranking with the Weka tool.

The following features are the top ten according to the χ² ranking for depression (Empath categories [7] are in bold font): Frequency of "depress", contentment, sadness, nervousness, shame, Frequency of nouns (Part-of-speech frequency), Frequency of unigram feel (Bag of words), First person pronoun myself, pain, and love. We observed that 154 features have a χ² statistic value higher than zero. Keeping only these 154 features in the model improves the results when training the model, and this was also confirmed on the test collection; 105 of these 154 features are from the new features we added, where 104 are Empath categories and the last one is the number of publications between June and August (season 3).

The top ten features for anorexia detection according to χ² are as follows: health, shame, Depression symptoms and related drugs, First person pronoun myself, nervousness, ugliness, Frequency of nouns (Part-of-speech frequency), body, Frequency of unigram feel (Bag of words), and sadness. In that case, we observed that 57 features have a χ² statistic value higher than zero; keeping only these 57 features, the results are improved. 36 of these 57 features are new features we added, where 35 are Empath categories and the last one is the number of publications between March and May (season 2).

Considering the Empath categories [7], it seems that those related to sentiment are the most important for depression, while those related to physical appearance and food are the most important for anorexia. A deeper analysis is needed to confirm this observation. Concerning the features related to the text publication dates, an analysis must be conducted in order to understand why the feature season 3 (resp. season 2) is important to our model for depression (resp. anorexia) detection.

We also observed that, among the features with χ² > 0 on each task (154 for depression and 57 for anorexia), 48 features are common to both tasks. Three of these 48 common features are very specific to depression but are also useful for anorexia: drugs name, frequency of "depress", and depression symptoms and related drugs. When we remove these three features, we observe that the F-measure decreases (from 0.71 to 0.67), as well as recall (from 0.60 to 0.55) and precision (from 0.86 to 0.84) (training with 10-fold cross-validation). Note that when identifying the importance of features, training is based on gathering the users' writings from all 10 chunks.
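The paper computes this ranking with Weka; as an illustration only, the following sketch shows an equivalent selection step with scikit-learn's chi2 scorer. It assumes a non-negative user-by-feature matrix X, binary labels y, and a list of feature names; none of these identifiers come from the paper.

import numpy as np
from sklearn.feature_selection import chi2

def select_features(X, y, names):
    """Rank features by their chi-squared statistic w.r.t. the class and keep
    only those with a strictly positive score (154 for depression, 57 for anorexia)."""
    scores, _ = chi2(X, y)                      # chi^2 statistic per feature
    order = np.argsort(scores)[::-1]            # most informative first
    return [(names[i], scores[i]) for i in order if scores[i] > 0]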
                                                                                            inspired form the work of [19]. We split the training data set into
4
                                                                                            two subsets, one to train the model with the classifiers and one to
  Season 1: December, January, and February; season 2: March, April, and May; etc.
5
  https://github.com/Ejhfast/empath-client, accessed on 2019-12-10                          test the model in order to define the threshold. As for depression,
6
  We calculate χ2 ranking by Weka tool                                                      the training data set of eRisk 2018 is composed of training and test
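A minimal sketch of this vectorization, written with gensim (the paper does not name the doc2vec implementation it relied on), is given below; it assumes each writing has already been tokenized into a list of words, and the training hyper-parameters other than the vector size are assumptions.

import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

def train_doc2vec(training_writings):
    """training_writings: list of token lists, one per training writing."""
    tagged = [TaggedDocument(tokens, [i]) for i, tokens in enumerate(training_writings)]
    dbow = Doc2Vec(tagged, dm=0, vector_size=100, min_count=2, epochs=20)  # Distributed Bag of Words
    dm = Doc2Vec(tagged, dm=1, vector_size=100, min_count=2, epochs=20)    # Distributed Memory
    return dbow, dm

def user_vector(dbow, dm, user_writings):
    """Average the per-writing vectors of the chunks seen so far, then concatenate
    the DBOW and DM parts into a single 200-dimensional user vector."""
    v_dbow = np.mean([dbow.infer_vector(tokens) for tokens in user_writings], axis=0)
    v_dm = np.mean([dm.infer_vector(tokens) for tokens in user_writings], axis=0)
    return np.concatenate([v_dbow, v_dm])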
Table 1: ERDE5, ERDE50, F-measure (F), precision (P), and recall (R) for the detection of depression (left part) and anorexia (right part). The lower the ERDE, the better.

                     Depression                           Anorexia
  Name        ERDE5   ERDE50    F     P     R     ERDE5   ERDE50    F     P     R
  ModRF        9.62%   6.92%  0.58  0.69  0.51   12.40%    8.60%  0.71  0.89  0.59
  ModLR        9.52%   6.12%  0.51  0.38  0.80   12.53%    6.27%  0.73  0.64  0.85
  ModComb      9.52%   6.12%  0.51  0.38  0.80   12.34%    6.31%  0.72  0.62  0.85
  UNSLA [8]    8.78%   7.39%  0.38  0.48  0.32   11.40%    7.82%  0.61  0.75  0.51
  FHDO [23]    9.50%   6.44%  0.64  0.64  0.65   12.15%    5.96%  0.81  0.75  0.88
  LIIRA [19]   9.46%   7.56%  0.50  0.61  0.42   12.78%   10.47%  0.71  0.81  0.63
  LIIRB [19]  10.03%   7.09%  0.48  0.38  0.67   13.05%   10.33%  0.76  0.79  0.73


5 RESULTS
In order to make a decision to annotate a user at a given chunk, we used a threshold that we set during the training stage by testing different configurations. The way we defined the threshold is inspired from the work of [19]. We split the training data set into two subsets, one to train the model with the classifiers and one to test the model in order to define the threshold. As for depression, the training data set of eRisk 2018 is composed of the training and test data sets of eRisk 2017; the splitting of eRisk 2017 is reused. For anorexia, we used the same threshold that we defined for depression, as done by Funez et al. [8] and Trotzek et al. [23]. The idea behind this choice is to measure whether the models can perform well in detecting different mental diseases without changing the threshold. Our thresholds are defined as follows:

(a) For the model ModRF, a user is predicted as having the mental illness when the model predicts it with a probability higher than 0.5.
(b) For the model ModLR, a user is considered as depressed/anorexic if the model predicts it with a probability higher than 0.55 when using at least 20 of his writings, 0.7 when using at least 10 writings, 0.5 when using more than 200 writings, and for all probabilities above 0.9. All these values have been set using the training data sets only.
(c) For the combined model ModComb, a user is considered as depressed/anorexic if models ModRF and ModLR predict it. When the two models have different predictions, we give priority to the prediction of model ModLR, using the same thresholds as described above. If model ModLR does not predict the user as depressed/anorexic, then we consider the prediction of model ModRF.

This priority was decided because model ModLR achieved better results than model ModRF on the training data set. In Table 1, the results with ModRF are obtained with the selected features.
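The following sketch is one possible reading of rules (a)-(c); since the writing-count thresholds in rule (b) overlap, the order in which they are checked here is an assumption rather than something stated in the paper.

def decide_modlr(prob, n_writings):
    # Rule (b) for ModLR; the ordering of the checks is one interpretation of the text.
    if prob > 0.9:
        return True
    if n_writings > 200:
        return prob > 0.5
    if n_writings >= 20:
        return prob > 0.55
    if n_writings >= 10:
        return prob > 0.7
    return False

def decide_modrf(prob):
    # Rule (a) for ModRF.
    return prob > 0.5

def decide_modcomb(prob_rf, prob_lr, n_writings):
    # Rule (c): a positive ModLR prediction takes priority; otherwise fall back to ModRF.
    return decide_modlr(prob_lr, n_writings) or decide_modrf(prob_rf)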
The left part of Table 1 presents the results of the three models on the depression test data set. We also report the best results from participants in eRisk 2018 when considering ERDE5 and ERDE50, namely UNSLA [8] and FHDO-BCSGB [23], and the best results of Ramiandrisoa et al., which are named LIIRA and LIIRB [19]. Other participants' results are detailed in [16]. We can see that there is no clear difference in results between the model ModLR and the combined model ModComb; however, they achieve better results than model ModRF when considering ERDE5, ERDE50, and recall.

On eRisk 2018, compared to all the participants' results, our model ModLR achieves the best results according to ERDE50; it is ranked 3rd according to recall (R). Our model ModRF achieves the best results according to precision (P).

The right part of Table 1 reports the results of our three models on the anorexia test data set, together with the best models from eRisk 2018 when considering ERDE5 and ERDE50 and the best results from Ramiandrisoa et al. [19]. When comparing our three models, we can see that model ModLR gives the best results when considering ERDE50, F-measure (F), and recall (R). Model ModRF gives the best results when considering precision (P), and model ModComb gives the best results when considering ERDE5.

When comparing to the other participants of eRisk 2018, model ModRF achieves the third-best results according to precision. It should be noted that the model ModRF is based on a set of features from which some are specially designed for depression detection. Using features that are designed for anorexia may improve the results of the models ModRF and ModComb. In short, our model ModLR achieved the second-best result according to the ERDE50 measure and to recall (R).

6 CONCLUSION
This work aims at helping the early detection of mental illnesses (depression and anorexia) by analyzing social media. We used machine learning approaches based on (a) features extracted from users' writings and (b) a text representation using word embedding. We developed three models: one based on features only, a second based on the word embedding text representation only, and a third combining the two previous models. We used 58 features defined in [19] and 198 new features. The models are evaluated on two benchmark data sets provided at eRisk 2018 in the CLEF international forum.

Our models can help to detect depression and anorexia. By adding new features, we outperformed the results of the authors in [19] and the results of the participants in the eRisk 2018 challenge according to two main evaluation measures (ERDE50 and precision). For depression, when compared to the other participants in the eRisk task, the model based on word embedding achieved the best performance according to the ERDE50 measure (this measure evaluates both the correctness of the decision and the time taken to make it) and the third-best result according to recall. The model based on features only achieved the best performance according to precision. For anorexia, the word embedding model achieved the second-best result for ERDE50 and recall, and the feature model achieved the third-best precision. This result may be surprising since some of the features we used for the anorexia detection task were specially designed for depression. It leads us to think that there may be a link between depression and anorexia regarding the features that can help to detect them. We also observed that 105 of the 154 features selected for depression detection and 36 of the 57 features selected for anorexia are new features we added in this study.

For future work, we would like to investigate new features specifically designed for anorexia detection. We also want to test different feature selection methods such as the ones presented in [12]. Finally, we could analyze users' social signals, such as the subjects users leave comments on or like.

Ethical issue. While CLEF eRisk has its own ethical policies, detecting depression, anorexia, or any other human state or behavior raises ethical issues that are beyond the scope of this paper.
REFERENCES
[1] Stevie Chancellor, Zhiyuan Lin, Erica L. Goodman, Stephanie Zerwas, and Munmun De Choudhury. 2016. Quantifying and Predicting Mental Illness Severity in Online Pro-Eating Disorder Communities. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing, CSCW 2016, San Francisco, CA, USA, February 27 - March 2, 2016. 1169–1182.
[2] Munmun De Choudhury, Michael Gamon, Scott Counts, and Eric Horvitz. 2013. Predicting Depression via Social Media. ICWSM (2013).
[3] Munmun De Choudhury, Emre Kiciman, Mark Dredze, Glen Coppersmith, and Mrinal Kumar. 2016. Discovering Shifts to Suicidal Ideation from Mental Health Content in Social Media. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, San Jose, CA, USA, May 7-12, 2016. 2098–2110.
[4] Arman Cohan, Sydney Young, Andrew Yates, and Nazli Goharian. 2017. Triaging content severity in online mental health forums. JASIST 68, 11 (2017), 2675–2689.
[5] Zhaohua Deng and Shan Liu. 2017. Understanding consumer health information-seeking behavior from the perspective of the risk perception attitude framework and social support in mobile social media websites. International Journal of Medical Informatics 105 (2017), 98–109.
[6] Johannes C. Eichstaedt, Robert J. Smith, Raina M. Merchant, Lyle H. Ungar, Patrick Crutchley, Daniel Preoţiuc-Pietro, David A. Asch, and H. Andrew Schwartz. 2018. Facebook language predicts depression in medical records. Proceedings of the National Academy of Sciences 115, 44 (2018), 11203–11208.
[7] Ethan Fast, Binbin Chen, and Michael S. Bernstein. 2016. Empath: Understanding Topic Signals in Large-Scale Text. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, San Jose, CA, USA, May 7-12, 2016. 4647–4657. https://doi.org/10.1145/2858036.2858535
[8] Dario G. Funez, Maria José Garciarena Ucelay, Maria Paula Villegas, Sergio Burdisso, Leticia C. Cagnina, Manuel Montes-y-Gómez, and Marcelo Errecalde. 2018. UNSL's participation at eRisk 2018 Lab. In Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, Avignon, France, September 10-14, 2018.
[9] Sharath Chandra Guntuku, David B. Yaden, Margaret L. Kern, Lyle H. Ungar, and Johannes C. Eichstaedt. 2017. Detecting depression and mental illness on social media: an integrative review. Current Opinion in Behavioral Sciences 18 (2017), 43–49.
[10] Bibo Hao, Lin Li, Ang Li, and Tingshao Zhu. 2013. Predicting Mental Health Status on Social Media - A Preliminary Study on Microblog. In Cross-Cultural Design. Cultural Differences in Everyday Life - 5th International Conference, CCD 2013, Held as Part of HCI International 2013, Las Vegas, NV, USA, July 21-26, 2013, Proceedings, Part II. 101–110.
[11] Thi Bich Ngoc Hoang and Josiane Mothe. 2018. Location extraction from tweets. Information Processing & Management 54, 2 (2018), 129–144.
[12] Léa Laporte, Rémi Flamary, Stéphane Canu, Sébastien Déjean, and Josiane Mothe. 2013. Nonconvex regularizations for feature selection in ranking with sparse SVM. IEEE Transactions on Neural Networks and Learning Systems 25, 6 (2013), 1118–1130.
[13] Quoc V. Le and Tomas Mikolov. 2014. Distributed Representations of Sentences and Documents. In Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014. 1188–1196.
[14] Jing Liu and Gang Wang. 2018. Pharmacovigilance from social media: An improved random subspace method for identifying adverse drug events. International Journal of Medical Informatics 117 (2018), 33–43.
[15] Ning Liu, Zheng Zhou, Kang Xin, and Fuji Ren. 2018. TUA1 at eRisk 2018. In Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, Avignon, France, September 10-14, 2018.
[16] David E. Losada, Fabio Crestani, and Javier Parapar. 2018. Overview of eRisk – Early Risk Prediction on the Internet. In Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Ninth International Conference of the CLEF Association (CLEF 2018). Avignon, France.
[17] David N. Milne, Glen Pink, Ben Hachey, and Rafael A. Calvo. 2016. CLPsych 2016 Shared Task: Triaging content in online peer-support forums. In Proceedings of the 3rd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, CLPsych@NAACL-HLT 2016, June 16, 2016, San Diego, California, USA. 118–127.
[18] James W. Pennebaker, Ryan L. Boyd, Kayla Jordan, and Kate Blackburn. 2015. The development and psychometric properties of LIWC2015. Technical Report. University of Texas at Austin.
[19] Faneva Ramiandrisoa, Josiane Mothe, Farah Benamara, and Véronique Moriceau. 2018. IRIT at e-Risk 2018. In Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Ninth International Conference of the CLEF Association (CLEF 2018). Avignon, France.
[20] Philip Resnik, William Armstrong, Leonardo Max Batista Claudino, Thang Nguyen, Viet-An Nguyen, and Jordan L. Boyd-Graber. 2015. Beyond LDA: Exploring Supervised Topic Modeling for Depression-Related Language in Twitter. In Proceedings of CLPsych@NAACL-HLT.
[21] Ramon Gouveia Rodrigues, Rafael Marques das Dores, Celso G. Camilo-Junior, and Thierson Couto Rosa. 2016. SentiHealth-Cancer: a sentiment analysis tool to help detecting mood of patients in online social networks. International Journal of Medical Informatics 85, 1 (2016), 80–95.
[22] María del Pilar Salas-Zárate, José Medina-Moreira, Katty Lagos-Ortiz, Harry Luna-Aveiga, Miguel Angel Rodriguez-Garcia, and Rafael Valencia-García. 2017. Sentiment analysis on tweets about diabetes: an aspect-level approach. Computational and Mathematical Methods in Medicine 2017 (2017).
[23] Marcel Trotzek, Sven Koitka, and Christoph M. Friedrich. 2018. Word Embeddings and Linguistic Metadata at the CLEF 2018 Tasks for Early Detection of Depression and Anorexia. In Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, Avignon, France, September 10-14, 2018.
[24] Rupa Sheth Valdez and Patricia Flatley Brennan. 2015. Exploring patients' health information communication practices with social network members as a foundation for consumer health IT design. International Journal of Medical Informatics 84, 5 (2015), 363–374.
[25] Yu-Tseng Wang, Hen-Hsen Huang, and Hsin-Hsi Chen. 2018. A Neural Network Approach to Early Risk Detection of Depression and Anorexia on Social Media Text. In Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, Avignon, France, September 10-14, 2018.