=Paper=
{{Paper
|id=Vol-2621/CIRCLE20_13
|storemode=property
|title=Early Detection of Depression and Anorexia from Social Media: A Machine Learning Approach
|pdfUrl=https://ceur-ws.org/Vol-2621/CIRCLE20_13.pdf
|volume=Vol-2621
|authors=Faneva Ramiandrisoa,Josiane Mothe
|dblpUrl=https://dblp.org/rec/conf/circle/RamiandrisoaM20
}}
==Early Detection of Depression and Anorexia from Social Media: A Machine Learning Approach==
Faneva Ramiandrisoa, faneva.ramiandrisoa@irit.fr, IRIT, Univ. de Toulouse, Toulouse, France; Univ. d’Antananarivo, Antananarivo, Madagascar

Josiane Mothe, Josiane.Mothe@irit.fr, IRIT, UMR5505 CNRS, INSPE, Université de Toulouse, Toulouse, France

"Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)."

ABSTRACT

In this paper, we present a social media mining approach to help the early detection of two mental illnesses: depression and anorexia. We aim at detecting users who are likely to be ill by learning from annotated examples. We mine texts to extract features for text representation and also use a word embedding representation. The machine learning model we propose uses these two types of text representation to predict the likelihood of each user being ill. We use 58 features from the state of the art and 198 features that are new in this domain and part of our contribution. We evaluate our model on the CLEF eRisk 2018 reference collections. For depression detection, our model based on word embedding achieves the best performance according to the measure ERDE50, and the model based on features only achieves the best performance according to precision. For anorexia detection, the model based on word embedding achieves the second-best results on ERDE50 and recall. We also observed that many of the new features we added contribute to improving the results.

CCS CONCEPTS

• Computing methodologies → Supervised learning; • Information systems → Information retrieval.

KEYWORDS

Social Media Analysis, Text Mining, Depression Detection, Anorexia Detection, Early Risk Detection, weak signal detection

1 INTRODUCTION

Mental illness diagnosis has improved over decades [9]; however, it is acknowledged that early detection for early treatment is fundamental. Detection implies a medical consultation that sometimes takes time. While our aim is not to replace a medical diagnosis, we aim at studying whether social media analysis could help raise a warning about persons possibly suffering from a mental illness.

Indeed, in the last ten years, the use of social media platforms like Reddit, Facebook, or Twitter has increased and is still expected to grow in the coming years [25]. Their users generate a lot of data that can be used to extract insights on users, on their communication practices [24], on location information [11] and on what they say. This information can also be used in medical-related applications, for example to help understand consumer health information-seeking behavior [5], to detect mood [21] or sentiment about some diseases [22], for pharmacovigilance applications [14], or even to detect depression [2] or suicidal ideation [3].

Our work is related to the latter applications. We aim at studying whether social media analysis and mining can help in mental illness detection. More specifically, we consider depression and anorexia detection tasks. We developed a machine learning model based on (a) a set of features that are extracted from users’ writings and (b) vectors computed from users’ writings (posts and comments). This model aims at predicting the likelihood of a user being ill. While the principles we use for both depression and anorexia detection are the same, the main features used to detect one or the other illness are likely to differ. We thus analyze the differences between the two resulting models, specifically considering the important features in the representations of the users’ writings. Results are based on two benchmark collections from the CLEF international forum (Conference and Labs of the Evaluation Forum, which promotes research, innovation, and development of information access systems; www.clef-initiative.eu/).

With regard to automatic detection, these tasks can be considered as either a classification problem or a ranking problem. Depression detection, for example, can be considered as a binary classification problem: either the user is considered as (possibly) depressed or as non depressed. Alternatively, depression detection can be considered as a ranking or a regression problem if the output is the likelihood of a user being ill.

Supervised machine learning is the most common approach used in related work. The principle is that a model is trained on a set of annotated examples (training cases), then the trained model is used on cases for which the model has to make a decision (test cases). Evaluation considers ground truth on the test cases. Moreover, related work mainly considers a set of natural language processing (NLP) features extracted from texts [19] to represent items. While we re-use some features from the state of the art, in this paper we also develop new features. In total we used 256 features, of which 58 are from the state of the art and 198 are new for these tasks and part of our contribution. Of the 198 new features, 194 are obtained from textual analysis across lexical categories and make use of the Python library Empath [7]; the 4 remaining features are related to the text publication dates. Moreover, we combine these features with a word embedding content representation. We compare the resulting models, either combining representations or not, on two tasks in order to study which features are the most important and how much they differ from one task to the other. In this paper we also investigate several machine learning algorithms and compare the results obtained.

This paper is organized as follows: Section 2 describes the tasks, data sets and evaluation measures, Section 3 overviews related work, Section 4 reports the model we propose, Section 5 reports the results, and finally Section 6 concludes this paper.

2 TASKS AND DATA SETS

The task and data used in this study are based on the CLEF Lab eRisk task (https://early.irlab.org/2018/index.html, accessed on 2019-12-05) [16]. The main goal for both depression and anorexia detection tasks is to detect as early as possible some signs of depression/anorexia in texts. The detection is done on data collections composed of texts sorted in chronological order and divided into 10 chunks. Chunk 1 contains the first 10% of each user’s writings (the oldest), chunk 2 contains the second 10%, and so forth.

A prediction for each user is to be given for each chunk when the chunks are processed sequentially. The user has to be predicted as depressed/anorexic or as non depressed/non anorexic, or the system can postpone its decision, waiting for the next data chunks. When a user receives a prediction, it is final and cannot be reversed later. On the 10th and last chunk, the system has to make a decision for each user, and the user has to be predicted either depressed/anorexic or not. More details about the tasks can be found in [16].

As the problem is to detect as early as possible the signs of mental illnesses, a new measure named ERDE was defined in [16]. It takes into account the correctness of the system decision and the delay it took to emit its decision. ERDE is defined as follows:

<math>
ERDE_o(d, k) =
\begin{cases}
c_{fp} & \text{if } d \text{ is a False Positive (FP)} \\
c_{fn} & \text{if } d \text{ is a False Negative (FN)} \\
lc_o(k) \cdot c_{tp} & \text{if } d \text{ is a True Positive (TP)} \\
0 & \text{if } d \text{ is a True Negative (TN)}
\end{cases}
\qquad (1)
</math>

where d is the binary decision taken by the system with delay k for the user; False (resp. True) Positive means d is positive and ground truth is negative (resp. positive); False (resp. True) Negative means d is negative and ground truth is positive (resp. negative); <math>c_{fn} = c_{tp} = 1</math>; <math>c_{fp}</math> is the proportion of positive cases in the test collection; <math>lc_o(k) = 1 - \frac{1}{1 + e^{k - o}}</math>; o is a parameter equal to 5 for ERDE5 and to 50 for ERDE50. The ERDE value of the model is the mean of the ERDE values obtained for each user with Equation 1. For the ERDE measure, the smaller the value, the better. We also consider standard classification measures: precision, recall, and F-measure.

The depression detection data set is composed of chronological sequences of Reddit (www.reddit.com/) users’ posts and comments. The CLEF eRisk data set was built by collecting submissions from any subreddits (contents on the Reddit platform are organized by areas of interest called "subreddits") for each user; those who had fewer than 10 submissions were excluded. Users were annotated as depressed (214 users) or non depressed (1,493 users). The training data set contains 135 depressed users and 752 non depressed, while the test data set contains 79 depressed users and 741 non depressed, for a total of 531,394 (resp. 545,188) posts/comments in the training (resp. test) set. The anorexia detection set was built in the same way as the depression one, but instead of searching for self-expressions of depression, self-expressions of anorexia were used. The training set contains 20 anorexic users and 132 non anorexic, while the test set contains 41 anorexic users and 279 non anorexic, for a total of 84,966 (resp. 168,786) posts or comments in the training (resp. test) set.

3 RELATED WORK

Many studies have investigated mental illness surveillance on social media, such as depression detection [2], anxiety and OCD [10] or eating disorder detection [1]. There are also several evaluation frameworks related to social media analysis for mental illness detection, such as eRisk [16] and CLPsych [17].

The techniques used to detect illness on social media are mainly supervised methods based on features extracted from texts. Many features have been defined in the literature. With regard to depression detection, for example, we can cite: n-grams [2], key-phrases [15], the frequency of punctuation [19], word generalization/topic models [20], URL mentions [2], capitalized words [19], word/paragraph embedding [19], sentiment or emotion [2, 20], lexical resources such as antidepressant drug names [2], linguistic features [19], activity or user behavior on the platform [2], Part-Of-Speech analysis [19], text readability [23], emoticons [19], meta-information [4], and specific words [6].

In this section, we focus on related work that considers the same task and data as ours. In their participation in the eRisk 2018 challenge, Trotzek et al. used four machine learning models [23]. While two of their machine learning models are based on CNN, the two others are based on features computed from the user’s text: a model based on user-level linguistic meta-data and Bags of Words (BoW), and a model based only on BoW. They also used a late fusion ensemble of three of these models: the one based on user-level linguistic meta-data and BoW, and the two based on CNN. The model based only on BoW achieved the top performance according to the measure ERDE50 and F-measure in both tasks (depression and anorexia) at eRisk 2018. On the same task, Funez et al. [8] implemented two models: a model that uses Sequential Incremental Classification, which classifies a user as risky based on the accumulated evidence, and a model that uses a semantic representation of documents, which considers the partial information available at a given time. The model based on semantic representation achieved the best results according to the measure ERDE5 for depression and anorexia detection at eRisk 2018. The other model achieved the best precision for anorexia detection.

In this paper, we extend the work of Ramiandrisoa et al. [19], who built two machine learning models, one based on a set of features and the other based on a text representation using word embedding. Indeed, the models developed in [19] are simpler than the ones from Funez et al. [8] or Trotzek et al. [23] and are still very effective, since the model based on a set of features achieved the second-best precision at eRisk 2018. The model of Ramiandrisoa et al. uses several features from the literature of the domain, including some features from Trotzek et al. [23]. We made the hypothesis that the best model of Ramiandrisoa et al. (LIIRB) could have achieved better performance according to the measure ERDE50 for depression detection if the prediction had started from chunk 1, whereas it started at chunk 3. We also made the hypothesis that having a richer text representation by adding more features could help the training and improve the results. For the study presented in this paper, we defined new features obtained from textual analysis across lexical categories. Researchers found that users’ mental health is correlated with the words they use [18]. Our hypothesis is the following: the writings of a user who suffers from depression or anorexia contain specific categories of words. For example, texts are more likely to contain words belonging to sadness or fatigue related topics when written by depressed people; similarly, food or weight related topics are more likely to be found in writings from anorexic people. While some of the features we used were specially designed for depression, we used them anyway for anorexia detection in order to study whether they could be useful for the detection of other illnesses. Our results are compared to several baselines: Ramiandrisoa et al. [19], Trotzek et al. [23] and Funez et al. [8].

4 PROPOSED METHOD

We consider three models to detect depression/anorexia: (a) a model based on features extracted from users’ writings (posts and comments), (b) a model based on vectors computed from users’ writings and (c) a combination of the two previous models.

To build the three models, we tested four classifiers which are often used in NLP and have produced good results in the literature: SMO (Sequential Minimal Optimization), Random Forest, Logistic Regression and Naive Bayes. We report only the classifiers that gave the best results. We found that on both depression and anorexia training data sets, Random Forest applied to the set of features achieves the best results. When using the word embedding text representation, Logistic Regression achieves the best results. We refer to these models as ModRF and ModLR in this paper. For the combined model, we combine the output probabilities of the two models ModRF and ModLR. We refer to this latter model as ModComb.

Feature-based text representation. In total, we extracted 256 features. We used 58 features defined by the authors of [19] for their participation in eRisk, some of which are specially designed for depression. 4 features are related to the text publication dates: we count the number of writings that a user has submitted in each season of a year, where one season corresponds to 3 months (season 1: December, January, and February; season 2: March, April, and May; etc.). 194 features are extracted using the Empath tool (https://github.com/Ejhfast/empath-client, accessed on 2019-12-10) [7] and have never been used for this task in the past. These new features are very general and can be used for any text analysis; our contribution in this paper is to analyse their use for mental illness detection. Even if we used the same features in both tasks, that does not mean that they are similarly important. In order to see which features are important for each task, we used χ² ranking (computed with the Weka tool) on the corresponding training data set. This method evaluates the importance of a feature by computing its χ² statistic value with respect to the target class (depressed/anorexic or non depressed/non anorexic).

The following features are the top ten according to χ² ranking for depression (Empath categories [7] are in bold font): Frequency of "depress", '''contentment''', '''sadness''', '''nervousness''', '''shame''', Frequency of nouns (Part of speech frequency), Frequency of unigram feel (Bag of words), First person pronoun myself, '''pain''' and '''love'''. We observed that 154 features have a χ² statistic value higher than zero. Keeping only these 154 features in the model improves the results when training the model, and this was also confirmed on the test collection; 105 of these 154 features are from the new features we added, where 104 are Empath categories and the last feature is the number of publications between June and August (season 3).

The top ten features for anorexia detection according to χ² are as follows: '''health''', '''shame''', Depression symptoms and related drugs, First person pronoun myself, '''nervousness''', '''ugliness''', Frequency of nouns (Part of speech frequency), '''body''', Frequency of unigram feel (Bag of words) and '''sadness'''. In that case, we observed that 57 features have a χ² statistic value higher than zero; keeping only these 57 features improves the results. 36 of these 57 features are new features we added, where 35 are Empath categories and the last feature is the number of publications between March and May (season 2).

Considering the Empath categories [7], it seems that those related to sentiment are the most important for depression and those related to physical appearance and food are the most important for anorexia. A deeper analysis is needed to confirm this observation. Concerning the features related to the text publication dates, an analysis must be conducted in order to know why feature season 3 (resp. season 2) is important to our model for depression (resp. anorexia) detection.

We also observed that among the features with χ² > 0 on each task (154 for depression and 57 for anorexia), 48 features are common to both tasks. Three of the 48 common features are very specific to depression but are also useful for anorexia. These features are: drugs name, frequency of "depress", and depression symptoms and related drugs. When we remove these three features, we observe that F-measure decreases (from 0.71 to 0.67), as well as recall (from 0.60 to 0.55) and precision (from 0.86 to 0.84) (training with 10-fold cross-validation). Note that when identifying the importance of features, training is based on gathering the users’ writings from all 10 chunks.

Text representation based on text vectorization. We also built a text representation based on text vectorization relying on doc2vec [13]. It represents users’ writings as a vector. For this, we trained two separate doc2vec models on the training data: (a) a Distributed Bag of Words model with 100-dimensional output, which "ignore the context words in the input, but force the model to predict words randomly sampled from the paragraph in the output" [13], and (b) a Distributed Memory model with 100-dimensional output [13].

Each user is represented by a vector. To compute the vector associated with a user, we first computed the vector of each of the user’s writings and then averaged those vectors. At a given chunk, all the writings from this chunk and the writings from previous chunks were used to compute the vector. With regard to the training, we used all 10 training set chunks to represent the user by a vector. In the test stage, we represented the user by a vector computed with the available chunks. A user vector is a concatenation of the output of the distributed bag of words model and the distributed memory model, resulting in a 200-dimensional vector.

5 RESULTS

In order to make a decision to annotate a user at a given chunk, we used a threshold that we set during the training stage, by testing different configurations. The way we defined the threshold is inspired from the work of [19]. We split the training data set into two subsets, one to train the model with the classifiers and one to test the model in order to define the threshold. For depression, the training data set of eRisk 2018 is composed of the training and test data sets of eRisk 2017; the splitting in eRisk 2017 is reused. For anorexia, we used the same threshold that we defined for depression, as done by Funez et al. [8] and Trotzek et al. [23]. The idea behind this choice is to measure whether the models can perform well in detecting different mental diseases without changing the threshold. Our threshold is defined as follows: (a) For the model ModRF, a user is predicted as having the mental illness when the model predicts it with a probability higher than 0.5. (b) For model ModLR, a user is considered as depressed/anorexic if the model predicts it with a probability higher than 0.55 when using at least 20 of their writings, 0.7 when using at least 10 writings, 0.5 when using more than 200 writings, and for all probabilities above 0.9. All these values have been set using the training data sets only. (c) For the combined model ModComb, a user is considered as depressed/anorexic if model ModRF and model ModLR predict it. When the two models had different predictions, we gave priority to the prediction from model ModLR using the same threshold as described above. If the model ModLR does not predict the user as depressed/anorexic, then we considered the prediction from model ModRF. This priority was decided because model ModLR achieved better results than model ModRF on the training data set. In Table 1, the results with ModRF are obtained with the selected features.

{| class="wikitable"
|+ Table 1: ERDE5 and ERDE50 for detection of depression (left part) and anorexia (right part). The lower ERDE, the better. F, P and R denote F-measure, precision and recall.
! rowspan="2" | Name
! colspan="5" | Depression
! colspan="5" | Anorexia
|-
! ERDE5 !! ERDE50 !! F !! P !! R !! ERDE5 !! ERDE50 !! F !! P !! R
|-
| ModRF || 9.62% || 6.92% || 0.58 || 0.69 || 0.51 || 12.40% || 8.60% || 0.71 || 0.89 || 0.59
|-
| ModLR || 9.52% || 6.12% || 0.51 || 0.38 || 0.80 || 12.53% || 6.27% || 0.73 || 0.64 || 0.85
|-
| ModComb || 9.52% || 6.12% || 0.51 || 0.38 || 0.80 || 12.34% || 6.31% || 0.72 || 0.62 || 0.85
|-
| UNLSA [8] || 8.78% || 7.39% || 0.38 || 0.48 || 0.32 || 11.40% || 7.82% || 0.61 || 0.75 || 0.51
|-
| FHDO [23] || 9.50% || 6.44% || 0.64 || 0.64 || 0.65 || 12.15% || 5.96% || 0.81 || 0.75 || 0.88
|-
| LIIRA [19] || 9.46% || 7.56% || 0.50 || 0.61 || 0.42 || 12.78% || 10.47% || 0.71 || 0.81 || 0.63
|-
| LIIRB [19] || 10.03% || 7.09% || 0.48 || 0.38 || 0.67 || 13.05% || 10.33% || 0.76 || 0.79 || 0.73
|}

The left part of Table 1 presents the results of the three models on the depression test data set. We also report the best results from participants in eRisk 2018 when considering ERDE5 and ERDE50, namely UNLSA [8] and FHDO-BCSGB [23], and the best results of Ramiandrisoa et al., which are named LIIRA and LIIRB [19]. Other participants’ results are detailed in [16]. We can see that there is no clear difference in results between the model ModLR and the combined model ModComb; however, they achieve better results than model ModRF when considering ERDE5, ERDE50 and recall.

On eRisk 2018, compared to all the participants’ results, our model ModLR achieves the best results according to ERDE50; it is ranked 3rd according to recall (R). Our model ModRF achieves the best results according to precision (P).

The right part of Table 1 reports the results of our three models on the anorexia test data set. We also report the best models from eRisk 2018 when considering ERDE5 and ERDE50, and the best results from Ramiandrisoa et al. [19]. When comparing our three models, we can see that the model ModLR gives the best results when considering ERDE50, F-measure (F) and recall (R). Model ModRF gives the best results when considering precision (P) and the model ModComb gives the best results when considering ERDE5.

When compared to the other participants from eRisk 2018, model ModRF achieves the third-best results according to precision. It should be noted that the model ModRF is based on a set of features, some of which are specially designed for depression detection. Using features that are designed for anorexia may improve the results of the models ModRF and ModComb. In short, our model ModLR achieved the second-best result according to the measure ERDE50 and the measure recall (R).

6 CONCLUSION

This work aims at helping early detection of mental illness (depression and anorexia) by analyzing social media. We used machine learning approaches based on (a) features extracted from users’ writings and (b) text representation using word embedding. We developed three models: one is based on features only, the second on word embedding text representation only, and the last combines the two previous models. We used 58 features defined in [19] and 198 new features. The models are evaluated on two benchmark data sets provided at eRisk 2018 in the CLEF international forum.

Our models can help to detect depression and anorexia. By adding new features, we outperformed the results of the authors in [19] and the results of the participants in the eRisk 2018 challenge according to two main evaluation measures (ERDE50 and precision). For depression, when compared to other participants on the eRisk task, the model based on word embedding achieved the best performance according to the measure ERDE50 (this measure evaluates both the correctness of the decision and the time used to take it) and the third-best result according to recall. The model based on features only achieved the best performance according to precision. For anorexia, the word embedding model achieved the second-best result for ERDE50 and recall, and the feature model achieved the third-best precision. This result could be surprising since some of the features we used on the anorexia detection task are specially designed for depression. This result leads us to think that there may be a link between depression and anorexia regarding the features that can help to detect them. We also observed that 105 of the 154 selected features for depression detection and 36 of the 57 selected features for anorexia are new features we added in this study.

For future work, we would like to investigate new features specifically designed for anorexia detection. We also want to test different feature selection methods such as the ones presented in [12]. Finally, we could analyze users’ social signals such as the subjects users leave comments on or like.

Ethical issue. While CLEF eRisk has its own ethical policies, detecting depression, anorexia or any other human state or behavior raises ethical issues that are beyond the scope of the paper.

REFERENCES

[1] Stevie Chancellor, Zhiyuan Lin, Erica L. Goodman, Stephanie Zerwas, and Munmun De Choudhury. 2016. Quantifying and Predicting Mental Illness Severity in Online Pro-Eating Disorder Communities. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing, CSCW 2016, San Francisco, CA, USA, February 27 - March 2, 2016. 1169–1182.
[2] Munmun De Choudhury, Michael Gamon, Scott Counts, and Eric Horvitz. 2013. Predicting Depression via Social Media. ICWSM (2013).
[3] Munmun De Choudhury, Emre Kiciman, Mark Dredze, Glen Coppersmith, and Mrinal Kumar. 2016. Discovering Shifts to Suicidal Ideation from Mental Health Content in Social Media. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, San Jose, CA, USA, May 7-12, 2016. 2098–2110.
[4] Arman Cohan, Sydney Young, Andrew Yates, and Nazli Goharian. 2017. Triaging content severity in online mental health forums. JASIST 68, 11 (2017), 2675–2689.
[5] Zhaohua Deng and Shan Liu. 2017. Understanding consumer health information-seeking behavior from the perspective of the risk perception attitude framework and social support in mobile social media websites. International journal of medical informatics 105 (2017), 98–109.
[6] Johannes C Eichstaedt, Robert J Smith, Raina M Merchant, Lyle H Ungar, Patrick Crutchley, Daniel Preoţiuc-Pietro, David A Asch, and H Andrew Schwartz. 2018. Facebook language predicts depression in medical records. Proceedings of the National Academy of Sciences 115, 44 (2018), 11203–11208.
[7] Ethan Fast, Binbin Chen, and Michael S. Bernstein. 2016. Empath: Understanding Topic Signals in Large-Scale Text. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, San Jose, CA, USA, May 7-12, 2016. 4647–4657. https://doi.org/10.1145/2858036.2858535
[8] Dario G. Funez, Maria José Garciarena Ucelay, Maria Paula Villegas, Sergio Burdisso, Leticia C. Cagnina, Manuel Montes-y-Gómez, and Marcelo Errecalde. 2018. UNSL’s participation at eRisk 2018 Lab. In Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, Avignon, France, September 10-14, 2018.
[9] Sharath Chandra Guntuku, David B Yaden, Margaret L Kern, Lyle H Ungar, and Johannes C Eichstaedt. 2017. Detecting depression and mental illness on social media: an integrative review. Current Opinion in Behavioral Sciences 18 (2017), 43–49.
[10] Bibo Hao, Lin Li, Ang Li, and Tingshao Zhu. 2013. Predicting Mental Health Status on Social Media - A Preliminary Study on Microblog. In Cross-Cultural Design. Cultural Differences in Everyday Life - 5th International Conference, CCD 2013, Held as Part of HCI International 2013, Las Vegas, NV, USA, July 21-26, 2013, Proceedings, Part II. 101–110.
[11] Thi Bich Ngoc Hoang and Josiane Mothe. 2018. Location extraction from tweets. Information Processing & Management 54, 2 (2018), 129–144.
[12] Léa Laporte, Rémi Flamary, Stéphane Canu, Sébastien Déjean, and Josiane Mothe. 2013. Nonconvex regularizations for feature selection in ranking with sparse SVM. IEEE Transactions on Neural Networks and Learning Systems 25, 6 (2013), 1118–1130.
[13] Quoc V. Le and Tomas Mikolov. 2014. Distributed Representations of Sentences and Documents. In Proceedings of the 31th International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014. 1188–1196.
[14] Jing Liu and Gang Wang. 2018. Pharmacovigilance from social media: An improved random subspace method for identifying adverse drug events. International Journal of Medical Informatics 117 (2018), 33–43.
[15] Ning Liu, Zheng Zhou, Kang Xin, and Fuji Ren. 2018. TUA1 at eRisk 2018. In Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, Avignon, France, September 10-14, 2018.
[16] David E. Losada, Fabio Crestani, and Javier Parapar. 2018. Overview of eRisk – Early Risk Prediction on the Internet. In Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Ninth International Conference of the CLEF Association (CLEF 2018). Avignon, France.
[17] David N. Milne, Glen Pink, Ben Hachey, and Rafael A. Calvo. 2016. CLPsych 2016 Shared Task: Triaging content in online peer-support forums. In Proceedings of the 3rd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, CLPsych@NAACL-HLT 2016, June 16, 2016, San Diego, California, USA. 118–127.
[18] James W Pennebaker, Ryan L Boyd, Kayla Jordan, and Kate Blackburn. 2015. The development and psychometric properties of LIWC2015. Technical Report. University of Texas at Austin.
[19] Faneva Ramiandrisoa, Josiane Mothe, Farah Benamara, and Véronique Moriceau. 2018. IRIT at e-Risk 2018. In Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Ninth International Conference of the CLEF Association (CLEF 2018). Avignon, France.
[20] Philip Resnik, William Armstrong, Leonardo Max Batista Claudino, Thang Nguyen, Viet-An Nguyen, and Jordan L. Boyd-Graber. 2015. Beyond LDA: Exploring Supervised Topic Modeling for Depression-Related Language in Twitter. In Proceedings of CLPsych@NAACL-HLT.
[21] Ramon Gouveia Rodrigues, Rafael Marques das Dores, Celso G Camilo-Junior, and Thierson Couto Rosa. 2016. SentiHealth-Cancer: a sentiment analysis tool to help detecting mood of patients in online social networks. International journal of medical informatics 85, 1 (2016), 80–95.
[22] María del Pilar Salas-Zárate, José Medina-Moreira, Katty Lagos-Ortiz, Harry Luna-Aveiga, Miguel Angel Rodriguez-Garcia, and Rafael Valencia-García. 2017. Sentiment analysis on tweets about diabetes: an aspect-level approach. Computational and mathematical methods in medicine 2017 (2017).
[23] Marcel Trotzek, Sven Koitka, and Christoph M. Friedrich. 2018. Word Embeddings and Linguistic Metadata at the CLEF 2018 Tasks for Early Detection of Depression and Anorexia. In Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, Avignon, France, September 10-14, 2018.
[24] Rupa Sheth Valdez and Patricia Flatley Brennan. 2015. Exploring patients’ health information communication practices with social network members as a foundation for consumer health IT design. International journal of medical informatics 84, 5 (2015), 363–374.
[25] Yu-Tseng Wang, Hen-Hsen Huang, and Hsin-Hsi Chen. 2018. A Neural Network Approach to Early Risk Detection of Depression and Anorexia on Social Media Text. In Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, Avignon, France, September 10-14, 2018.
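
As an illustration of the ERDE measure of Section 2 (Equation 1), the following Python sketch computes the per-user cost and its mean over users. It is a minimal reading of the definition given in the text, not the official eRisk evaluation script; the function names and the `(decision, truth, k)` record layout are assumptions made for this example, and the guard against large exponents is a numerical convenience that is not part of the definition.

```python
import math


def latency_cost(k: int, o: int) -> float:
    """lc_o(k) = 1 - 1 / (1 + e^(k - o)), as defined in Section 2."""
    x = k - o
    if x > 700:  # avoid OverflowError in math.exp; the cost is ~1 here
        return 1.0
    return 1.0 - 1.0 / (1.0 + math.exp(x))


def erde(decision: bool, truth: bool, k: int, o: int,
         c_fp: float, c_fn: float = 1.0, c_tp: float = 1.0) -> float:
    """Cost of one user's decision, taken with delay k (Equation 1)."""
    if decision and not truth:      # false positive
        return c_fp
    if not decision and truth:      # false negative
        return c_fn
    if decision and truth:          # true positive, penalised by the delay
        return latency_cost(k, o) * c_tp
    return 0.0                      # true negative


def mean_erde(records, o: int, c_fp: float) -> float:
    """Mean ERDE over users; records holds (decision, truth, k) tuples."""
    return sum(erde(d, t, k, o, c_fp) for d, t, k in records) / len(records)
```

With o = 5, a correct positive decision taken at delay k = 5 costs exactly half of c_tp, which shows how quickly ERDE5 penalises late alerts compared with ERDE50.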
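
The four publication-date features of Section 4 count a user's writings per season. The sketch below shows one way to compute them in Python. Seasons 1 to 3 follow the months stated in the paper; season 4 (September to November) is inferred from the "etc." of the season definition and is an assumption, as are the function names.

```python
from collections import Counter
from datetime import date


def season_of(month: int) -> int:
    """Map a month (1-12) to a season index.

    Season 1: Dec, Jan, Feb; season 2: Mar, Apr, May;
    season 3: Jun, Jul, Aug; season 4 (assumed): Sep, Oct, Nov.
    """
    return (month % 12) // 3 + 1


def season_counts(dates):
    """Return the number of writings per season, as [s1, s2, s3, s4]."""
    counts = Counter(season_of(d.month) for d in dates)
    return [counts.get(s, 0) for s in (1, 2, 3, 4)]


# Hypothetical publication dates for one user.
example = [date(2018, 1, 5), date(2018, 7, 1),
           date(2017, 12, 31), date(2018, 11, 2)]
features = season_counts(example)
```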
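
The user representation of Section 4 (average the vectors of a user's writings, then concatenate the outputs of the two doc2vec models) can be sketched as follows. The per-writing vectors are assumed to have been inferred already by the two trained models; the sketch uses plain Python lists and hypothetical 2-dimensional vectors for readability, whereas the paper uses 100 dimensions per model.

```python
def mean_vector(vectors):
    """Component-wise average of equally sized vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def user_vector(dbow_vectors, dm_vectors):
    """Concatenate the averaged DBOW and DM vectors of a user's writings.

    Each argument holds one vector per writing seen so far, i.e. the
    writings of the current chunk and of all previous chunks.
    """
    return mean_vector(dbow_vectors) + mean_vector(dm_vectors)


# Two writings, hypothetical 2-dimensional outputs from each model.
dbow = [[1.0, 3.0], [3.0, 5.0]]
dm = [[0.0, 2.0], [2.0, 2.0]]
vec = user_vector(dbow, dm)
```

With 100-dimensional outputs from each model, the same function yields the 200-dimensional user vector described in the paper.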
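
The decision rules of Section 5 can be read as the following Python sketch. The threshold values come from the paper, but the way the ModLR conditions are combined (any one satisfied condition triggers a positive prediction) is one possible interpretation of the text, and all function names are hypothetical. The final helper illustrates the chunk-by-chunk protocol of Section 2, in which a positive decision is final.

```python
def modrf_positive(p_rf: float) -> bool:
    """ModRF flags a user when its probability exceeds 0.5."""
    return p_rf > 0.5


def modlr_positive(p_lr: float, n_writings: int) -> bool:
    """ModLR thresholds, which depend on the number of writings seen."""
    if p_lr > 0.9:                          # any number of writings
        return True
    if n_writings > 200 and p_lr > 0.5:
        return True
    if n_writings >= 20 and p_lr > 0.55:
        return True
    if n_writings >= 10 and p_lr > 0.7:
        return True
    return False


def modcomb_positive(p_rf: float, p_lr: float, n_writings: int) -> bool:
    """ModComb: ModLR has priority; ModRF is consulted otherwise."""
    return modlr_positive(p_lr, n_writings) or modrf_positive(p_rf)


def first_alert_chunk(per_chunk):
    """Return the 1-based chunk of the first positive decision, or None.

    per_chunk holds (p_rf, p_lr, n_writings_so_far) for chunks 1..10;
    None means the user is predicted as not ill at the last chunk.
    """
    for chunk, (p_rf, p_lr, n) in enumerate(per_chunk, start=1):
        if modcomb_positive(p_rf, p_lr, n):
            return chunk
    return None
```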