Anorexia Topical Trends in Self-declared Reddit Users

Razan Masood♣, Mengjiao Hu♣, Hermenegildo Fabregat♦, Ahmet Aker♣, and Norbert Fuhr♣
♣ University of Duisburg-Essen, Duisburg, Germany
♦ Universidad Nacional de Educación a Distancia, Madrid, Spain
firstname.lastname@uni-due.de, gildo.fabregat@lsi.uned.es

ABSTRACT

Social Media platforms have been a vital environment to share experiences and seek knowledge. People with various interests form online communities in which they can accumulate many experiences from many peers. Among these communities are the mental health-related ones, which have been growing on Social Media in the last few years. However, users can show alarming behavioral signs at some stage of their mental illness that should be identified before it is too late. Hence, equipping social media platforms with the tools needed to monitor their users, identify risks, and intervene in time has recently been of great concern. In this paper, we target users who self-disclose as being diagnosed with an eating disorder, namely Anorexia. We provide a dataset of manually labeled Reddit users’ posts, focused on the extraction of potentially relevant topics for the study of eating disorders, e.g., diets, exercises, and body image. These topics can be used to find patterns in Anorexic users’ behaviors that distinguish them from users who are less likely to have Anorexia. They can also be used to interpret afflicted users’ attitudes. We support our labeling with baseline experiments that learn to differentiate between these topics.

CCS CONCEPTS

• Human-centered computing → Social networking sites; • Applied computing → Psychology.

KEYWORDS

mental health, Reddit, social media, Anorexia, machine learning

Copyright © 2020 for this paper by its authors.

1 INTRODUCTION

Humanity has come a long way in maintaining a high level of well-being for societies and individuals, including physical health, education, and freedom. Still, much remains to be done in the mental health domain, which is receiving more attention in the modern age of prospering technologies [17]. More people with mental health issues turn to Social Media (SM) platforms, either to seek support and information directly or to communicate their thoughts and feelings indirectly. Recently, the data that such users produce on SM has been shown to predict their mental health state and its severity [7]. It also provides a valuable resource for practitioners and experts as a possible tool for mental health-related research. Moreover, predicting mental health issues at early stages is essential to provide the needed support in alarming situations, such as preventing suicide, self-harm, and eating disorders [12, 16, 28].

In this paper, we target SM users who have explicitly stated that they were clinically diagnosed with Anorexia Nervosa (AN). Anorexia is “an eating disorder characterized by abnormally low body weight, an intense fear of gaining weight, and a distorted perception of weight. People with anorexia place a high value on controlling their weight and shape”¹. We use posts extracted from Reddit, “an online network of communities based on people’s interests”. The different Reddit communities are called subreddits, and each subreddit is devoted to a specific topic. Plenty of subreddits are related to AN and other eating disorders, such as the EatingDisorders and AnorexiaRecovery subreddits. People turn to such communities for many purposes. Some communities promote sharing recovery experiences and emotional support. Others, like pro-Anorexia communities, can cause harm by promoting unhealthy body image and diets. Hence, some users are likely to face serious risks, which obliges SM platforms to keep their environments under control and to intervene when needed.

¹ https://www.mayoclinic.org/diseases-conditions/anorexia-nervosa/symptoms-causes/syc-20353591

We investigated the specific types of information or topics that AN-diagnosed users post about. We observed that the most frequently discussed topics are diet and eating routines, weight, family and relationship issues, anxiety, and depression. Figure 1 shows an example pair of a positive user (diagnosed with AN) and a negative user (not diagnosed with AN), plotting the topics of the first 50 posts in each user's timeline. In this example, the positive user (blue) posts more frequently on topics related to their mental health state (4 times), eating disorders (2 times), diet (4 times), and physical pain (1 time). The negative user (orange), on the other hand, posts about family and exercises, among other non-significant topics. Based on this analysis, we suggest that particular topical patterns can be used to analyze and explain AN users’ behaviour in a more understandable way, which can help to distinguish risky users. Furthermore, when topical patterns are combined with, e.g., emotions [5, 8] or other aspects like stance, they can help to reveal the severity level of the illness [26]. These patterns could also be extended and adapted to other mental health issues, such as substance abuse and depression. Hence, we believe that post-level classification could be useful for medical researchers and psychiatrists. They could use it to analyze topical extracts of SM histories and to evaluate how prevalent certain topic patterns are among AN sufferers.

Our main contributions in this paper are as follows: (1) We define topics of importance to identify Reddit users who are more likely to have AN. (2) We provide a dataset of users’ posts annotated with the defined relevant topics. (3) We present baselines to predict the different post categories based on the labeled dataset².
Figure 1: Positive vs. negative users’ post topics, shown for the first 50 posts in two users’ timelines. The x-axis shows the order of the posts, and the y-axis the assigned topics.

2 RELATED WORK

Online Social Media has driven a wide range of investigations on mental health that exploit the growing amount of user data [14, 27]. Many datasets have been collected from SM platforms for language or communication analysis and risk prediction [10, 26, 30]. Dataset collection follows different rules and methods [15]. One method is to identify users affected by mental illnesses through psychiatric surveys assessed by experts. The selected users’ SM accounts are then explored based on the survey results [13]. Another method is to treat users who have mentioned on social media that they have been diagnosed with a mental illness as positive cases [9, 18]. Nonetheless, the datasets mentioned above are labeled at the user level, i.e., a user either has or does not have the targeted mental illness.

A third method is to manually annotate posts based on the signals they contain that characterize the mental illness in question. The annotations are either derived from the data or based on theory [15]. The vast majority of post-level annotations of eating disorder characteristics were done by experts as part of content analysis work. Mowery et al. defined an annotation scheme for labeling tweets according to depressive symptoms and psychosocial stressors [21]. The goal of their final corpus is to understand the language of depression and to identify the differences between psychological factors. Moreover, Sowles et al. extended the annotation to coding the attitude and the support behaviors of the comments [23]. On the other hand, the frequent topics brought up by online mental health communities have been explored using topic modeling, such as LDA (Latent Dirichlet Allocation), and other methods [7, 22, 29]. However, automatic topic modeling is problematic for Reddit posts, because individual posts are too short to serve as documents. Moreover, when all posts of a user are joined into one document, that document is likely to undergo topic shifts and variations in tone, and hence to lose context [10]. To our knowledge, no manually annotated post-level Reddit dataset for topics related to Anorexia is available as a basis for both enhanced supervised and unsupervised classification models. Our topic annotations are also more descriptive than bare automatic topics. We defined our manual annotation criteria from observations of the dataset and from previous work on the topics frequently mentioned online by people who show symptoms of eating disorders. Unlike the expert-based annotations, we do not involve the writer's attitude towards the mentioned topics when assigning the labels.

3 DATASET

We use the eRisk 2019 dataset³. eRisk is part of the CLEF (Conference and Labs of the Evaluation Forum) 2019 labs. The lab designated a task for the early detection of Reddit users with signs of Anorexia [19]. The training data is a set of users in two categories: users who stated in at least one of their posts that they were diagnosed with Anorexia, and users who did not. For each user in the dataset, all posts and comments made by that user (up to 1000 posts and 1000 comments) are sorted chronologically [18]. The post in which a user declares their diagnosis was filtered out. Posts and comments can belong to any subreddit. For our purposes, we selected 55 positive users and labeled 50–100 posts per user, starting from the earliest post/comment. This research focuses on the topics of interest of positive users. To examine how often similar topics occur in negative users’ posts, we also picked ten negative users and labeled their posts using the same criteria.

² The dataset and the code of the best-performing models are released for research purposes: https://github.com/razanmasood/Anorexia_Topical_Trends_in_Self_declared_Reddit_Users. Because of the signed user agreement, the labels are released by the IDs of the original dataset.
³ https://early.irlab.org/2019/index.html

3.1 Labels

To choose the post labels, we manually examined the topics that appeared more frequently than others in positive users’ posts. We then verified and expanded the topics using related work that analyzed social media posts indicating symptoms of Anorexia [6, 7, 29]. The selected topics relate to mental health disorders and anxiety, self-harm, suicidal thoughts, pain, and hints of the desire to be skinny. In addition, we used topics shown to be related to general mental health evaluation, such as family and sleep, as found in [11, 25]. The selected topics were grouped under seven labels for posts with AN-related topics. An additional label covers posts that do not fit any of the seven defined topics. This number of labels is suitable for automatic classification experiments. The post labels are as follows⁴:

(1) Eating disorder: Posts with explicit mentions of experiences that indicate an eating disorder (Anorexia, Bulimia, ED), and behaviors like binging and induced throw-ups.
(2) General mental health: Posts with mentions of signs of mental disturbances and inconveniences. Examples are:
    a. Posts with indications of depression, anxiety, and expressions of sadness.
    b. Posts with signs of self-harm and suicidal expressions.
    c. Posts with mentions of sleep-related issues, like lack of sleep or oversleeping.
    d. Posts with mentions of alcohol drinking problems and other addiction issues, like drugs and smoking.
(3) Medication: Posts with mentions of medication names. Some medications could be used for treatment or for inducing throw-ups.
(4) Family & friends: Posts that contain stories about friends or family members.
(5) Diets & food: Posts with mentions of specific foods, recipes, and diets, including fasting, skipping meals, and purging.
(6) Body shape & exercises: Posts that mention the body’s weight, height, BMI, and other body-image expressions, as well as posts that mention exercise routines and other physical activities.
(7) Physical pain & sickness: Posts with mentions of physical sickness or illness.
(8) Other: Any post not related to the categories mentioned above.

⁴ According to the user agreement signed with the eRisk organizers, it is not allowed to show content from the dataset.

We define the topics in a way that makes annotation more straightforward for non-clinician annotators. The label definitions do not involve judgments on the severity, emotions, or attitude the writer has towards the reported topics. This separation is necessary to reduce annotation inaccuracies and to separate emotion and attitude factors from the plain topic labels.

Table 1: Labels with their Fleiss’ kappa agreement scores (κ) and number of instances (#). The number of instances for each label in the Training, Development, and Test sets is shown in the corresponding columns.

Label                       κ      #      Train   Dev   Test
Eating disorder             0.72   126    85      24    17
General mental health       0.49   170    71      43    56
Medication                  0.40   60     33      22    5
Family & friends            0.49   146    84      36    26
Diets & food                0.69   234    171     39    24
Body shape & exercises      0.75   280    188     48    47
Physical pain & sickness    0.49   130    108     8     14
Other                       0.69   3959   2549    765   645

3.2 Annotation

Five master's students from the Computer Science department annotated the data and were paid per hour. We dedicated a session to training the annotators and used a selected sample of posts to make sure they followed the label definitions. Each post was annotated with as many labels as the topics it mentions. Taking into account possible further experiments, we also fixed one label per post as the main label. The main label is the one the annotator found most representative or dominant for the post [4].

Because each post can have multiple labels, we calculated the agreement for each label separately, i.e., the agreement between raters on choosing a specific label for the post. We used Fleiss’ kappa for multiple raters as implemented in the Python library statsmodels 0.11.0⁵. The agreement results are shown in the second column of Table 1. The lowest agreement is on the Medication label, which may be due to the small number of posts on this topic. Another reason is that the annotators are not experts, and in many cases medication names are hard to recognize. The third column of Table 1 shows the number of instances for each label when only the main label is counted. We also use Fleiss’ kappa to compute the agreement on the main label. The obtained score is 0.65, which is considered to be in an acceptable range [20].

⁵ https://www.statsmodels.org/stable/generated/statsmodels.stats.inter_rater.fleiss_kappa.html

4 AUTOMATIC TOPIC CLASSIFICATION

To further understand how complex it is to classify users’ posts by the defined topic labels, we present baseline results from three well-known classification approaches. These are Logistic Regression, LSTM (Long Short-Term Memory), and CNN (Convolutional Neural Networks). First, we divided the corpus into training, development, and test sets. Each set comprises different users, to avoid learning user-specific features like individual writing styles. We experiment with the main label only rather than dealing with multiple labels per post. The distribution of posts is shown in Table 1. We then pre-process the posts’ textual content by lemmatizing, lower-casing, and removing Reddit-specific expressions such as tagged forums and users. The cleaned posts and comments are the input to the ML models, and the assigned labels are the targets to be learned.

To analyze the task at different levels of complexity, we consider two experimental frameworks: binary and multiclass classification. For the binary task, we map our eight labels onto two labels. Related comprises the seven labels relevant to AN, and Unrelated corresponds to the Other label. The second task is a fine-grained multiclass task that distinguishes the eight labels individually.

The three models are set up as follows:

Logistic Regression with TF-IDF (LR-TFIDF). We use the logistic regression implementation of Python’s scikit-learn package. We feed the classifier with Term Frequency-Inverse Document Frequency (TF-IDF) features of word unigrams and bigrams.

LSTM with inner-attention (LSTM-Att). For this model, we represented each post by its term embeddings extracted using GloVe [3]. Each term was then weighted by the average embedding of certain recurrent terms. To select the recurrent terms, we extracted the significant terms for each label against the other labels with the chi-square test on TF-IDF features of unigram words. We kept the 200 most significant terms for each label. We then calculated the average GloVe embedding vector of each label's set of terms. The inner-attention mechanism weights each term of a post/comment by the average vector of each label. The model was implemented as in [1, 24]. For the LSTM, we used a single forward layer with eight neurons and the hyperbolic tangent activation function.

CNN with MetaMap (CNN-MM). With this model, we explored adding more focused knowledge through concepts extracted by MetaMap. MetaMap is an NLP tool focused on information retrieval in the biomedical domain and enriched with several thesauri [2]. Since MetaMap provides, for each identified concept, the semantic category it belongs to, we explored an approach that uses this knowledge. In total, we studied 50 semantic groups manually selected based on their relationship with the labels, including Activity, Behavior, and Disease or Syndrome. In short, each post is represented as a sequence of terms together with their respective categories. We used GloVe to represent the words and a trainable 50-dimensional embedding vector to represent the semantic categories. The CNN is applied with a fixed window of 5 elements and a total of 128 neurons.

4.1 Results and Error Analysis

The overall classification performance, in terms of precision (P), recall (R), and F1, is shown in Table 3. It confirms that multiclass classification is harder than binary classification. CNN combined with MetaMap semantic groups performed significantly better than the other two models on the multiclass task⁶.

Table 3: Results of binary (B) and multiclass (M) classification with each model: Logistic Regression with TF-IDF features (LR-TFIDF), LSTM with attention (LSTM-Att), and CNN with MetaMap (CNN-MM).

                 Dev                  Test
Model            P     R     F1       P     R     F1
B/LR-TFIDF       0.83  0.80  0.81     0.77  0.76  0.77
B/LSTM-Att       0.84  0.78  0.80     0.85  0.80  0.82
B/CNN-MM         0.85  0.80  0.82     0.88  0.79  0.83
M/LR-TFIDF       0.42  0.29  0.33     0.57  0.29  0.34
M/LSTM-Att       0.39  0.33  0.33     0.39  0.35  0.34
M/CNN-MM         0.52  0.37  0.41     0.51  0.37  0.41

Table 2 lists detailed results for each label in the multiclass setting. The highest F1 scores are obtained on the Eating disorder, Diets & food, Body shape & exercises, and Other labels. The imbalanced label distribution strongly influences performance, which explains the high scores on the Other label. CNN-MM performed well on labels whose posts contain terms from MetaMap's medical semantic groups, e.g., body part terms and eating disorder terms. It also performed better than the other models on the Medication label, despite the few training instances, owing to the medication terms used in the feature encoding. However, the performance on the General mental health label is lower than expected. This might be because the label spans multiple domains, which confuses the classifiers. For instance, the confusion matrix in Figure 2(a) shows that LSTM-Att mostly confused the General mental health label (2) with the Physical pain & sickness label (7), besides the Other label (8). Many posts that mention general mental health issues, like lack of sleep, also mention aspects of pain, which may have led the classifier to choose the Physical pain label (7). LSTM-Att also performs better on the Family & friends label (4), because it uses terms related to this topic to weight the post terms. The CNN model (Figure 2(b)) cannot do this, because MetaMap provides no such terms.

Table 2: Performance of the models on each label individually on the test set (multiclass setting): (1) Logistic Regression with TF-IDF features (LR-TFIDF), (2) LSTM with attention (LSTM-Att), and (3) CNN with MetaMap (CNN-MM).

                            M/LR-TFIDF          M/LSTM-Att          M/CNN-MM
Label                       P     R     F1      P     R     F1      P     R     F1
Eating disorder             0.64  0.53  0.58    0.36  0.29  0.32    0.62  0.59  0.61
General mental health       0.80  0.07  0.13    0.40  0.04  0.07    0.50  0.07  0.12
Medication                  0     0     0       0     0     0       0.50  0.20  0.29
Family & friends            0.27  0.12  0.16    0.24  0.27  0.25    0.17  0.04  0.06
Diets & food                0.88  0.29  0.44    0.54  0.54  0.54    0.59  0.54  0.57
Body shape & exercises      0.65  0.28  0.39    0.50  0.32  0.39    0.65  0.43  0.51
Physical pain & sickness    0.50  0.07  0.12    0.17  0.43  0.24    0.22  0.14  0.17
Other                       0.83  0.99  0.90    0.88  0.95  0.91    0.86  0.98  0.92

Figure 2: Confusion matrices obtained on the test set, panels (a)–(c). The numbers refer to the labels in the order they are listed in Section 3.1.

The classification results show that terms play an important role in identifying the topics. Supporting the models with targeted related terms improves performance compared with using TF-IDF features alone (Figure 2(c)). Nevertheless, a post, especially a long one, can discuss many topics. We therefore suggest ML models that output scores for multiple labels. The labeling process could also be improved by highlighting the relevant sentences for each label rather than labeling whole long posts, which would make the labeled text more focused.

⁶ McNemar’s test, p < 0.0125 after Bonferroni correction.

5 CONCLUSIONS AND FUTURE WORK

In this paper, we report the annotation process of Reddit posts and comments by self-declared users with Anorexia Nervosa. We define an annotation scheme of fine-grained labels according to topics related to the diagnosis of Anorexia Nervosa. We show that our annotation is fairly robust, as the Fleiss’ kappa agreement values are in an acceptable range. We further test the possibility of predicting post topics automatically. The classification results show that accurately predicting one main label for long posts is difficult. One possible solution is to use the multi-label annotations to predict several labels per post, in addition to specifying sentence-level annotations. The annotation scheme provided in this paper covers the topics that self-declared AN users on Reddit mention more frequently than other users. However, these topics are not enough to distinguish risky users, as this might lead to false-positive predictions. Many users join these communities because they want to help someone close to them who is diagnosed with AN. In future work, we will explore the possibility of employing the topics to predict risky users. The prediction models can be enriched with the sequential development of emotions and stances that accompany the topics [5, 8]. Furthermore, combinations of these features allow the severity level of the targeted illness to be estimated. Another interesting direction is how to adapt these models to different forms of mental illness.

ACKNOWLEDGMENTS

This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - GRK 2167, Research Training Group “User-Centred Social Media”. The work has also been partially supported by the Spanish Ministry of Science and Innovation within the projects PROSA-MED (TIN2016-77820-C3-2-R) and EXTRAE-II (IMIENS 2019).

REFERENCES

[1] Ahmet Aker, Alfred Sliwa, Fahim Dalvi, and Kalina Bontcheva. 2019. Rumour verification through recurring information and an inner-attention mechanism. Online Social Networks and Media 13 (2019), 100045.
[2] Alan R Aronson. 2001. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. In Proceedings of the AMIA Symposium. American Medical Informatics Association, 17.
[3] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
[4] Victoria Bobicev and Marina Sokolova. 2017. Inter-Annotator Agreement in Sentiment Analysis: Machine Learning Perspective. In RANLP. 97–102.
[5] Craig J Bryan, Jonathan E Butner, Sungchoon Sinclair, Anna Belle O Bryan, Christina M Hesse, and Andree E Rose. 2018. Predictors of emerging suicide death among military personnel on social media networks. Suicide and Life-Threatening Behavior 48, 4 (2018), 413–430.
[6] Patricia A Cavazos-Rehg, Melissa J Krauss, Shaina J Costello, Nina Kaiser, Elizabeth S Cahn, Ellen E Fitzsimmons-Craft, and Denise E Wilfley. 2019. “I just want to be skinny.”: A content analysis of tweets expressing eating disorder symptoms. PloS one 14, 1 (2019), e0207506.
[7] Stevie Chancellor, Zhiyuan Lin, Erica L Goodman, Stephanie Zerwas, and Munmun De Choudhury. 2016. Quantifying and predicting mental illness severity in online pro-eating disorder communities. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing. ACM, 1171–1184.
[8] Xuetong Chen, Martin D Sykora, Thomas W Jackson, and Suzanne Elayan. 2018. What about mood swings: Identifying depression on twitter with temporal measures of emotions. In Companion Proceedings of the The Web Conference 2018. International World Wide Web Conferences Steering Committee, 1653–1660.
[9] Arman Cohan, Bart Desmet, Andrew Yates, Luca Soldaini, Sean MacAvaney, and Nazli Goharian. 2018. SMHD: a large-scale resource for exploring online language usage for multiple mental health conditions. arXiv preprint arXiv:1806.05258 (2018).
[10] Arman Cohan, Sydney Young, Andrew Yates, and Nazli Goharian. 2017. Triaging content severity in online mental health forums. Journal of the Association for Information Science and Technology 68, 11 (2017), 2675–2689.
[11] Pricewaterhouse Coopers. 2015. The costs of eating disorders: Social, health and economic impacts. B-eat, Norwich (2015).
[12] Glen Coppersmith, Mark Dredze, and Craig Harman. 2014. Quantifying mental health signals in Twitter. In Proceedings of the workshop on computational linguistics and clinical psychology: From linguistic signal to clinical reality. 51–60.
[13] Munmun De Choudhury, Michael Gamon, Scott Counts, and Eric Horvitz. 2013. Predicting depression via social media. In Seventh international AAAI conference on weblogs and social media.
[14] Barbara Silveira Fraga, Ana Paula Couto da Silva, and Fabricio Murai. 2018. Online Social Networks in Health Care: A Study of Mental Disorders on Reddit. In 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI). IEEE, 568–573.
[15] Sharath Chandra Guntuku, David B Yaden, Margaret L Kern, Lyle H Ungar, and Johannes C Eichstaedt. 2017. Detecting depression and mental illness on social media: an integrative review. Current Opinion in Behavioral Sciences 18 (2017), 43–49.
[16] Michal Kosinski, David Stillwell, and Thore Graepel. 2013. Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences 110, 15 (2013), 5802–5805.
[17] James Lake and Mason Spain Turner. 2017. Urgent need for improved mental health care and a more collaborative model of care. The Permanente Journal 21 (2017).
[18] David E. Losada and Fabio Crestani. 2016. A Test Collection for Research on Depression and Language use. In Conference Labs of the Evaluation Forum. Springer, 28–39. https://doi.org/10.1007/978-3-319-44564-9_3
[19] David E. Losada, Fabio Crestani, and Javier Parapar. 2019. Overview of eRisk 2019: Early Risk Prediction on the Internet. In Experimental IR Meets Multilinguality, Multimodality, and Interaction. 10th International Conference of the CLEF Association, CLEF 2019. Springer International Publishing, Lugano, Switzerland.
[20] Mary L McHugh. 2012. Interrater reliability: the kappa statistic. Biochemia medica: Biochemia medica 22, 3 (2012), 276–282.
[21] Danielle Mowery, Hilary Smith, Tyler Cheney, Greg Stoddard, Glen Coppersmith, Craig Bryan, and Mike Conway. 2017. Understanding depressive symptoms and psychosocial stressors on Twitter: a corpus-based study. Journal of medical Internet research 19, 2 (2017), e48.
[22] Philip Resnik, William Armstrong, Leonardo Claudino, Thang Nguyen, Viet-An Nguyen, and Jordan Boyd-Graber. 2015. Beyond LDA: exploring supervised topic modeling for depression-related language in Twitter. In Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. 99–107.
[23] Shaina J Sowles, Monique McLeary, Allison Optican, Elizabeth Cahn, Melissa J Krauss, Ellen E Fitzsimmons-Craft, Denise E Wilfley, and Patricia A Cavazos-Rehg. 2018. A content analysis of an online pro-eating disorder community on Reddit. Body image 24 (2018), 137–144.
[24] Christian Stab, Tristan Miller, and Iryna Gurevych. 2018. Cross-topic argument mining from heterogeneous sources using attention-based neural networks. arXiv preprint arXiv:1802.05758 (2018).
[25] Andrew Toulis and Lukasz Golab. 2017. Social Media Mining to Understand Public Mental Health. In VLDB Workshop on Data Management and Analytics for Medicine and Healthcare. Springer, 55–70.
[26] Tao Wang, Markus Brede, Antonella Ianni, and Emmanouil Mentzakis. 2018. Social interactions in online eating disorder communities: A network perspective. PloS one 13, 7 (2018).
[27] Andrew Yates, Arman Cohan, and Nazli Goharian. 2017. Depression and self-harm risk assessment in online forums. arXiv preprint arXiv:1709.01848 (2017).
[28] Wu Youyou, Michal Kosinski, and David Stillwell. 2015. Computer-based personality judgments are more accurate than those made by humans. Proceedings of the National Academy of Sciences 112, 4 (2015), 1036–1040.
[29] Sicheng Zhou, Yunpeng Zhao, Rubina Rizvi, Jiang Bian, Ann F Haynos, and Rui Zhang. 2019. Analysis of Twitter to Identify Topics Related to Eating Disorder Symptoms. In 2019 IEEE International Conference on Healthcare Informatics (ICHI). IEEE, 1–4.
[30] Ayah Zirikly, Philip Resnik, Ozlem Uzuner, and Kristy Hollingshead. 2019. CLPsych 2019 shared task: Predicting the degree of suicide risk in Reddit posts. In Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology. 24–33.

Use of this paper is permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).