<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Depression Detection from Social Media Texts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maxim Stankevich</string-name>
          <email>stankevich@isa.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrey Latyshev</string-name>
          <email>andrey.latyshev@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Evgenia Kuminskaya</string-name>
          <email>evgenia.kuminskaya@gmail.com</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivan Smirnov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleg Grigoriev</string-name>
          <email>oleggpolikvart@yandex.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Federal Research Center “Computer Science and Control” of RAS</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Limited Liability Company “RI Technologies”</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Peoples' Friendship University of Russia (RUDN University)</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Psychotherapy and Counselling Psychology FGBNU PI RAE</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>279</fpage>
      <lpage>289</lpage>
      <abstract>
        <p>Early detection of depression is one of the most important problems in psychology, and social network analysis is widely applied to address it. In this paper, we consider the task of automatically detecting signs of depression in text messages of users of the Russian social network VKontakte. We describe the preparation of a dataset of user profiles and propose psycholinguistic and stylistic markers of depression in text. We evaluate machine learning methods for detecting signs of depression in social media messages. The experiments show that features based on psycholinguistic markers achieve an F1-score of 66% on the binary classification task, which is a promising result in comparison with similar works.</p>
      </abstract>
      <kwd-group>
        <kwd>Depression Detection</kwd>
        <kwd>Social Networks</kwd>
        <kwd>Psycholinguistics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Early detection of depression is one of the most important problems in the
field of psychology. Over 350 million people worldwide suffer from depression, which
is about 5% of the total population. Close to 800 000 people die by suicide every
year, and it is statistically the second leading cause of death among people aged 15–29
[
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. At the same time, the majority of suicides are associated with
depression. Recent research reveals that depression is also the main cause of disability and of a
variety of somatic diseases.
      </p>
      <p>
        For example, F. I. Belialov [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] summarizes the main results of recent
investigations of depression, anxiety, and stress and their relation to cardiovascular
mortality. His overview shows that an increased risk of death from cardiovascular diseases
is associated with depression and stress. P. G. Surtees et al. conducted a prospective study
in the UK based on 8.5 years of observation [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This study shows that the
presence of major depression is associated with a 3.5-fold increase in mortality from
coronary heart disease (CHD). W. Whang et al. demonstrate that women with
depression have a 49% higher risk of fatal CHD over 9 years of follow-up [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. These studies
suggest that depression treatment and stress control, as well as early diagnosis and
prevention of symptoms of psychological distress and mental disorders, can increase
life expectancy.
      </p>
      <p>Nevertheless, depression is still often falsely associated with a lack of willpower and
an unwillingness to cope with a “bad mood”. The disease is socially stigmatized,
and people find it embarrassing to admit to having it. As a result, people with depression
often hide their condition, do not seek help in time, and their disease worsens.</p>
      <p>Online methods and social media provide an opportunity to detect the
symptoms of depression privately and in time. This would make it possible to suggest measures for
prevention and treatment to people in the early stages. The 2016 report of the European branch of WHO
paid special attention to the identification of signs of depression and the
personalization of online methods of its prevention.</p>
      <p>In this paper, we consider the problem of automatically detecting signs of depression
in text messages of users of the Russian social network Vkontakte. We explore the
ability of psycholinguistic and stylistic markers to predict depression from the text of
messages. Section 2 reviews related work, Section 3 presents our dataset of
Vkontakte profiles, Section 4 describes our methods and feature engineering, and
the last sections present and discuss the results of the experiments.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>Tools for analyzing the behavior of users in social networks are
developing actively. In particular, methods of computational linguistics are successfully
used to analyze texts from social networks.</p>
      <p>
        The computerized text analysis method LIWC (Linguistic Inquiry and Word
Count) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] assesses the extent to which the author of a text uses words from
psychologically significant categories. The method relies on manually
compiled dictionaries of words that fall into different categories: meaningful words
(social, cognitive, positive/negative words, etc.) and function words (pronouns, articles,
verb forms, etc.). LIWC is used for different languages, including Russian [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], but the Russian version does
not consider the specifics of the language, since it is simply a translation of the dictionaries
from English to Russian.
      </p>
      <p>
        A. Yates et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] used a neural network model to reveal the risks of self-harm and
depression based on posts from Reddit and Twitter and showed the high accuracy of
this diagnostic method. The authors indicate that the proposed methods can be used for
large-scale studies of mental health as well as for clinical treatment.
      </p>
      <p>
        Seabrook et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] utilized the MoodPrism application to collect data about status
updates and mental health of Facebook and Twitter users. It was found that the average
proportion of words expressing positive and negative emotions, as well as their
variability and instability across the status updates of each user, can be used as a simple
but sensitive measure for diagnosing depression in a social network. In addition, it was
found that the usefulness of the proposed method may depend on the platform: these
features predicted a greater severity of depression for Facebook users and a lower one for
Twitter users.
      </p>
      <p>
        M. Al-Mosaiwi et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] examined the usage of absolute words (e.g., always, totally,
entire) in texts from various forums devoted to different disorders: depression,
anxiety, suicidal ideation, posttraumatic stress disorder, eating disorders, etc. It was
found that the number of absolute words in forums related to anxiety, depression, and suicidal
ideation was significantly greater than in forums from the control group.
      </p>
      <p>
        Most of the related studies investigate the relationship between mental health and
English-language social media texts. As an exception, Panicheva et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and
Bogolyubova et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] investigated the relationship between the so-called dark triad
(Machiavellianism, narcissism, and psychopathy) and Russian texts from Facebook. Using
the results of a dark triad questionnaire and the profile data of Facebook users, the authors
conducted a correlation analysis to reveal informative morphological, lexical, and
sentiment features.
      </p>
      <p>
        A study on detecting an early risk of depression based on the experimental task
Clef/eRisk 2017 is described in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The main idea of the task was to classify
Reddit users into two groups: depression cases and non-risk cases. The study
evaluates the applicability of tf-idf, embedding, and bigram models combined with stylometric and
morphological features on the Clef/eRisk 2017 dataset and reports an F1-score of 63% for the
depression class.
      </p>
      <p>It should be noted that the use of computational linguistics for analyzing text
messages from social networks is mainly limited to lexical approaches. Syntactic-semantic
analysis and psycholinguistic markers of text have not yet been well evaluated on the
depression detection task. In this paper, we apply psycholinguistic markers, dictionaries, and
n-gram models to detect depression in social media texts.</p>
    </sec>
    <sec id="sec-3">
      <title>Dataset</title>
      <p>
        We asked volunteers from Vkontakte to take part in our psychological research and
complete the Beck Depression Inventory questionnaire [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. This questionnaire yields
a depression score on a 0–63 scale. Before answering the questions, users gave
access to their public pages under privacy constraints via a Vkontakte application. For
the users who completed the questionnaire, we automatically collected all available
information from their public personal profile pages using the Vkontakte API. Posts, comments,
information about communities, friends, etc. were collected from January 2017 to April 2019
for each user. Overall, information from 1020 profiles was assembled to compile our
dataset. All personal information that could reveal the identity of a person was
removed from the data collection.
      </p>
      <p>Our interest was in textual messages, namely posts, written in Russian.
Therefore, we focused on the text messages written by Vkontakte users on their personal
profiles and mainly operate with these messages. It is important to note that social
media data contain a significant amount of noise and that the text volume varies
considerably from person to person. Before performing the depression detection task, we
carefully cleaned the data. First, we applied constraints on the required text volume and
number of posts. Second, we analyzed the Beck Depression Inventory scores and
divided our users into 2 groups: persons with a score less than 11 were annotated as the
control group (users without depression signs); persons with a score greater than 29 were
annotated as the depression group (users with depression signs). In this section, we describe
these steps and provide statistics on the data. We refer to the data before any changes
as initial data, to the data after cleaning as cleaned data, and to the data after depression
risk grouping as pre-classification data.</p>
      <p>The initial data contained information about 1020 persons who took the Beck
Depression Inventory questionnaire. The distribution of the depression score across users
in the initial data is presented in Fig. 1.</p>
      <p>The mean age in the initial data is 25. The gender partition is unbalanced: 699 females (68.53%)
and 321 males (31.47%). More statistics on the data are provided in Table 1. It
can be seen from Table 1 that the initial data is extremely noisy: the standard deviations
of the post, sentence, and word counts are double their mean
values. We also discovered that 155 users in the dataset did not provide any text.
This preliminary analysis showed that the data required adjustment and
cleaning. As the next step, we performed several actions to adjust the data:</p>
      <list list-type="order">
        <list-item><p>Removed all characters that are not letters or standard punctuation symbols
from the texts using regular expressions;</p></list-item>
        <list-item><p>Removed all posts with more than 3000 characters or fewer than 2 words;</p></list-item>
        <list-item><p>Removed all users with fewer than 10 posts or fewer than 1000 characters provided;</p></list-item>
        <list-item><p>Set 100 as the maximum post count for all users.</p></list-item>
      </list>
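      <p>The four adjustment steps above can be sketched as follows. This is a minimal illustration and not the code used in the study: the function name clean_user_posts, the exact character whitelist in the regular expression, the order in which the filters are applied, and the choice to keep the first 100 posts are all assumptions; the text states only the thresholds.</p>
      <preformat>
```python
import re

# Assumption: "letters or standard punctuation" is approximated by Latin and
# Cyrillic letters, whitespace, and a small punctuation set.
NOT_ALLOWED = re.compile(r"[^A-Za-zА-Яа-яЁё\s.,!?;:()\"'-]")

MAX_POST_CHARS = 3000   # step 2: drop posts longer than this
MIN_POST_WORDS = 2      # step 2: drop posts with fewer words than this
MIN_USER_POSTS = 10     # step 3: drop users with fewer posts than this
MIN_USER_CHARS = 1000   # step 3: drop users with less text than this
MAX_USER_POSTS = 100    # step 4: cap on the number of posts kept per user


def clean_user_posts(posts):
    """Return the cleaned posts of one user, or None if the user is dropped."""
    cleaned = []
    for post in posts:
        text = NOT_ALLOWED.sub("", post).strip()      # step 1
        if len(text) > MAX_POST_CHARS:                # step 2: too long
            continue
        if MIN_POST_WORDS > len(text.split()):        # step 2: too short
            continue
        cleaned.append(text)
    total_chars = sum(len(text) for text in cleaned)
    if MIN_USER_POSTS > len(cleaned) or MIN_USER_CHARS > total_chars:
        return None                                   # step 3
    return cleaned[:MAX_USER_POSTS]                   # step 4
```
</preformat>
      <p>Applied to every profile, a function of this kind returns None for users who do not meet the volume constraints, so the cleaned data would consist of the users for whom a list of posts is returned.</p>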
      <p>Applying these steps to the initial data yielded 531 user profiles, which we refer to as
cleaned data. Note that after the adjustments only 32872 user posts remained of the
initial 67257 posts (see Table 1). We found the limit on the maximum post length
necessary because manual inspection of the data revealed that most
long posts (more than 3000 characters) were not authored by the users
themselves.</p>
      <p>After the adjustment steps, the mean depression score slightly decreased, from which we
can assume that persons with a higher level of depression write less text
than persons without depression signs. The gender distribution became even more
unbalanced, with 397 females (74.76%) and 134 males (25.24%). The histogram of post
counts is shown in Fig. 2.</p>
      <p>
        After the data cleaning stage, we found the text volume much more suitable for
applying natural language processing tools and performing any type of machine
learning based evaluation. However, the depression scores provided by the Beck Depression
Inventory required some interpretation. We outlined 2 different ways to design
our research. The first is regression analysis using the raw depression scores, which
might be seen as the most appropriate and reliable way. On the other hand, this
Russian-speaking social network data is novel, and currently there are no studies of
the depression detection task on Russian-speaking social networks. Most
depression detection tasks based on English-speaking social networks were designed as a binary
classification problem: determine whether a person is depressed or not. To make it possible to
compare our results, we decided to perform a similar binary classification task on our
data and compare our results with the Clef/eRisk 2017 Shared Task [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>As the next step, we examined the depression scores and found that we could not
simply divide our data by setting a single border value and annotating all users with a depression
score above this value as the depression risk group and all users with a depression
score below it as the non-risk group. Instead, to form the pre-classification
data, we annotated all persons with a depression score less than 11 as the non-risk group
(control group). For the risk group, we assembled the data of persons with depression
scores above 29 (depression group). These values were discussed and proposed by the
expert psychologists involved in our study. Persons with depression scores between
these values were removed from consideration.</p>
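      <p>The grouping rule can be written down in a few lines. The sketch below only restates the two thresholds given in the text; the function name label_user and the string labels are hypothetical, and integer questionnaire scores are assumed.</p>
      <preformat>
```python
CONTROL_MAX = 10      # scores less than 11 form the control group
DEPRESSION_MIN = 30   # scores greater than 29 form the depression group


def label_user(bdi_score):
    """Map an integer Beck Depression Inventory score (0-63) to a group.

    Returns "control", "depression", or None for users who are excluded
    from the pre-classification data.
    """
    if bdi_score not in range(0, 64):
        raise ValueError("score must be an integer between 0 and 63")
    if CONTROL_MAX >= bdi_score:
        return "control"
    if bdi_score >= DEPRESSION_MIN:
        return "depression"
    return None  # intermediate scores (11-29) are left out
```
</preformat>
      <p>For example, label_user(7) gives "control", label_user(35) gives "depression", and label_user(20) gives None.</p>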
      <p>This step reduced the data to 248 users, of whom 156 were
labeled as the control group (without depression signs) and 92 were labeled as
belonging to the depression group. The general statistics on the pre-classification data are also
presented in Table 1. The statistics for each group of the pre-classification data are
presented in Table 2.</p>
      <p>It can be observed from Table 2 that users from the depression group tend to write
less text in the Vkontakte social network. Their average post count,
average word count, and average sentence count are lower than in the control group. The
lengths of posts and sentences are greater in the control group. The gender partition is even
more biased towards females in the depression group.</p>
    </sec>
    <sec id="sec-4">
      <title>Features and Methods</title>
      <p>
        Before forming the feature sets, we concatenated all posts of each user in the dataset into
a single text. We retrieved four groups of features from the texts: morphological,
syntactic, sentiment, and psycholinguistic. We applied MyStem [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] for tokenization,
lemmatization, and part-of-speech tagging, and UDPipe [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] for syntax parsing. The
sentiment features were calculated using the Linis-Crowd sentiment dictionary [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>
        Psycholinguistic markers are linguistic features of a text that represent psychological
characteristics of its author and may signal psychological disorders. For
example, people under stress use the pronoun “we” more frequently [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Psycholinguistic
markers are calculated from morphological and syntactic information and to some extent
correspond to the writing style of the author. We use more than 30 markers, of which the
most significant are the following:
        <list list-type="bullet">
          <list-item><p>Mean number of words per sentence;</p></list-item>
          <list-item><p>Mean number of characters per word;</p></list-item>
          <list-item><p>(N punctuation characters) / (N words);</p></list-item>
          <list-item><p>Lexicon: (N unique words) / (N words);</p></list-item>
          <list-item><p>Average syntax tree depth;</p></list-item>
          <list-item><p>(N verbs) / (N adjectives);</p></list-item>
          <list-item><p>(N conjunctions + N prepositions) / (N sentences);</p></list-item>
          <list-item><p>(N infinitives) / (N verbs);</p></list-item>
          <list-item><p>(N singular first person past tense verbs) / (N verbs);</p></list-item>
          <list-item><p>(N first person verbs) / (N verbs);</p></list-item>
          <list-item><p>(N third person verbs) / (N verbs);</p></list-item>
          <list-item><p>(N first person pronouns) / (N pronouns);</p></list-item>
          <list-item><p>(N singular first person pronouns) / (N pronouns);</p></list-item>
          <list-item><p>(N plural first person pronouns) / (N pronouns).</p></list-item>
        </list>
      </p>
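      <p>The first four markers in the list need no morphological or syntactic analysis and can be illustrated directly. The sketch below is an assumption-laden approximation: it splits sentences on terminal punctuation and treats every run of word characters as a word, whereas the study relies on MyStem for tokenization. The function name surface_markers and the dictionary keys are hypothetical. The remaining markers require part-of-speech tags or syntax trees and are not covered here.</p>
      <preformat>
```python
import re

WORD_RE = re.compile(r"\w+")              # a word is a run of word characters
SENTENCE_END_RE = re.compile(r"[.!?]+")   # naive sentence boundary
PUNCT_RE = re.compile(r"[^\w\s]")         # anything that is not a word char or space


def surface_markers(text):
    """Compute four surface-level markers for one user's concatenated text."""
    words = WORD_RE.findall(text.lower())
    if not words:
        raise ValueError("text must contain at least one word")
    # Keep only the fragments that actually contain a word.
    sentences = [s for s in SENTENCE_END_RE.split(text) if WORD_RE.search(s)]
    return {
        "words_per_sentence": len(words) / len(sentences),
        "chars_per_word": sum(len(w) for w in words) / len(words),
        "punct_per_word": len(PUNCT_RE.findall(text)) / len(words),
        "lexicon": len(set(words)) / len(words),
    }
```
</preformat>
      <p>Each value is a ratio, so the markers stay comparable between users who wrote different amounts of text.</p>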
      <p>
        These psycholinguistic markers were previously used for the task of predicting
depression from essays in Russian. They are described in more detail in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. We extend
the set of psycholinguistic markers with part-of-speech tag ratios and the following social network specific
features: uppercase character ratio, average number of Vkontakte links per post,
number of exclamation marks, and number of “sad” and “happy” smileys.
      </p>
      <p>We also formed two n-gram sets: a tf-idf matrix computed on unigrams and a tf-idf
matrix computed on unigrams and bigrams combined. N-grams that appeared
in less than 1% of the texts were removed from the feature sets. The resulting vocabularies
were extremely poor, with 5742 unique tokens for unigrams
and 10909 unique tokens for unigrams and bigrams combined. We attribute this
to the specifics of social network language: the writings contain a lot of slang and
misspelled words.</p>
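      <p>To make the n-gram sets concrete, the sketch below builds a tf-idf matrix with a minimum document frequency of 1%. It is an illustration under stated assumptions: the weighting is the textbook form, term frequency multiplied by log(N / df), which may differ from the variant actually used in the study; tokens are obtained by whitespace splitting rather than by MyStem; and the function names ngrams and tfidf are hypothetical.</p>
      <preformat>
```python
import math
from collections import Counter


def ngrams(tokens, n_max):
    """All n-grams of length 1..n_max, joined with spaces."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(texts, n_max=1, min_df=0.01):
    """Return (vocabulary, rows) with one tf-idf row per text.

    n_max=1 gives the unigram set, n_max=2 the combined unigram and
    bigram set. Terms found in a smaller share of texts than min_df
    are dropped.
    """
    docs = [Counter(ngrams(text.lower().split(), n_max)) for text in texts]
    # Document frequency: in how many texts each term occurs.
    df = Counter(term for doc in docs for term in doc)
    vocab = sorted(t for t, c in df.items() if c / len(docs) >= min_df)
    rows = []
    for doc in docs:
        total = sum(doc.values()) or 1
        rows.append([
            doc[t] / total * math.log(len(docs) / df[t]) for t in vocab
        ])
    return vocab, rows
```
</preformat>
      <p>With noisy text, many distinct spellings of the same word each fall below the frequency threshold, which is one way to read the small vocabulary sizes reported above.</p>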
      <p>
        Another feature set was retrieved using dictionaries that were previously used for the
task of detecting verbal aggression in social media writings [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. The set contains the
following dictionaries: negative emotional words, lexis of suffering, positive emotional
words, absolute and intensifying terms, motivation and stressful words, invectives, etc.
To calculate the features, for every user we count the occurrences of words from
each dictionary in the user’s writings and divide this number by the user’s total word count.
      </p>
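      <p>The dictionary features described above reduce to one ratio per dictionary. A minimal sketch follows, assuming that the user's text has already been lemmatized and that each dictionary is available as a set of lemmas; the function name dictionary_features is hypothetical.</p>
      <preformat>
```python
def dictionary_features(lemmas, dictionaries):
    """Share of a user's words that occur in each dictionary.

    lemmas: list of lemmatized words from all of the user's posts.
    dictionaries: mapping from dictionary name to a set of lemmas.
    """
    if not lemmas:
        raise ValueError("user has no words")
    return {
        name: sum(1 for lemma in lemmas if lemma in words) / len(lemmas)
        for name, words in dictionaries.items()
    }
```
</preformat>
      <p>For a user whose five lemmas include two entries of a dictionary, the corresponding feature is 2 / 5 = 0.4.</p>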
      <p>As mentioned before, we designed the depression detection task as binary
classification. We evaluate 4 different sets of features: psycholinguistic markers (PM),
unigrams (UG), bigrams (BG), and dictionaries (D).</p>
    </sec>
    <sec id="sec-5">
      <title>Results of Experiments</title>
      <p>
        For the experiments, we used the scikit-learn machine learning library [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. Random
forest and support vector machine (SVM) models were used to perform the evaluation on
the data. All feature sets were normalized and scaled. The hyperparameters of the
classification algorithms were tuned by grid search.
      </p>
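      <table-wrap id="tab-3">
        <label>Table 3</label>
        <caption>
          <p>Classification results. Only the Set and Precision columns are recoverable; precision is given in % as mean ± standard deviation.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Classifier</th><th>Set</th><th>Precision</th></tr>
          </thead>
          <tbody>
            <tr><td>Random dummy classifier</td><td /><td>45.23±2.38</td></tr>
            <tr><td>Random Forest</td><td>PM</td><td>59.80±6.21</td></tr>
            <tr><td>Random Forest</td><td>UG</td><td>51.68±9.89</td></tr>
            <tr><td>Random Forest</td><td>BG</td><td>49.64±6.67</td></tr>
            <tr><td>Random Forest</td><td>D</td><td>46.21±5.52</td></tr>
            <tr><td>Random Forest</td><td>PM-r</td><td>62.60±7.77</td></tr>
            <tr><td>SVM</td><td>PM</td><td>55.43±1.99</td></tr>
            <tr><td>SVM</td><td>UG</td><td>45.63±7.94</td></tr>
            <tr><td>SVM</td><td>BG</td><td>44.38±6.07</td></tr>
            <tr><td>SVM</td><td>D</td><td>55.68±9.49</td></tr>
            <tr><td>SVM</td><td>PM-r</td><td>58.40±2.99</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-5-3">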
        <p>Since the depression detection task has not previously been tested on Russian-language
social media data, we also report the scores of a random dummy
classifier. The evaluation metrics are the weighted mean F1-score over the control and
depression groups (F1-w) and the ROC AUC score. To make it possible to compare our
results with the Clef/eRisk 2017 Shared Task results, we also report precision, recall,
and F1-score for the depression class only. The evaluation metrics were calculated by
averaging 5 runs of 4-fold cross-validation. The classification results are presented in
Table 3.</p>
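        <p>For readers unfamiliar with these metrics, the sketch below computes precision, recall, and F1-score for a single class, and the weighted mean F1-score over all classes, from lists of true and predicted labels. It is a plain restatement of the standard definitions, not the evaluation code of the study; the function names class_scores and weighted_f1 are hypothetical, and scikit-learn offers equivalent functions in its sklearn.metrics module.</p>
        <preformat>
```python
def class_scores(y_true, y_pred, label):
    """Precision, recall, and F1-score for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    predicted = sum(1 for p in y_pred if p == label)
    actual = sum(1 for t in y_true if t == label)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1-scores, weighted by the true class sizes."""
    y_true = list(y_true)
    return sum(
        class_scores(y_true, y_pred, label)[2] * y_true.count(label)
        for label in set(y_true)
    ) / len(y_true)
```
</preformat>
        <p>In a repeated cross-validation setup such as the one described above, these functions would be applied to the predictions of every fold and the resulting values averaged.</p>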
        <p>The evaluation revealed that the psycholinguistic markers performed well on the data.
We initially assumed that some of the psycholinguistic markers could work poorly on
the data because users usually write very short texts, and a concatenation of
posts cannot be compared to a logically connected text of the same size. These constraints
matter for some of the psycholinguistic markers. We analyzed the feature
importances from several Random Forest runs in order to reduce the size of the PM
feature vector, which could improve classification performance. The reduced
version of PM (PM-r) is included in the classification report.</p>
        <p>The best result on the data was achieved by the SVM+PM-r model, with a ROC AUC
score of 75.11%, a weighted F1-score of 71.42%, and an F1-score of 66.40% on the depression class. The same
feature set with the Random Forest algorithm also achieved decent results, with a ROC AUC
score of 74.89% and the highest precision (62.60%) in our experiments.</p>
        <p>The dictionary-based set demonstrated poor results in comparison with the other sets.
On the other hand, considering the general complexity of the depression detection task, these
dictionaries demonstrated some positive results. They should be
redesigned and filtered, which could make them useful as additional features for the PM set.</p>
        <p>
          A surprising result of our experiments is that the n-gram tf-idf features did
not perform well on the data. As mentioned before, we attribute this to the large
amount of slang, misspellings, and other noise in social media language. This problem should be
addressed by applying term clustering, for example using word
embeddings as implemented in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>It should be noted that our results can be compared with the results of the Clef/eRisk
2017 Shared Task evaluation only with some reservations. First, the language of Clef/eRisk
2017 was English, while our data is in Russian. Second, the number of data samples
and the class ratio are different. Finally, the depression class in the Clef/eRisk 2017 Shared Task
was assembled by manual expert examination of profiles from a subforum related to
depression. In our study, we operate only with the Beck Depression Inventory
scores.</p>
        <p>
          Despite these differences, the best F1-score reported in the Clef/eRisk 2017 overview [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] was
64%, achieved by a model that combined tf-idf based features with LIWC
and dictionary features. In our experiments, tf-idf based features reached an F1-score of 57.60%
with the SVM+BG model. It is important to mention that the current state-of-the-art
result on the Clef/eRisk 2017 data is an F1-score of 73% [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. The best depression detection
performance on our Vkontakte data is an F1-score of 66%, achieved by the filtered version of the
psycholinguistic markers.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>In this study, we performed the depression detection task among 1020 users of the
Russian-speaking social network Vkontakte based on their text messages. By analyzing Beck
Depression Inventory scores and processing the initial data, we formed a sample of
248 users’ post collections with binary depression/control group labels. We formed
tf-idf and dictionary based feature sets and retrieved novel psycholinguistic features
from the users’ writings. The experiments were performed using SVM and Random Forest
classifiers, and the results were compared with the Clef/eRisk 2017 Shared Task evaluation.
The best result in our experiments is an F1-score of 66.40% (ROC AUC score of 75.11%),
achieved by the model based on filtered psycholinguistic markers.</p>
      <p>We found that psycholinguistic markers performed well on the data and can
be used effectively for the depression detection task. We also found that Vkontakte
textual data is extremely noisy, which resulted in the relatively poor classification results
achieved by the tf-idf based models. We assume that term clustering methods could
improve the performance of n-gram models. It is also clear that the dictionaries we used for
the feature set should be redesigned and filtered.</p>
      <p>Thus, the analysis of linguistic markers of depression in social network posts is a
promising area that could make the prevention and treatment of depression more
accessible to a large number of users. In future work, we plan to examine neural
network models for the depression detection task and to evaluate regression analysis on
the data using Beck Depression Inventory scores.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was financially supported by the Ministry of Education and Science of the
Russian Federation, grant No. 14.604.21.0194 (Unique Project Identifier
RFMEFI60417X0194).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Turecki</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Brent</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          :
          <article-title>Suicide and suicidal behaviour</article-title>
          .
          <source>The Lancet</source>
          <volume>387</volume>
          (
          <issue>10024</issue>
          ),
          <fpage>1227</fpage>
          -
          <lpage>1239</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. World Health Organization. https://www.who.int/mental_health/prevention/suicide/suicideprevent/en/,
          last accessed
          <year>2019</year>
          /08/19
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Belialov</surname>
            ,
            <given-names>F.I.</given-names>
          </string-name>
          :
          <article-title>Depression, anxiety, stress, and mortality</article-title>
          .
          <source>Terapevticheskii arkhiv</source>
          <volume>88</volume>
          (
          <issue>12</issue>
          ),
          <fpage>116</fpage>
          -
          <lpage>119</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Surtees</surname>
            ,
            <given-names>P.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wainwright</surname>
            ,
            <given-names>N.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luben</surname>
            ,
            <given-names>R.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wareham</surname>
            ,
            <given-names>N.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bingham</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Khaw</surname>
            ,
            <given-names>K.T.</given-names>
          </string-name>
          :
          <article-title>Depression and ischemic heart disease mortality: evidence from the EPIC-Norfolk United Kingdom prospective cohort study</article-title>
          .
          <source>American Journal of Psychiatry</source>
          <volume>165</volume>
          (
          <issue>4</issue>
          ),
          <fpage>515</fpage>
          -
          <lpage>523</lpage>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Whang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kubzansky</surname>
            ,
            <given-names>L.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kawachi</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rexrode</surname>
            ,
            <given-names>K.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kroenke</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glynn</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Albert</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          :
          <article-title>Depression and risk of sudden cardiac death and coronary heart disease in women: results from the Nurses' Health Study</article-title>
          .
          <source>Journal of the American College of Cardiology</source>
          <volume>53</volume>
          (
          <issue>11</issue>
          ),
          <fpage>950</fpage>
          -
          <lpage>958</lpage>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Tausczik</surname>
            ,
            <given-names>Y.R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Pennebaker</surname>
            ,
            <given-names>J.W.</given-names>
          </string-name>
          :
          <article-title>The psychological meaning of words: LIWC and computerized text analysis methods</article-title>
          .
          <source>Journal of Language and Social Psychology</source>
          <volume>29</volume>
          (
          <issue>1</issue>
          ),
          <fpage>24</fpage>
          -
          <lpage>54</lpage>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kailer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Chung</surname>
            ,
            <given-names>C.K.</given-names>
          </string-name>
          :
          <article-title>The Russian LIWC2007 dictionary</article-title>
          . Austin, TX: LIWC.net (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Yates</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Goharian</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Depression and self-harm risk assessment in online forums</article-title>
          .
          <source>arXiv preprint arXiv:1709.01848</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Seabrook</surname>
            ,
            <given-names>E.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kern</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fulcher</surname>
            ,
            <given-names>B.D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Rickard</surname>
            ,
            <given-names>N.S.</given-names>
          </string-name>
          :
          <article-title>Predicting depression from language-based emotion dynamics: longitudinal analysis of Facebook and Twitter status updates</article-title>
          .
          <source>Journal of Medical Internet Research</source>
          <volume>20</volume>
          (
          <issue>5</issue>
          ),
          <elocation-id>e168</elocation-id>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Al-Mosaiwi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Johnstone</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>In an absolute state: Elevated use of absolutist words is a marker specific to anxiety, depression, and suicidal ideation</article-title>
          .
          <source>Clinical Psychological Science</source>
          <volume>6</volume>
          (
          <issue>4</issue>
          ),
          <fpage>529</fpage>
          -
          <lpage>542</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Panicheva</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ledovaya</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Bogolyubova</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Lexical, morphological and semantic correlates of the dark triad personality traits in Russian Facebook texts</article-title>
          .
          <source>In 2016 IEEE Artificial Intelligence and Natural Language Conference (AINL)</source>
          (pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          ). IEEE (
          <year>2016</year>
          , November).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Bogolyubova</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panicheva</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tikhonov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ivanov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ledovaya</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Dark personalities on Facebook: Harmful online behaviors and</article-title>
          .
          <source>Computers in Human Behavior</source>
          <volume>78</volume>
          ,
          <fpage>151</fpage>
          -
          <lpage>159</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Stankevich</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Isakov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Devyatkin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Smirnov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Feature Engineering for Depression Detection in Social Media</article-title>
          .
          <source>In ICPRAM</source>
          ,
          <fpage>426</fpage>
          -
          <lpage>431</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Beck</surname>
            ,
            <given-names>A.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steer</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>G.K.</given-names>
          </string-name>
          :
          <source>Beck depression inventory-II. San Antonio</source>
          <volume>78</volume>
          (
          <issue>2</issue>
          ),
          <fpage>490</fpage>
          -
          <lpage>498</lpage>
          (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A test collection for research on depression and language use</article-title>
          .
          <source>In International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          ,
          <fpage>28</fpage>
          -
          <lpage>39</lpage>
          . Springer, Cham (
          <year>2016</year>
          , September).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. MyStem Homepage, https://tech.yandex.ru/mystem, last accessed
          <year>2019</year>
          /08/19
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Straka</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Straková</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Tokenizing, POS tagging, lemmatizing and parsing UD 2.0 with UDPipe</article-title>
          .
          <source>In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies</source>
          ,
          <fpage>88</fpage>
          -
          <lpage>99</lpage>
          (
          <year>2017</year>
          , August).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Koltsova</surname>
            ,
            <given-names>O.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alexeeva</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Kolcov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>An opinion word lexicon and a training dataset for Russian sentiment analysis of social media</article-title>
          .
          <source>Computational Linguistics and Intellectual Technologies: Materials of DIALOGUE 2016 (Moscow)</source>
          ,
          <fpage>277</fpage>
          -
          <lpage>287</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Pennebaker</surname>
            ,
            <given-names>J.W.</given-names>
          </string-name>
          :
          <article-title>The secret life of pronouns</article-title>
          .
          <source>New Scientist</source>
          <volume>211</volume>
          (
          <issue>2828</issue>
          ),
          <fpage>42</fpage>
          -
          <lpage>45</lpage>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Stankevich</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smirnov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsova</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiselnikova</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Enikolopov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Predicting Depression from Essays in Russian</article-title>
          .
          <source>Computational Linguistics and Intellectual Technologies, DIALOGUE</source>
          <volume>18</volume>
          ,
          <fpage>637</fpage>
          -
          <lpage>647</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Devyatkin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsova</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chudova</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Shvets</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Intellectual analysis of the manifestations of verbal aggressiveness in the texts of network communities [Intellektualnyj analiz proyavlenij verbalnoj agressivnosti v tekstah setevyh soobshchestv]</article-title>
          .
          <source>Artificial Intelligence and Decision Making</source>
          (
          <issue>2</issue>
          )
          ,
          <fpage>27</fpage>
          -
          <lpage>41</lpage>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          , (
          <volume>12</volume>
          ),
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Parapar</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>CLEF 2017 eRisk Overview: Early Risk Prediction on the Internet: Experimental Foundations</article-title>
          .
          <source>In CLEF (Working Notes)</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Trotzek</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koitka</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Friedrich</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          :
          <article-title>Utilizing neural networks and linguistic metadata for early detection of depression indications in text sequences</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>