Depression Diagnosis using Text-based AI Methods – A Systematic Review

Martín Di Felice*, Parag Chatterjee and María F. Pollo-Cattaneo
Universidad Tecnológica Nacional, Facultad Regional Buenos Aires, Buenos Aires, Argentina

ICAIW 2022: Workshops at the 5th International Conference on Applied Informatics 2022, October 27–29, 2022, Arequipa, Peru
* Corresponding author. Email: mdifelice@frba.utn.edu.ar (M. Di Felice); parag@frba.utn.edu.ar (P. Chatterjee); flo.pollo@gmail.com (M. F. Pollo-Cattaneo)
ORCID: 0000-0003-1388-3220 (M. Di Felice); 0000-0001-6760-4704 (P. Chatterjee); 0000-0003-4197-3880 (M. F. Pollo-Cattaneo)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

Abstract
Recent years have seen increasing use of artificial intelligence in healthcare, and mental health is no exception. This study focuses on depression in particular, which affects a significant percentage of the population and is an important concern globally. This systematic review analyzes different artificial intelligence methods used to diagnose depression, highlighting global trends in this domain, such as the large share of natural language processing algorithms and neural networks, on the one hand, and illustrating the key issues and future lines of research in applying artificial intelligence to mental health, on the other.

Keywords
Machine Learning, Pattern Recognition, Mental Health, Depression

1. Introduction

Artificial Intelligence (AI) is a branch of computer science devoted to solving complex, nonlinear problems that usually require human intervention. AI seeks to emulate human behavior in order to automate tasks so that they can be solved with similar effectiveness but faster [1]. To emulate that behavior, AI algorithms rely on large collections of data, called datasets [2]. The bigger, more complete, and more heterogeneous these datasets are, the better the algorithms can infer relationships within the data and generate rules that predict how new, unseen data will behave [3, 4].

Due to the increase in computational power and the availability of more data, AI has expanded into many fields in recent years [5]. Healthcare is one of the domains where this involvement is most visible [6], although its adoption in mental health has grown at a slower rate [7]. Although the ethical aspects of its use are still under debate [8], the benefits of its application seem quite promising [6, 9], including faster diagnosis and the replacement of expert subjectivity with a science-based, objective method.

Although objective and parameterized techniques exist for the diagnosis of mental health issues, and of depression in particular [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], AI methods not only support the diagnostic process itself, transforming data corresponding to symptoms into an output corresponding to a disease, but also help to find those symptoms by transforming colloquial expressions into objective symptoms, and to discover relationships between different types of symptoms as well.
Depression is a mental illness characterized by mood disorders over long periods of time, which can last for several weeks or more [21, 22]. It affects a significant percentage of the population [23] and people of any age, and it can be grouped into two large groups: major depressive disorder and persistent depressive disorder. Other forms also exist but occur less frequently, such as postpartum depression, premenstrual dysphoric disorder, seasonal affective disorder, and psychotic disorder. According to the research reviewed here, AI methods that diagnose depression by analyzing data generated by patients show a high degree of effectiveness.

This work aims to establish the state of the art on the use of AI methods to diagnose depression from text datasets. To define that state of the art, a Systematic Mapping Study (SMS) is carried out. An SMS is a standardized research process whose goal is to gather the existing evidence on a particular topic [24] by searching for studies about it and summarizing them in order to draw conclusions.

2. Goals

The goal of the present study is to identify which research lines are open on the use of AI techniques for depression diagnosis with text datasets, with a view to eventually developing a new method that can diagnose the disease effectively and thereby support its treatment. An SMS is carried out following the process proposed by Petersen et al. [24] in order to determine the state of the art on this subject. The first step of this process is the definition of the questions that will guide the investigation. The research questions are the following:

• RQ1: Which AI method(s) are used to solve the problem?
• RQ2: What kind of learning is used to adjust the solution?
• RQ3: Which results are obtained after applying each method?
• RQ4: How are the results validated by each method?
• RQ5: What are the future open research lines?

To answer these questions, a systematic review related to depression diagnosis using text-based AI methods was performed. The following sources were used:

• IEEE Xplore (https://ieeexplore.ieee.org/Xplore/home.jsp)
• PubMed (https://pubmed.ncbi.nlm.nih.gov/)
• Scopus (https://www.scopus.com/home.uri)

3. Results

A search and synthesis tool (https://github.com/mdifelice/hbs) was developed in order to automate and standardize the search across these sources. The tool connects to each source through its public API and performs the search using the following terms:

("artificial intelligence" OR "machine learning" OR "deep learning") AND ("depression diagnos*" OR "depression detection" OR "depression estimation")
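The tool itself is available in the repository linked above. Purely as an illustration, and not as part of the authors' implementation, a minimal sketch of how such a query could be issued against one of the sources (here PubMed, through its public E-utilities API; the helper name and result handling are assumptions) might look as follows:

```python
import requests

# Boolean search string used in the review (PubMed supports the * wildcard).
QUERY = (
    '("artificial intelligence" OR "machine learning" OR "deep learning") AND '
    '("depression diagnos*" OR "depression detection" OR "depression estimation")'
)

def search_pubmed(query: str, max_results: int = 200) -> list[str]:
    """Return PubMed IDs matching the query via the public E-utilities API."""
    response = requests.get(
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
        params={"db": "pubmed", "term": query, "retmax": max_results, "retmode": "json"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["esearchresult"]["idlist"]

if __name__ == "__main__":
    ids = search_pubmed(QUERY)
    print(f"PubMed returned {len(ids)} candidate articles")
```

Equivalent requests would be issued against the IEEE Xplore and Scopus APIs, each through its own endpoint and credentials.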
Then, a template was built and used as the foundation of the extraction form, in which each study is recorded together with its metadata: name, authors, publication date, and link (wherever applicable). To perform a specific selection of the studies to include in this review, the SMS requires the definition of inclusion and exclusion criteria. The inclusion criteria are the following:

• Journal articles, conference articles, or book chapters.
• Publication year 2012 or later.
• Studies must use AI to solve the problem.
• Studies must diagnose depression.

The exclusion criteria are:

• Studies must not perform prognosis nor predictions.
• Studies must not use non-text-based datasets.
• Studies must not be written in any language other than English.

The search was performed with the tool, applying all the inclusion and exclusion criteria. The initial search returned a total of 192 articles; after applying the criteria, 45 articles were retained (a sketch of this screening step is shown below). Then, for each article, an attempt was made to answer the research questions defined above; the results are summarized in Table 1.
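As a rough illustration of the screening step (this is not the authors' tool; the record fields and helper below are hypothetical), the metadata-level criteria can be applied programmatically, while the content-level criteria, such as excluding prognosis studies or non-text datasets, still require manual review:

```python
from dataclasses import dataclass

@dataclass
class Record:
    """Hypothetical metadata extracted for each candidate study."""
    name: str
    authors: list[str]
    year: int
    doc_type: str   # e.g. "journal", "conference", "book_chapter"
    language: str
    link: str | None = None

ALLOWED_TYPES = {"journal", "conference", "book_chapter"}

def passes_metadata_criteria(record: Record) -> bool:
    """Apply the inclusion/exclusion criteria that can be checked from metadata alone."""
    return (
        record.doc_type in ALLOWED_TYPES   # journal/conference articles or book chapters
        and record.year >= 2012            # published in 2012 or later
        and record.language == "en"        # written in English
    )

if __name__ == "__main__":
    candidates = [
        Record("Example study A", ["Doe, J."], 2019, "conference", "en"),
        Record("Example study B", ["Roe, R."], 2010, "journal", "en"),
    ]
    screened = [r for r in candidates if passes_metadata_criteria(r)]
    print(f"{len(screened)} of {len(candidates)} records pass the metadata-level criteria")
```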
Table 1: Extraction form

Study | RQ1 | RQ2 | RQ3 | RQ4 | RQ5
Rao et al. [25] | NLP, NN | SL | 0.63 (P), 0.57 (R), 0.6 (F1) | Self-informed | Extend model
McGinnis et al. [26] | DT, KNN, LR, SVM | SL | DT: 0.69 (AUC); KNN: 0.73 (AUC); SVM: 0.76 (AUC); LR: 0.8 (AUC) | Experts | -
Cong et al. [27] | NLP, NN | SL | 0.69 (P), 0.53 (R), 0.6 (F1) | Self-informed | -
Gerych et al. [28] | NN, SVM | UL | 0.92 (AUC), 0.91 (F1) | Questionnaires | Increase dataset, extend to other diseases, and extend to general population
Deshpande and Rao [29] | NB, NLP, SVM | SL | NB: 0.84 (P), 0.83 (R), 0.83 (F1); SVM: 0.8 (P), 0.79 (R), 0.8 (F1) | Keywords | Improve validation
Wang et al. [30] | NLP, NN | SL | 0.97 (ACC), 0.97 (F1), 0.99 (P), 0.95 (R) | Experts | Increase dataset
Malviya et al. [31] | LR, NB, NLP, NN, RF, SVM, XGB | SL | NN: 0.98 (ACC), 0.98 (F1); SVM: 0.92 (ACC), 0.92 (F1) | Keywords | Increase dataset
Hassan et al. [32] | DT, KNN, LR, NB, SVM | SL | KNN: 0.79 (ACC), 0.6 (R), 0.72 (P), 0.65 (F1); LR: 0.77 (ACC), 0.5 (R), 0.39 (P), 0.44 (F1); SVM: 0.77 (ACC), 0.5 (R), 0.39 (P), 0.44 (F1); NB: 0.77 (ACC), 0.5 (R), 0.39 (P), 0.44 (F1) | Questionnaires | Tool creation
Kumar et al. [33] | DT, KNN, LR, NB, SVM | SL | KNN+LR+SVM: 0.9 (ACC); DT+NB+SVM: 0.88 (ACC) | Self-informed | -
Uddin et al. [34] | NLP, NN | SL | 0.86 (ACC) | Experts | -
Victor et al. [35] | DT, KNN, NB, NLP, RF, SVM | SL | 0.9 (ACC) | Experts | Increase dataset
Chiong et al. [36] | AB, BP, DT, GB, LR, NLP, NN, RF, SVM | SL | AB+BP+GB+RF: 0.98 (ACC); DT+LR+NN+SVM: 0.96 (ACC) | Keywords | Improve validation
Arun et al. [37] | XGB | SL | 0.98 (ACC) | Experts, Questionnaires | -
Raihan et al. [38] | AB, NN, RF | SL | AB: 0.98 (ACC); NN: 0.83 (ACC); RF: 0.72 (ACC) | Experts | Increase dataset
Govindasamy and Palanichamy [39] | DT, NB, NLP | SL | 0.97 (ACC) | Sentiment analysis | Determine depression level
Al Asad et al. [40] | NB, NLP, SVM | SL | 0.74 (ACC), 1 (P), 0.6 (R) | Questionnaires | Include other languages
Tadesse et al. [41] | AB, LR, RF, NLP, NN, SVM | SL | AB: 0.79 (ACC), 0.81 (F1), 0.72 (P), 0.93 (R); LR: 0.89 (ACC), 0.89 (F1), 0.89 (P), 0.92 (R); NN: 0.91 (ACC), 0.93 (F1), 0.9 (P), 0.92 (R); RF: 0.85 (ACC), 0.85 (F1), 0.83 (P), 0.87 (R); SVM: 0.9 (ACC), 0.91 (F1), 0.89 (P), 0.93 (R) | Keywords | Study relationship with personality
Shah et al. [42] | NLP, NN | SL | 0.81 (F1) | Self-informed | Improve performance
Bhat et al. [43] | NLP, NN | SL | 0.98 (ACC) | Sentiment analysis | -
Santana et al. [44] | GA, KNN, RF, SVM | SL | KNN: 0.95 (F1), 0.96 (P), 0.95 (R); RF: 0.86 (F1), 0.85 (P), 0.84 (R); SVM: 0.93 (F1), 0.93 (P), 0.93 (R) | Questionnaires | -
Opoku Asare et al. [45] | DT, KNN, LR, RF, SVM, XGB | SL | DT: 0.47 (ACC), 0.19 (P), 0.74 (R); KNN: 0.96 (ACC), 0.86 (P), 0.92 (R); LR: 0.59 (ACC), 0.20 (P), 0.58 (R); RF: 0.98 (ACC), 0.93 (P), 0.94 (R); SVM: 0.86 (ACC), 0.52 (P), 0.81 (R); XGB: 0.98 (ACC), 0.93 (P), 0.96 (R) | Questionnaires | Increase dataset
Hemmatirad et al. [46] | NLP, SVM | SL | 0.96 (F1), 0.97 (P), 0.96 (R), 0.95 (ACC) | Keywords, Self-informed | Add more models
Zogan et al. [47] | NLP, NN | SL | 0.9 (ACC), 0.9 (P), 0.89 (R), 0.89 (F1) | Keywords | Improve validation and increase dataset
Haque et al. [48] | DT, NB, RF, XGB | SL | XGB: 0.95 (ACC), 0.85 (P), 0.99 (SPC), 0.48 (R); RF: 0.95 (ACC), 0.99 (P), 1 (SPC), 0.44 (R); DT: 0.95 (ACC), 0.94 (P), 1 (SPC), 0.45 (R); NB: 0.94 (ACC), 0.69 (P), 0.98 (SPC), 0.51 (R) | Experts | -
Chiong et al. [49] | AB, BP, DT, GB, LR, NLP, NN, RF, SVM | SL | DT: 0.82 (ACC), 0.83 (P), 0.84 (R), 0.84 (F1); LR: 0.93 (ACC), 0.93 (P), 0.72 (R), 0.81 (F1); NN: 0.85 (ACC), 0.87 (P), 0.86 (R), 0.86 (F1); SVM: 0.87 (ACC), 0.9 (P), 0.87 (R), 0.88 (F1) | Keywords | Include unsupervised learning
Narziev et al. [50] | RF, SVM | SL | 0.96 (ACC) | Questionnaires | Increase dataset
Islam et al. [51] | DT, EL, KNN, NLP, SVM | SL | DT: 0.71 (ACC); KNN: 0.6 (ACC); SVM: 0.71 (ACC); EL: 0.64 (ACC) | Keywords | Increase dataset
Zogan et al. [52] | NLP, NN | SL, UL | 0.91 (P), 0.9 (R), 0.91 (F1), 0.9 (ACC) | Keywords | Increase dataset
Xu et al. [53] | ES | SL | 0.79 (ACC), 0.81 (P), 0.85 (R), 0.83 (F1) | Questionnaires | Tool creation
Shrestha et al. [54] | DBSCAN, GMM, IF, KM, NLP, SVM | UL | KM: 0.63 (P), 0.61 (R), 0.51 (F1); GMM: 0.64 (P), 0.64 (R), 0.64 (F1); DBSCAN: 0.77 (P), 0.42 (R), 0.27 (F1); IF: 0.62 (P), 0.62 (R), 0.54 (F1); SVM: 0.6 (P), 0.59 (R), 0.59 (F1) | Experts | Increase dataset
Alsagri and Ykhlef [55] | DT, NB, NLP, SVM | SL | DT: 0.78 (ACC), 0.59 (R), 0.62 (F1), 0.78 (P), 0.6 (AUC); NB: 0.8 (ACC), 0.81 (R), 0.72 (F1), 0.65 (P), 0.67 (AUC); SVM: 0.83 (ACC), 0.85 (R), 0.79 (F1), 0.74 (P), 0.78 (AUC) | Keywords | Add more models
Xezonaki et al. [56] | NLP, NN, SVM | SL | 0.72 (F1) | Experts, Questionnaires | Tool creation
Ramiandrisoa and Mothe [57] | LR, NLP, RF | SL | LR: 0.51 (F1), 0.38 (P), 0.8 (R); LR+RF: 0.51 (F1), 0.38 (P), 0.8 (R); RF: 0.58 (F1), 0.69 (P), 0.51 (R) | Self-informed | Increase dataset
Burdisso et al. [58] | ES, NLP | SL | 0.61 (F1), 0.63 (P), 0.6 (R) | Self-informed | Extend to other diseases
Stankevich et al. [59] | NLP, RF, SVM | SL | SVM: 0.58 (P), 0.77 (R), 0.66 (F1); RF: 0.63 (P), 0.53 (R), 0.6 (F1) | Questionnaires | Add more models
Choi et al. [60] | GMM, LR | SSL, UL | UL: 3.105 (ANOVA), 2.732 (ANOVA) | Experts, Questionnaires | Increase dataset
Khan et al. [61] | NLP, NN | SL | 0.96 (ACC), 0.99 (P), 0.95 (F1), 0.93 (R), 0.98 (SPC) | Sentiment analysis | Tool creation
Zhang et al. [62] | LR, NLP, RF, SVM | SL | RF: 0.78 (ACC), 0.78 (F1), 0.85 (AUC); LR: 0.78 (ACC), 0.79 (F1), 0.86 (AUC); SVM: 0.79 (ACC), 0.79 (F1), 0.86 (AUC) | Experts | -
Ren et al. [63] | NLP, NN | SL | 0.91 (ACC), 0.92 (P), 0.96 (R), 0.94 (F1) | Experts | Extend to other diseases
Amanat et al. [64] | NLP, NN | SL | 0.98 (P), 0.99 (R), 0.98 (F1) | Experts | Extend to other diseases
Almars [65] | NLP, NN | SL | Negatives: 0.98 (P), 0.84 (R), 0.85 (F1); Positives: 0.78 (P), 0.83 (R), 0.81 (F1) | Experts | Increase dataset
Inkpen et al. [66] | NLP, NN | SL | HAN: 0.33 (average hit rate), 0.66 (average closeness rate); BERT: 0.79 (average difference between overall depression levels); RoBERTa: 0.3 (depression category hit rate) | Self-informed | Improve performance
Wu et al. [67] | NLP, NN | SL | 0.83 (P), 0.71 (R), 0.77 (F1) | Questionnaires | Improve performance, extend to other diseases, and tool creation
Shah et al. [68] | NB, NLP, DT, SVM, SGD, RF | SL | NB: 0.8 (ACC), 0.61 (P), 0.4 (R), 0.48 (F1); DT: 0.8 (ACC), 0.58 (P), 0.53 (R), 0.55 (F1); SVM: 0.77 (ACC); SGD: 0.79 (ACC), 0.55 (P), 0.54 (R), 0.54 (F1); RF: 0.83 (ACC), 0.77 (P), 0.4 (R), 0.53 (F1) | Experts | -
Stankevich et al. [69] | AB, GB, LR, NB, NLP, NN, RF, SVM, XGB | SL | RF: 0.74 (AUC), 0.59 (P), 0.71 (R), 0.65 (F1); AB: 0.72 (AUC), 0.57 (P), 0.68 (R), 0.62 (F1); LR: 0.69 (AUC), 0.51 (P), 0.68 (R), 0.58 (F1); XGB: 0.68 (AUC), 0.48 (P), 0.74 (R), 0.58 (F1); SVM: 0.67 (AUC), 0.44 (P), 0.84 (R), 0.58 (F1) | Questionnaires | Add more models
Abbreviations: AB: AdaBoost, ACC: Accuracy, ANOVA: Analysis of Variance, AUC: Area Under the Curve, BP: Bagging Predictors, DBSCAN: Density-Based Spatial Clustering of Applications with Noise, DT: Decision Trees, EL: Ensemble Learning, ES: Expert System, F1: F-Score, GA: Genetic Algorithms, GB: GradientBoost, GMM: Gaussian Mixture Model, IF: Isolation Forest, KM: K-Means, KNN: K-Nearest Neighbors, LR: Logistic Regression, NB: Naive Bayes, NLP: Natural Language Processing, NN: Neural Networks, P: Precision, R: Recall, RF: Random Forest, SGD: Stochastic Gradient Descent, SL: Supervised Learning, SPC: Specificity, SSL: Semi-Supervised Learning, SVM: Support Vector Machines, UL: Unsupervised Learning, XGB: XGBoost.

3.1. RQ1: Which AI method or methods are used to solve the problem?

Most of the studies use more than one type of algorithm, with Neural Networks (NN) (14.2%), Support Vector Machines (SVM) (14.2%), and Natural Language Processing (NLP) (20.4%) having the highest share. Rao et al. [25], Cong et al. [27], Wang et al. [30], Uddin et al. [34], Shah et al. [42], Bhat et al. [43], Zogan et al. [47], Zogan et al. [52], Khan et al. [61], Ren et al. [63], Amanat et al. [64], Almars [65], Inkpen et al. [66], and Wu et al. [67] combine NLP and NN algorithms in their methods. Deshpande and Rao [29], Malviya et al. [31], Victor et al. [35], Chiong et al. [36], Govindasamy and Palanichamy [39], Al Asad et al. [40], Tadesse et al. [41], Hemmatirad et al. [46], Chiong et al. [49], Islam et al. [51], Shrestha et al. [54], Xezonaki et al. [56], Ramiandrisoa and Mothe [57], Burdisso et al. [58], Stankevich et al. [59], Zhang et al. [62], Shah et al. [68], and Stankevich et al. [69], on the other hand, use NLP together with other kinds of algorithms (including, but not only, NN). Finally, Gerych et al. [28] and Raihan et al. [38] use NN combined with other kinds of algorithms but not NLP. In total, 34 of the 45 primary studies (75.6%) used NN or NLP algorithms.

Among the rest of the studies, the most frequently observed combination is Decision Trees (DT), K-Nearest Neighbors (KNN), Logistic Regression (LR), and Support Vector Machines (SVM), as in McGinnis et al. [26], Hassan et al. [32], Kumar et al. [33], Victor et al. [35], and Opoku Asare et al. [45]. Figure 1 shows the algorithm type distribution and Figure 2 shows the NLP and NN prevalence.

Figure 1: Algorithm Types
Figure 2: NLP and NN prevalence

3.2. RQ2: What kind of learning is used to adjust the solution?

An important tendency towards the use of supervised learning has been observed (91.1%). Only Gerych et al. [28] and Shrestha et al. [54] have chosen to investigate methods based on unsupervised learning. Zogan et al. [52] and Choi et al. [60] use hybrid methods, combining supervised and unsupervised learning, and unsupervised and semi-supervised learning, respectively.

3.3. RQ3: Which results are obtained after applying each method?
Both the metrics used and the results obtained vary from one study to another. Rao et al. [25], McGinnis et al. [26], Deshpande and Rao [29], Malviya et al. [31], Hassan et al. [32], Kumar et al. [33], Victor et al. [35], Chiong et al. [36], Raihan et al. [38], Tadesse et al. [41], Bhat et al. [43], Santana et al. [44], Opoku Asare et al. [45], Haque et al. [48], Islam et al. [51], Shrestha et al. [54], Alsagri and Ykhlef [55], Ramiandrisoa and Mothe [57], Stankevich et al. [59], Khan et al. [61], Zhang et al. [62], Shah et al. [68], and Stankevich et al. [69] use different models and compare them to see which ones work better. Govindasamy and Palanichamy [39] and Xezonaki et al. [56] also report multiple results, but comparing the same model on different datasets; and Chiong et al. [49] use several models on two different datasets.

The most used metrics are accuracy (25.1%), recall (22.9%), precision (22.3%), and F1 (21.3%). Figure 3 shows the most used metrics. Accuracy ranges from 0.47 to 0.98, with an average of 0.86 and a median of 0.9; recall goes from 0.33 to 0.99, with an average of 0.75 and a median of 0.79; precision goes from 0.19 to 1, with an average of 0.76 and a median of 0.83; and, finally, F1 goes from 0.27 to 0.98, with an average of 0.82 and a median of 0.85.

Figure 3: Metrics
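For reference, since the primary studies report these metrics without restating them, the most common metrics follow the standard definitions in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN); this is provided here only as a reader's aid:

```latex
\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{P} = \frac{TP}{TP + FP}, \qquad
\mathrm{R} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{P} \cdot \mathrm{R}}{\mathrm{P} + \mathrm{R}}
```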
3.4. RQ4: How are the results validated by each method?

Primarily, five different validation types were identified. McGinnis et al. [26], Wang et al. [30], Uddin et al. [34], Victor et al. [35], Raihan et al. [38], Haque et al. [48], Shrestha et al. [54], Xezonaki et al. [56], Zhang et al. [62], Ren et al. [63], Amanat et al. [64], Almars [65], and Shah et al. [68] use the analysis of experts to determine whether a record belongs to a patient with depression or not. This is the most used validation type, with 30.6% of the cases. The second most used validation type is the use of questionnaires: Gerych et al. [28], Hassan et al. [32], Al Asad et al. [40], Santana et al. [44], Opoku Asare et al. [45], Narziev et al. [50], Xu et al. [53], Stankevich et al. [59], Wu et al. [67], and Stankevich et al. [69] use this kind of validation (26.5% of the total). In third place, with 20.4% of the cases, Deshpande and Rao [29], Malviya et al. [31], Chiong et al. [36], Tadesse et al. [41], Hemmatirad et al. [46], Zogan et al. [47], Chiong et al. [49], Islam et al. [51], Zogan et al. [52], and Alsagri and Ykhlef [55] search for keywords inside the datasets to determine whether a patient is depressive or not. Some studies (16.3%) use datasets where the labeling is made by the participants themselves (self-informed): Rao et al. [25], Cong et al. [27], Kumar et al. [33], Shah et al. [42], Ramiandrisoa and Mothe [57], Burdisso et al. [58], and Inkpen et al. [66] use these datasets. Finally, with 6.1% of the distribution, Govindasamy and Palanichamy [39], Bhat et al. [43], and Khan et al. [61] use sentiment analysis to label their datasets. Figure 4 shows the validation types.

Figure 4: Validation Types

3.5. RQ5: What are the future open research lines?

Among the studies that mention future work (82.2%), the most frequently mentioned line is increasing the dataset or datasets used in the experiments. Gerych et al. [28], Wang et al. [30], Malviya et al. [31], Victor et al. [35], Raihan et al. [38], Opoku Asare et al. [45], Zogan et al. [47], Narziev et al. [50], Islam et al. [51], Zogan et al. [52], Shrestha et al. [54], Ramiandrisoa and Mothe [57], Choi et al. [60], and Almars [65] mention this possibility. Gerych et al. [28], Burdisso et al. [58], Ren et al. [63], Amanat et al. [64], and Wu et al. [67] indicate that in the future they would be willing to extend their models to diagnose other diseases. Hassan et al. [32], Xu et al. [53], Xezonaki et al. [56], Khan et al. [61], and Wu et al. [67] propose to create a tool or a practical application of their model. Figure 5 shows all the possibilities.

Figure 5: Future work

4. Conclusions

This systematic review was performed to highlight the state of the art in depression diagnosis through text-based AI methods. Analyzing the relevant literature, it is concluded that NLP and NN are the most used algorithms, applied to colloquial text-based datasets mostly extracted from social networks. Most of the studies point out the lack of sufficiently large datasets, illustrating the demand for larger datasets in future work. A clear preference for supervised learning over unsupervised learning was also noted, with only a small fraction of the studies opting for unsupervised approaches. As the domain of mental health embraces AI tools for purposes such as diagnosis and prediction of mental health issues, aspects like the creation of significantly larger databases are indispensable for better training of the algorithms. Especially in the area of depression, this also opens the possibility of studies in larger domains, providing more reliable and reusable AI models for diagnosis.

5. Acknowledgements

This work was supported and financed by the Cloudgenia group through its technical and operational initiatives.

References

[1] M. A. Boden, Artificial intelligence, Elsevier, 1996.
[2] A. H. Renear, S. Sacchi, K. M. Wickett, Definitions of dataset in the scientific and technical literature, Proceedings of the American Society for Information Science and Technology 47 (2010) 1–4.
[3] A. Ajiboye, R. Abdullah-Arshah, H. Qin, H. Isah-Kebbe, Evaluating the effect of dataset size on predictive model using supervised learning technique, Int. J. Comput. Syst. Softw. Eng 1 (2015) 75–84.
[4] F. Velosa, H. Florez, Edge solution with machine learning and open data to interpret signs for people with visual disability, CEUR Workshop Proceedings (2020) 15–26.
[5] R. L. Villars, C. W. Olofson, M. Eastwood, Big data: What it is and why you should care, White paper, IDC 14 (2011) 1–14.
[6] S. Graham, C. Depp, E. E. Lee, C. Nebeker, X. Tu, H.-C. Kim, D. V. Jeste, Artificial intelligence for mental health and mental illnesses: an overview, Current psychiatry reports 21 (2019) 1–18.
[7] F. Jiang, Y. Jiang, H. Zhi, Y. Dong, H. Li, S. Ma, Y. Wang, Q. Dong, H. Shen, Y. Wang, Artificial intelligence in healthcare: past, present and future, Stroke and vascular neurology 2 (2017).
[8] J. Morley, C. C. Machado, C. Burr, J. Cowls, I. Joshi, M. Taddeo, L. Floridi, The ethics of ai in health care: a mapping review, Social Science & Medicine 260 (2020) 113172.
[9] B. X. Tran, R. S. McIntyre, C. A. Latkin, H. T. Phan, G. T. Vu, H. L. T. Nguyen, K. K. Gwee, C. S. Ho, R. C. Ho, The current research landscape on the artificial intelligence application in the management of depressive disorders: a bibliometric analysis, International journal of environmental research and public health 16 (2019) 2150.
[10] K. Kroenke, T. W. Strine, R. L. Spitzer, J. B. Williams, J. T. Berry, A. H. Mokdad, The phq-8 as a measure of current depression in the general population, Journal of affective disorders 114 (2009) 163–173.
[11] K. Kroenke, R. L. Spitzer, The phq-9: a new depression diagnostic and severity measure, 2002.
[12] American Psychological Association, Beck depression inventory (bdi), 2022.
[13] A. T. Beck, R. A. Steer, G. Brown, Beck depression inventory–II, Psychological assessment (1996).
[14] PsycNet, Children's depression inventory, 2022.
[15] A. P. Association, APA - the structured clinical interview for dsm-5, 2022.
[16] S. Shiffman, A. A. Stone, M. R. Hufford, Ecological momentary assessment, Annu. Rev. Clin. Psychol. 4 (2008) 1–32.
[17] J. Johnson, Resilience appraisals scale, 2022.
[18] R. L. Spitzer, K. Kroenke, J. B. Williams, B. Löwe, A brief measure for assessing generalized anxiety disorder: the gad-7, Archives of internal medicine 166 (2006) 1092–1097.
[19] L. S. Radloff, The use of the center for epidemiologic studies depression scale in adolescents and young adults, Journal of youth and adolescence 20 (1991) 149–166.
[20] J. B. Awotunde, S. A. Ajagbe, H. Florez, Internet of things with wearable devices and artificial intelligence for elderly uninterrupted healthcare monitoring systems, in: International Conference on Applied Informatics, Springer, 2022, pp. 278–291.
[21] F. Edition, et al., Diagnostic and statistical manual of mental disorders, Am Psychiatric Assoc 21 (2013) 591–643.
[22] E. O. Ogunseye, C. A. Adenusi, A. C. Nwanakwaugwu, S. A. Ajagbe, S. O. Akinola, Predictive analysis of mental health conditions using adaboost algorithm, ParadigmPlus 3 (2022) 11–26.
[23] G. Y. Lim, W. W. Tam, Y. Lu, C. S. Ho, M. W. Zhang, R. C. Ho, Prevalence of depression in the community from 30 countries between 1994 and 2014, Scientific reports 8 (2018) 1–10.
[24] K. Petersen, R. Feldt, S. Mujtaba, M. Mattsson, Systematic mapping studies in software engineering, in: 12th International Conference on Evaluation and Assessment in Software Engineering (EASE) 12, 2008, pp. 1–10.
[25] G. Rao, Y. Zhang, L. Zhang, Q. Cong, Z. Feng, Mgl-cnn: a hierarchical posts representations model for identifying depressed individuals in online forums, IEEE Access 8 (2020) 32395–32403.
[26] R. S. McGinnis, E. W. McGinnis, J. Hruschak, N. L. Lopez-Duran, K. Fitzgerald, K. L. Rosenblum, M. Muzik, Rapid anxiety and depression diagnosis in young children enabled by wearable sensors and machine learning, in: 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2018, pp. 3983–3986.
[27] Q. Cong, Z. Feng, F. Li, Y. Xiang, G. Rao, C. Tao, Xa-bilstm: A deep learning approach for depression detection in imbalanced data, in: 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE, 2018, pp. 1624–1627.
[28] W. Gerych, E. Agu, E. Rundensteiner, Classifying depression in imbalanced datasets using an autoencoder-based anomaly detection approach, in: 2019 IEEE 13th International Conference on Semantic Computing (ICSC), IEEE, 2019, pp. 124–127.
[29] M. Deshpande, V. Rao, Depression detection using emotion artificial intelligence, in: 2017 international conference on intelligent sustainable systems (iciss), IEEE, 2017, pp. 858–862.
[30] Y. Wang, Z. Wang, C. Li, Y. Zhang, H. Wang, A multimodal feature fusion-based method for individual depression detection on sina weibo, in: 2020 IEEE 39th International Performance Computing and Communications Conference (IPCCC), IEEE, 2020, pp. 1–8.
[31] K. Malviya, B. Roy, S. Saritha, A transformers approach to detect depression in social media, in: 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS), IEEE, 2021, pp. 718–723.
[32] M. M. Hassan, M. A. R. Khan, K. K. Islam, M. M. Hassan, M. F. Rabbi, Depression detection system with statistical analysis and data mining approaches, in: 2021 International Conference on Science & Contemporary Technologies (ICSCT), IEEE, 2021, pp. 1–6.
[33] P. Kumar, R. Chauhan, T. Stephan, A. Shankar, S. Thakur, A machine learning implementation for mental health care. application: Smart watch for depression detection, in: 2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence), IEEE, 2021, pp. 568–574.
[34] A. H. Uddin, D. Bapery, A. S. M. Arif, Depression analysis from social media data in bangla language using long short term memory (lstm) recurrent neural network technique, in: 2019 International Conference on Computer, Communication, Chemical, Materials and Electronic Engineering (IC4ME2), IEEE, 2019, pp. 1–4.
[35] D. B. Victor, J. Kawsher, M. S. Labib, S. Latif, Machine learning techniques for depression analysis on social media-case study on bengali community, in: 2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA), IEEE, 2020, pp. 1118–1126.
[36] R. Chiong, G. S. Budhi, S. Dhakal, Combining sentiment lexicons and content-based features for depression detection, IEEE Intelligent Systems 36 (2021) 99–105.
[37] V. Arun, V. Prajwal, M. Krishna, B. Arunkumar, S. Padma, V. Shyam, A boosted machine learning approach for detection of depression, in: 2018 IEEE Symposium Series on Computational Intelligence (SSCI), IEEE, 2018, pp. 41–47.
[38] M. Raihan, A. K. Bairagi, S. Rahman, A machine learning based study to predict depression with monitoring actigraph watch data, in: 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), IEEE, 2021, pp. 1–5.
[39] K. A. Govindasamy, N. Palanichamy, Depression detection using machine learning techniques on twitter data, in: 2021 5th international conference on intelligent computing and control systems (ICICCS), IEEE, 2021, pp. 960–966.
[40] N. Al Asad, M. A. M. Pranto, S. Afreen, M. M. Islam, Depression detection by analyzing social media posts of user, in: 2019 IEEE International Conference on Signal Processing, Information, Communication & Systems (SPICSCON), IEEE, 2019, pp. 13–17.
[41] M. M. Tadesse, H. Lin, B. Xu, L. Yang, Detection of depression-related posts in reddit social media forum, IEEE Access 7 (2019) 44883–44893.
[42] F. M. Shah, F. Ahmed, S. K. S. Joy, S. Ahmed, S. Sadek, R. Shil, M. H. Kabir, Early depression detection from social network using deep learning techniques, in: 2020 IEEE Region 10 Symposium (TENSYMP), IEEE, 2020, pp. 823–826.
[43] P. Bhat, A. Anuse, R. Kute, R. Bhadade, P. Purnaye, Mental health analyzer for depression detection based on textual analysis, Journal of Advances in Information Technology 13 (2022).
[44] R. Santana, B. Santos, T. Lima, M. Teodoro, S. Pinto, L. Zárate, C. Nobre, Genetic algorithms for feature selection in the children and adolescents depression context, in: 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), IEEE, 2019, pp. 1470–1475.
[45] K. O. Asare, Y. Terhorst, J. Vega, E. Peltonen, E. Lagerspetz, D. Ferreira, et al., Predicting depression from smartphone behavioral markers using machine learning methods, hyperparameter optimization, and feature importance analysis: exploratory study, JMIR mHealth and uHealth 9 (2021) e26540.
[46] K. Hemmatirad, H. Bagherzadeh, E. Fazl-Ersi, A. Vahedian, Detection of mental illness risk on social media through multi-level svms, in: 2020 8th Iranian Joint Congress on Fuzzy and Intelligent Systems (CFIS), IEEE, 2020, pp. 116–120.
[47] H. Zogan, I. Razzak, X. Wang, S. Jameel, G. Xu, Explainable depression detection with multi-aspect features using a hybrid deep learning model on social media, World Wide Web 25 (2022) 281–304.
[48] U. M. Haque, E. Kabir, R. Khanam, Detection of child depression using machine learning methods, PLoS one 16 (2021) e0261131.
[49] R. Chiong, G. S. Budhi, S. Dhakal, F. Chiong, A textual-based featuring approach for depression detection using machine learning classifiers and social media texts, Computers in Biology and Medicine 135 (2021) 104499.
[50] N. Narziev, H. Goh, K. Toshnazarov, S. A. Lee, K.-M. Chung, Y. Noh, Stdd: Short-term depression detection with passive sensing, Sensors 20 (2020) 1396.
[51] M. Islam, M. A. Kabir, A. Ahmed, A. R. M. Kamal, H. Wang, A. Ulhaq, et al., Depression detection from social network data using machine learning techniques, Health information science and systems 6 (2018) 1–12.
[52] H. Zogan, I. Razzak, S. Jameel, G. Xu, Depressionnet: learning multi-modalities with user post summarization for depression detection on social media, in: Proceedings of the 44th international ACM SIGIR conference on research and development in information retrieval, 2021, pp. 133–142.
[53] X. Xu, P. Chikersal, J. M. Dutcher, Y. S. Sefidgar, W. Seo, M. J. Tumminia, D. K. Villalba, S. Cohen, K. G. Creswell, J. D. Creswell, et al., Leveraging collaborative-filtering for personalized behavior modeling: A case study of depression detection among college students, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 5 (2021) 1–27.
[54] A. Shrestha, E. Serra, F. Spezzano, Multi-modal social and psycho-linguistic embedding via recurrent neural networks to identify depressed users in online forums, Network Modeling Analysis in Health Informatics and Bioinformatics 9 (2020) 1–11.
[55] H. S. AlSagri, M. Ykhlef, Machine learning-based approach for depression detection in twitter using content and activity features, IEICE Transactions on Information and Systems 103 (2020) 1825–1832.
[56] D. Xezonaki, G. Paraskevopoulos, A. Potamianos, S. Narayanan, Affective conditioning on hierarchical networks applied to depression detection from transcribed clinical interviews, arXiv preprint arXiv:2006.08336 (2020).
[57] F. Ramiandrisoa, J. Mothe, Early detection of depression and anorexia from social media: A machine learning approach, in: Circle 2020, volume 2621, 2020, pp. 1–12.
[58] S. G. Burdisso, M. Errecalde, M. Montes-y Gómez, A text classification framework for simple and effective early depression detection over social media streams, Expert Systems with Applications 133 (2019) 182–197.
[59] M. Stankevich, I. Smirnov, N. Kiselnikova, A. Ushakova, Depression detection from social media profiles, in: International Conference on Data Analytics and Management in Data Intensive Domains, Springer, 2019, pp. 181–194.
[60] B. Choi, G. Shim, B. Jeong, S. Jo, Data-driven analysis using multiple self-report questionnaires to identify college students at high risk of depressive disorder, Scientific reports 10 (2020) 1–13.
[61] M. R. Khan, S. Z. Rizvi, A. Yasin, M. Ali, Depression analysis of social media activists using the gated architecture bi-lstm, in: 2021 International Conference on Cyber Warfare and Security (ICCWS), IEEE, 2021, pp. 76–81.
[62] Y. Zhang, H. Lyu, Y. Liu, X. Zhang, Y. Wang, J. Luo, et al., Monitoring depression trends on twitter during the covid-19 pandemic: observational study, JMIR infodemiology 1 (2021) e26769.
[63] L. Ren, H. Lin, B. Xu, S. Zhang, L. Yang, S. Sun, et al., Depression detection on reddit with an emotion-based attention network: algorithm development and validation, JMIR Medical Informatics 9 (2021) e28754.
[64] A. Amanat, M. Rizwan, A. R. Javed, M. Abdelhaq, R. Alsaqour, S. Pandya, M. Uddin, Deep learning for depression detection from textual data, Electronics 11 (2022) 676.
[65] A. M. Almars, Attention-based bi-lstm model for arabic depression classification, CMC-Computers Materials & Continua 71 (2022) 3091–3106.
[66] D. Inkpen, R. Skaik, P. Buddhitha, D. Angelov, M. T. Fredenburgh, uottawa at erisk 2021: Automatic filling of the beck's depression inventory questionnaire using deep learning, in: CLEF (Working Notes), 2021, pp. 966–980.
[67] M. Y. Wu, C.-Y. Shen, E. T. Wang, A. L. Chen, A deep architecture for depression detection using posting, behavior, and living environment data, Journal of Intelligent Information Systems 54 (2020) 225–244.
[68] E. Shah, M. K. Ahsan, I. Mazahir, Machine learning based methodology for depressive sentiment analysis, in: International Conference on Intelligent Technologies and Applications, Springer, 2020, pp. 93–99.
[69] M. Stankevich, I. Smirnov, N. Kiselnikova, A. Ushakova, Depression detection from social media profiles, in: International Conference on Data Analytics and Management in Data Intensive Domains, Springer, 2019, pp. 181–194.