=Paper= {{Paper |id=Vol-2844/games6 |storemode=property |title=Serious Game Development for the Diagnosis of Major Depressive Disorder Cases Using Machine Learning Methods (short paper) |pdfUrl=https://ceur-ws.org/Vol-2844/games6.pdf |volume=Vol-2844 |authors=Athanasios Tsionas,Aristotelis Lazaridis,Ioannis Vlahavas |dblpUrl=https://dblp.org/rec/conf/setn/TsionasLV20 }} ==Serious Game Development for the Diagnosis of Major Depressive Disorder Cases Using Machine Learning Methods (short paper)== https://ceur-ws.org/Vol-2844/games6.pdf
Serious Game Development for the Diagnosis of Major
Depressive Disorder Cases Using Machine Learning Methods

Athanasios Tsionas (atsionas@csd.auth.gr)
Aristotelis Lazaridis (arislaza@csd.auth.gr)
Ioannis Vlahavas (vlahavas@csd.auth.gr)
Aristotle University of Thessaloniki, School of Informatics, Thessaloniki, Greece

ABSTRACT
Major Depressive Disorder (MDD) is a serious mental disorder that
affects millions of adults, occasionally leading to life-threatening
results. Current diagnostic tools for MDD mostly consist of
questionnaires and/or long, specialized therapy sessions. In this work
we present a serious game called "The Delivery", developed for
diagnosing MDD in players. The video game immerses players in a
realistic scenario, the development of which depends on their actions,
that is, conversations with in-game characters, completion of quests,
and interactions with the environment. All in-game features and
mechanics are designed to correspond to specific diagnostic criteria
for MDD. We recorded gameplay data from labeled players (both MDD and
non-MDD cases) in order to train Machine Learning models that can
distinguish the gameplay behaviors of MDD-positive and MDD-negative
players.

CCS CONCEPTS
• Computing methodologies → Machine learning; • Applied computing →
Life and medical sciences.

KEYWORDS
serious games, major depression disorder, machine learning, artificial
intelligence

GAITCS2020, September 02–04, 2020, Athens, Greece
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0).

1    INTRODUCTION
Major Depressive Disorder (MDD) is the most common type of depression
in adults. The characteristics of MDD can be hard to identify and
manage, especially because they are triggered by psychological
factors. Since identifying the exact pathological and psychological
features in a patient is difficult, alternative methodologies for
diagnosing this disorder should be explored.
   This poses a challenge that many scientists attempt to tackle in
various ways. Serious games (i.e. games with a specific purpose other
than entertainment) have also made a step towards the improvement of
mental health issues [6]. In this paper, we present a serious game
developed for the diagnosis of MDD in young adults, using Artificial
Intelligence (AI) and Machine Learning (ML) methods. In particular, we
developed a story-based video game, where players are able to alter
the scenario with their actions and conversations with game
characters, all of which are carefully designed to correspond to and
portray different scientific criteria used in the diagnosis of MDD.
Using a small dataset, collected through the gameplay of a few players
diagnosed with and without MDD, we were able to develop a prototype
system using Machine Learning models, which led to promising results
on MDD detection.
   Other serious games exist for the treatment or prevention of
depression [5, 7], but, to the authors' knowledge, this is the first
serious game developed for the diagnosis of MDD that uses AI and ML
methods and does not rely on any wearables to collect data about the
player. For instance, in [10], the authors use wearables to sample
physiological activities (EEG, ECG, etc.) from potential patients with
depression during a gameplay session of a serious game developed for
this reason. Statistical measures were then used to correlate the
extracted signals with players in negative moods.
   Moreover, our work is in line with the directions proposed by [9],
since the serious game we developed gathers substantial data related
to the player's cognitive behavior during the gameplay session. Our
findings suggest that this proof of concept can be scaled to a highly
accurate, non-interventional system for the diagnosis of MDD.
   In [11], the authors use the AVEC2016 dataset [14] to train
classifiers for the purpose of diagnosing depression from voice data.
Even though the results indicate a better-than-chance classification
ability, the feasibility and implementation details of extracting
voice data from users through a proposed application for smartphone
devices are not considered, and the model is not tested on real-world
scenarios.

2    METHODOLOGY
This section covers the basic steps taken to reach our conclusions.
These steps include data processing operations, as well as the
algorithm selection, implementation and tuning carried out for the
experiments.

2.1    Gameplay scenario and Major Depressive Disorder characteristics
Initially we developed a serious, story-based game called "The
Delivery"¹, for the diagnosis of Major Depressive Disorder (MDD) using
Artificial Intelligence and Machine Learning methods. "The Delivery"
follows a relatively realistic scenario, in which the player is found
inside a building, playing as a character who has been asked to
deliver medical supplies to a friend. Shortly afterwards, a huge
earthquake traps the residents inside the building, introducing an

¹ https://www.dropbox.com/s/g4c872t9a6uwcfh/theDelivery.zip?dl=0




                                                    Figure 1: The Delivery serious game


unexpected turn of events, while the player will have to deal with
unsettling situations.
   Throughout the game, the player is given the opportunity to have
conversations with in-game characters that lead to different outcomes.
Additionally, the player is able to interact with in-game surroundings
and objects. Even though many of the given choices and actions may
seem insignificant to the player, they have been carefully devised to
correspond to different criteria used in the diagnosis of MDD. These
criteria have been extracted from the Diagnostic and Statistical
Manual of Mental Disorders, Fifth Edition (DSM-5) [2].
   More specifically, the conversations with in-game characters are
implemented so that different player answers signal a different
probability (weight) of the player meeting a particular criterion of
MDD. Moreover, with the help of Behavior Trees, an AI method commonly
used in video games, conversation flows are highly flexible and are
adjusted to maximize not only information extraction, but also the
feeling of a realistic dialogue.
   After gathering data from gameplay sessions of MDD and non-MDD
players, we used Machine Learning methods to develop classifiers that
predict the probability that a gameplay session belongs to a player
with MDD.

2.2    Gameplay mechanics
As in any serious game, the purpose is dual: entertainment for the
player, and achievement of the specific "serious" goal. The first
element includes all entertaining features, which are related to the
player's perspective on the game.
   The second element includes the mechanisms that make this game a
serious game. For this project, it was essential to have a good
understanding of both dimensions.
   In "The Delivery", the player has to look after their in-game
statistics, i.e. Health, Fatigue and Sanity. To keep these statistics
within limits, the player has to explore the environment, find hidden
items or complete side-quests.
   For the diagnosis to be accurate, players have to imagine
themselves within the scenario, and the game was designed with this in
mind. In detail, it is a first-person video game, so that the player
can only imagine the character's look, and the character's voice is
never heard.
   The most important statistic within the game is the final score,
which is an indication of the MDD level according to the in-game
mechanics. The player is not aware of the existence of this metric, so
as not to be influenced by it.
   The final score is updated in several ways. For instance, it
changes every time the player interacts with an in-game character.
Every answer during a discussion has a different weight relative to a
particular diagnostic criterion extracted from DSM-5 (e.g. the player
admits that he/she usually does not sleep well at night). When a given
answer is related to a specific DSM-5 criterion, the final score
changes accordingly.
   Another way to increase the final score is through interaction with
the environment. In order to keep statistics high, the player needs to
interact with items within the surroundings. Showing sensitivity to
characters and circumstances alters the final score in favor of a
non-MDD label.

2.3    Data collection and processing
During a gameplay session, the player's actions and their outcomes are
recorded. This data includes: changes in the player's basic statistics
(Health, Fatigue, Sanity), the player's answers during discussions (in
chronological order), and the side quests successfully completed.
   We collected 26 gameplay sessions from two groups: players who were
diagnosed with MDD (24%) and players who were not diagnosed with MDD
and were highly unlikely to suffer from MDD (76%). The second group
included cases who had never received an official MDD diagnosis in the
past, and who were asked to fill in the Patient Health Questionnaire
(PHQ-9) [12]. This is an official questionnaire used for the diagnosis
of MDD, and it was used to filter out cases that had a non-trivial
probability of suffering from MDD, so as to keep only cases with a
small probability of suffering from this disorder. This was necessary
in order to label players accurately as MDD-positive or MDD-negative.
It should be noted that the members of the second group are not
considered "healthy", but non-MDD cases (i.e. they could be suffering
from another disorder). Unfortunately, practical restrictions did not
allow us to develop a larger dataset.

2.4    Algorithms
Initially, we used Principal Component Analysis (PCA) [13] to
visualize the game's capability to distinguish MDD-positive from
MDD-negative in-game behaviors.
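As a rough illustration of the PCA projection step, the following minimal sketch reduces session feature vectors to two components. It is not the authors' implementation: the paper does not say which PCA routine was used, so the sketch substitutes a plain power iteration with deflation, and the feature layout (Health, Fatigue, Sanity, final score) and all numeric values are hypothetical.

```python
import math
import random


def center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(matrix, iterations=500, seed=0):
    """Power iteration: dominant eigenvector and eigenvalue of a symmetric matrix."""
    rng = random.Random(seed)
    d = len(matrix)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance left to explain
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * mv[i] for i in range(d))  # Rayleigh quotient
    return v, eigenvalue


def pca(rows, n_components=2):
    """Project rows onto their first n_components principal directions."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(n_components):
        v, lam = top_eigenpair(cov)
        components.append(v)
        # Deflation: remove the direction just found before the next search.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components]
                 for r in centered]
    return projected, components


# Hypothetical sessions: [health, fatigue, sanity, final_score]
sessions = [
    [80, 20, 90, 5],
    [75, 30, 85, 8],
    [60, 50, 40, 30],
    [55, 60, 35, 35],
    [90, 10, 95, 2],
    [50, 65, 30, 40],
]
projected, components = pca(sessions, n_components=2)
for point in projected:
    print([round(x, 2) for x in point])
```

Each printed pair is one session in the reduced two-dimensional space; plotting these pairs colored by label gives the kind of scatter plot the visualization step relies on.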




Figure 2: Data visualization after performing PCA with 𝑛 = 2 and 𝑛 = 3 components. The gameplay behavior of MDD-positive
players differs non-trivially from that of non-MDD players.


   Then, we applied four classification methods: Decision Trees [4],
k-Nearest Neighbors (kNN) [1], Support Vector Machines (SVM) [15] with
a linear kernel, and Random Forests (RF) [3], in order to create a
prediction model able to classify a player as MDD-positive or
MDD-negative, using our labeled data for the training procedure.
   The RF classifier used 𝑛 = 20 trees and the kNN classifier used
𝑘 = 6 neighbors with the Euclidean distance metric. For the kNN
classifier, we attempted to use an odd number of neighbors to avoid
ties, but the best results were produced with 𝑘 = 6.

3    RESULTS
In this section we present visualization results using the PCA
dimensionality reduction method and performance evaluation results of
Machine Learning models trained using gameplay data. Principal
Component Analysis was applied to the data with 𝑛 = 2 and 𝑛 = 3
components (Figure 2). Visualization of the resulting data points
suggests that the game design captures the diagnostic criteria for MDD
relatively accurately. More particularly, in both experiments, results
show a dense area of MDD-positive cases, which represents their
in-game behavior, with only a few outlier cases. On the contrary,
MDD-negative cases are more scattered and do not follow particular
behavioral patterns, which is expected since the game mechanics do not
target such behaviors.
   Additionally, we trained various classification models on the
gameplay data, with the purpose of predicting whether a gameplay
session belongs to an MDD-positive or MDD-negative player. Even with a
small, imbalanced dataset such as ours, the trained models showed
promising results in distinguishing gameplay sessions of MDD-positive
and MDD-negative players.
   More particularly, we measured the performance of each model using
the Accuracy, Precision, Recall, F1-score and F1-weighted score
metrics, computed using a Leave-One-Out Cross Validation [8] scheme.
Fine-tuning of the hyper-parameters was performed using a simple
grid-search procedure. The results are presented in Table 1. The SVM
classifier with linear kernel has the best overall results among the
four classifiers, suggesting that a hyperplane can separate most of
the data. Random Forests also perform relatively well. The graph
produced from the Decision Tree classifier is depicted in Figure 3. A
baseline random labeling model (without stratification) was also
applied for comparison purposes.

             Random   Decision Trees   SVM (linear)   kNN     RF
Accuracy     46.2%    65.4%            80.8%          77%     73.1%
Precision    48.2%    53.7%            72.8%          38.5%   63.5%
Recall       47.5%    54.2%            70%            50%     65%
F1           43.1%    53.8%            71.2%          43.5%   64.1%
F1-w         50.2%    66.3%            80.1%          66.9%   73.8%
Table 1: Performance of different classifiers in predicting MDD cases
from gameplay data.

4    CONCLUSION
In this paper, we developed a serious game prototype system called
"The Delivery" for the diagnosis of Major Depressive Disorder (MDD).
This video game, which features Artificial Intelligence methods, was
designed with special care taken when implementing in-game mechanics,
so that they correspond to official diagnostic criteria of the
disorder. After recording gameplay sessions of players who were
diagnosed with MDD and players who were highly unlikely to have MDD,
we were able to train Machine Learning models that showed promising
performance in distinguishing positive from negative MDD cases based
on gameplay behavior.
   Our proposed method is a novel concept since, to our knowledge, no
other work has approached MDD diagnosis from this perspective. It is
crucial that new, more accurate and efficient techniques are
developed, which are better suited to current and future societal
needs. Additionally, this prototype can be further improved by
extending the current scenario and gathering more data, which is
expected to lead to higher diagnostic accuracy of the system. A larger
amount of gameplay data will also give us the opportunity to try other
Machine Learning methods (e.g. Neural Networks). Moreover, this system
can be adapted for the purpose of identifying other types of
depression, or mental health issues in general. Lastly, the video game
industry is an ideal stakeholder for incorporating such diagnostic
methodologies in video games and services.
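The evaluation protocol of Section 3 (Leave-One-Out Cross Validation with Accuracy, Precision, Recall, F1 and weighted F1) can be sketched as below for a kNN classifier with Euclidean distance. This is a minimal illustration under stated assumptions, not the authors' code: the paper does not say how Precision, Recall and F1 were averaged, so macro averaging over both classes is assumed here; the tie-breaking rule for even 𝑘 is also an assumption; and the two-feature toy points and the choice of 𝑘 = 3 are hypothetical (the paper reports 𝑘 = 6 on its own data).

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    neighbors = sorted(zip(train_x, train_y),
                       key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    best = max(votes.values())
    # Assumed tie rule: among tied labels, pick the one seen closest first.
    for _, label in neighbors:
        if votes[label] == best:
            return label


def leave_one_out(xs, ys, k):
    """Predict every sample from a model built on all the other samples."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predictions.append(knn_predict(train_x, train_y, xs[i], k))
    return predictions


def class_scores(y_true, y_pred, label):
    """Precision, recall and F1 treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def report(y_true, y_pred):
    """Accuracy, macro-averaged metrics, and support-weighted F1."""
    labels = sorted(set(y_true))
    scores = {lab: class_scores(y_true, y_pred, lab) for lab in labels}
    support = Counter(y_true)
    n = len(y_true)
    return {
        "accuracy": sum(1 for t, p in zip(y_true, y_pred) if t == p) / n,
        "precision": sum(s[0] for s in scores.values()) / len(labels),
        "recall": sum(s[1] for s in scores.values()) / len(labels),
        "f1": sum(s[2] for s in scores.values()) / len(labels),
        "f1_weighted": sum(scores[lab][2] * support[lab] / n for lab in labels),
    }


# Hypothetical toy data: label 1 = MDD-positive, label 0 = MDD-negative.
features = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (1, 0.5),
            (5, 5), (5, 6), (6, 5), (6, 6)]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

predictions = leave_one_out(features, labels, k=3)
results = report(labels, predictions)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

On an imbalanced dataset the macro-averaged F1 and the weighted F1 can differ considerably, because the weighted variant scales each class by its share of the samples; this is why reporting both, as Table 1 does, is informative.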




Figure 3: Decision Tree classifier for gameplay data. The root node (Sanity) has a strong role in the final labeling of a player.


REFERENCES
 [1] N. S. Altman. 1992. An Introduction to Kernel and Nearest-Neighbor
     Nonparametric Regression. The American Statistician 46, 3 (1992), 175–185.
     https://doi.org/10.1080/00031305.1992.10475879
 [2] American Psychiatric Association et al. 2013. Diagnostic and statistical manual of
     mental disorders (DSM-5®). American Psychiatric Pub.
 [3] Leo Breiman. 2001. Random Forests. Machine Learning 45, 1 (01 Oct 2001), 5–32.
     https://doi.org/10.1023/A:1010933404324
 [4] Leo Breiman, Jerome H Friedman, Richard A Olshen, and Charles J Stone. 1984.
     Classification and regression trees. Belmont, CA: Wadsworth. International Group
     432 (1984), 151–166.
 [5] Lucas Pfeiffer Salomão Dias, Jorge Luis Victória Barbosa, and Henrique Damasceno
     Vianna. 2018. Gamification and serious games in depression care: a
     systematic mapping study. Telematics and Informatics 35, 1 (2018), 213–224.
 [6] Theresa M Fleming, Lynda Bavin, Karolina Stasiak, Eve Hermansson-Webb,
     Sally N Merry, Colleen Cheek, Mathijs Lucassen, Ho Ming Lau, Britta Pollmuller,
     and Sarah Hetrick. 2017. Serious games and gamification for mental health:
     current status and promising directions. Frontiers in psychiatry 7 (2017), 215.
 [7] Theresa M Fleming, Colleen Cheek, Sally N Merry, Hiran Thabrew, Heather
     Bridgman, Karolina Stasiak, Matthew Shepherd, Yael Perry, and Sarah Hetrick.
     2014. Serious games for the treatment or prevention of depression: a systematic
     review. (2014).
 [8] Seymour Geisser. 1993. Predictive inference. Vol. 55. CRC press.
 [9] Regan Lee Mandryk and Max Valentin Birk. 2019. The potential of game-based
     digital biomarkers for modeling mental health. JMIR mental health 6, 4 (2019),
     e13485.
[10] Rytis Maskeliūnas, Tomas Blažauskas, and Robertas Damaševičius. 2017.
     Depression behavior detection model based on participation in serious games. In
     International Joint Conference on Rough Sets. Springer, 423–434.
[11] Alexandros Roniotis and Manolis Tsiknakis. 2017. Detecting depression using
     voice signal extracted by Chatbots: A feasibility study. In Interactivity, game
     creation, design, learning, and innovation. Springer, 386–392.
[12] Robert L Spitzer, Janet BW Williams, Kurt Kroenke, Raymond Hornyak, Julia
     McMurray, Patient Health Questionnaire Obstetrics-Gynecology Study Group,
     et al. 2000. Validity and utility of the PRIME-MD patient health questionnaire in
     assessment of 3000 obstetric-gynecologic patients: the PRIME-MD Patient Health
     Questionnaire Obstetrics-Gynecology Study. American journal of obstetrics and
     gynecology 183, 3 (2000), 759–769.
[13] Michael E Tipping and Christopher M Bishop. 1999. Probabilistic principal
     component analysis. Journal of the Royal Statistical Society: Series B (Statistical
     Methodology) 61, 3 (1999), 611–622.
[14] Michel F. Valstar, Jonathan Gratch, Björn W. Schuller, Fabien Ringeval, Denis
     Lalanne, Mercedes Torres, Stefan Scherer, Giota Stratou, Roddy Cowie, and
     Maja Pantic. 2016. AVEC 2016 - Depression, Mood, and Emotion Recognition
     Workshop and Challenge. CoRR abs/1605.01600 (2016). arXiv:1605.01600
     http://arxiv.org/abs/1605.01600
[15] Vladimir Naumovich Vapnik. 2000. The Nature of Statistical Learning Theory,
     Second Edition. Springer.