Serious Game Development for the Diagnosis of Major Depressive Disorder Cases Using Machine Learning Methods

Athanasios Tsionas, Aristotelis Lazaridis, Ioannis Vlahavas
School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
atsionas@csd.auth.gr, arislaza@csd.auth.gr, vlahavas@csd.auth.gr

ABSTRACT
Major Depressive Disorder (MDD) is a serious mental disorder that affects millions of adults and can lead to life-threatening outcomes. Current diagnostic tools for MDD mostly consist of questionnaires and/or long, specialized therapy sessions. In this work we present a serious game called "The Delivery", developed for diagnosing MDD in players. The game immerses players in a realistic scenario whose development depends on their actions, that is, on their conversations with in-game characters, the quests they complete, and their interactions with the environment. All in-game features and mechanics are designed to correspond to specific diagnostic criteria for MDD. We recorded gameplay data from labeled players (both MDD and non-MDD cases) in order to train Machine Learning models that can accurately distinguish the gameplay behaviors of MDD-positive and MDD-negative players. Our findings indicate that this proof of concept can be scaled to a highly accurate, non-interventional system for the diagnosis of MDD.

CCS CONCEPTS
• Computing methodologies → Machine learning; • Applied computing → Life and medical sciences.

KEYWORDS
serious games, major depressive disorder, machine learning, artificial intelligence

GAITCS2020, September 02–04, 2020, Athens, Greece
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION
Major Depressive Disorder (MDD) is the most common type of depression in adults. Its characteristics can be hard to identify and manage, especially because they are triggered by psychological factors. Since identifying the exact pathological and psychological features in a patient is difficult, alternative methodologies for diagnosing this disorder should be explored.

Many scientists have attempted to tackle this challenge in various ways. Serious games (i.e. games with a specific purpose other than entertainment) have also been applied to mental health issues [6]. In this paper, we present a serious game developed for the diagnosis of MDD in young adults, using Artificial Intelligence (AI) and Machine Learning (ML) methods. In particular, we developed a story-based video game in which players alter the scenario through their actions and their conversations with game characters, all of which are carefully designed to correspond to, and portray, different scientific criteria used in the diagnosis of MDD. Using a small dataset, collected from the gameplay of a few players diagnosed with and without MDD, we developed a prototype system based on Machine Learning models that led to promising results in MDD detection.

Other serious games exist for the treatment or prevention of depression [5, 7], but, to the authors' knowledge, this is the first serious game developed for the diagnosis of MDD that uses AI and ML methods and does not rely on any wearables to collect data about the player. For instance, in [10], the authors use wearables to sample physiological activity (EEG, ECG, etc.) from potential patients with depression during a session of a serious game developed for this purpose. Statistical measures were then used to correlate the extracted signals with players in negative moods.

Moreover, our work is in line with the directions proposed in [9], since the serious game we developed gathers substantial data about the player's cognitive behavior during the gameplay session.

In [11], the authors use the AVEC2016 dataset [14] to train classifiers that diagnose depression from voice data. Even though the results indicate better-than-chance classification ability, the feasibility and implementation details of extracting voice data from users through the proposed smartphone application are not considered, and the model is not tested in real-world scenarios.

2 METHODOLOGY
This section describes the steps that led to our conclusions: data collection and processing, as well as the selection, implementation and tuning of the algorithms used in the experiments.

2.1 Gameplay scenario and Major Depressive Disorder characteristics
Initially, we developed a serious, story-based game called "The Delivery"¹ for the diagnosis of Major Depressive Disorder (MDD) using Artificial Intelligence and Machine Learning methods. "The Delivery" follows a relatively realistic scenario: the player starts inside a building, playing as a character who has been asked to deliver medical supplies to a friend. Shortly afterwards, a huge earthquake traps the building and its residents, an unexpected turn of events that leaves the player dealing with unsettling situations.

¹ https://www.dropbox.com/s/g4c872t9a6uwcfh/theDelivery.zip?dl=0

Figure 1: The Delivery serious game

Throughout the game, the player can hold conversations with in-game characters that lead to different outcomes, and can interact with in-game surroundings and objects. Even though many of the available choices and actions may seem insignificant to the player, they have been carefully devised to correspond to different criteria used in the diagnosis of MDD. These criteria have been extracted from the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-V) [2].

More specifically, the conversations with in-game characters are implemented so that different player answers signal a different probability (weight) of the player meeting a particular MDD criterion. Moreover, Behavior Trees, an AI method commonly used in video games, make the conversation flows highly flexible, adjusting them to maximize not only information extraction but also the feeling of a realistic dialogue.

After gathering data from gameplay sessions of MDD and non-MDD players, we used Machine Learning methods to develop classifiers that predict the probability that a gameplay session belongs to a player with MDD.

2.2 Gameplay mechanics
As in any serious game, the purpose is twofold: entertaining the player and achieving the specific "serious" goal. The first element includes all entertaining features, which relate to the player's perspective on the game. The second element includes the mechanisms that make this game a serious game. For this project, it was essential to understand both dimensions well.

In "The Delivery", the player has to look after his in-game statistics, i.e. Health, Fatigue and Sanity. To keep these statistics within limits, he has to explore the environment, find hidden items or complete side quests.

For the diagnosis to be accurate, the player has to imagine himself within the scenario, and the game was designed with this in mind. In particular, it is a first-person game, so the player can only imagine the character's appearance, and the character's voice is never heard.

The most important statistic within the game is the final score, which indicates the MDD level according to the in-game mechanics. The player is not aware that this metric exists, so as not to be influenced by it.

The final score is updated in several ways. For instance, it changes every time the player interacts with an in-game character. Every answer during a discussion has a different weight relative to a particular diagnostic criterion extracted from DSM-V (e.g. the player admits that he/she usually does not sleep well at night). When a given answer is related to a specific DSM-V criterion, the final score changes accordingly.

Another way to increase the final score is through interaction with the environment. To keep statistics high, the player needs to interact with items in the surroundings. Showing sensitivity to characters and circumstances shifts the final score in favor of a non-MDD label.

2.3 Data collection and processing
During a gameplay session, the player's actions and their outcomes are recorded. This data includes changes in the player's basic statistics (Health, Fatigue, Sanity), the player's answers during discussions (in chronological order), and the side quests successfully completed.

We collected 26 gameplay sessions from two groups: players who had been diagnosed with MDD (24%) and players who had not been diagnosed with MDD and were highly unlikely to suffer from it (76%). The second group included people who had never received an official MDD diagnosis and who were asked to fill in the Patient Health Questionnaire (PHQ-9) [12]. This is an established questionnaire used in the diagnosis of MDD, and we used it to filter out cases with a non-trivial probability of suffering from MDD, keeping only cases with a small probability of having the disorder. This was necessary in order to label players accurately as MDD-positive or MDD-negative. Note that the members of the second group are not considered "healthy", only non-MDD cases (i.e. they could be suffering from another disorder). Unfortunately, practical restrictions did not allow us to collect a larger dataset.

2.4 Algorithms
Initially, we used Principal Component Analysis (PCA) [13] to visualize how well the game distinguishes MDD-positive from MDD-negative in-game behavior.

Figure 2: Data visualization after performing PCA with n = 2 and n = 3 components. MDD-positive players show a non-trivial difference in gameplay behavior compared with non-MDD players.

Then, we applied 4 classification methods, namely Decision Trees [4], k-Nearest Neighbors (kNN) [1], Support Vector Machines (SVM) [15] with a linear kernel, and Random Forests (RF) [3], to create a prediction model that classifies a player as MDD-positive or MDD-negative, using our labeled data for training.

The RF classifier used n = 20 trees and the kNN classifier used k = 6 neighbors with the Euclidean distance metric. For the kNN classifier, we also tried odd numbers of neighbors to avoid ties, but the best results were obtained with k = 6.

3 RESULTS
In this section we present visualization results obtained with the PCA dimensionality reduction method, and performance evaluation results of Machine Learning models trained on gameplay data. PCA was applied to the data with n = 2 and n = 3 components (Figure 2). Visualization of the resulting data points suggests that the game design captures the diagnostic criteria for MDD relatively accurately. More particularly, in both experiments the MDD-positive cases form a dense area that reflects their in-game behavior, with only a few outliers. In contrast, MDD-negative cases are more sparse and do not follow particular behavioral patterns, which is expected, since the game mechanics do not target such behaviors.

Additionally, we trained various classification models on the gameplay data to predict whether a gameplay session belongs to an MDD-positive or MDD-negative player. Even with a dataset as small and imbalanced as ours, all trained models achieved higher accuracy than the random baseline, and the SVM and RF models in particular showed promising results in distinguishing the gameplay sessions of MDD-positive and MDD-negative players. We measured the performance of each model using Accuracy, Precision, Recall, F1-score and weighted F1-score (F1-w), computed with a Leave-One-Out Cross Validation [8] scheme. Hyper-parameters were fine-tuned with a simple grid search. The results are presented in Table 1. The SVM classifier with a linear kernel achieves the best value on every metric among the four classifiers, suggesting that a linear decision boundary separates the two groups reasonably well. Random Forests also perform relatively well. The tree produced by the Decision Tree classifier is depicted in Figure 3. A baseline that labels sessions at random (without stratification) was also applied for comparison purposes.

Table 1: Performance of different classifiers in predicting MDD cases from gameplay data.

Metric      Random   Decision Trees   SVM (linear)   kNN     RF
Accuracy    46.2%    65.4%            80.8%          77%     73.1%
Precision   48.2%    53.7%            72.8%          38.5%   63.5%
Recall      47.5%    54.2%            70%            50%     65%
F1          43.1%    53.8%            71.2%          43.5%   64.1%
F1-w        50.2%    66.3%            80.1%          66.9%   73.8%

4 CONCLUSION
In this paper, we developed a serious game prototype system called "The Delivery" for the diagnosis of Major Depressive Disorder (MDD). This video game, which features Artificial Intelligence methods, was designed with special care so that its in-game mechanics correspond to official diagnostic criteria for the disorder. After recording gameplay sessions of players who had been diagnosed with MDD and of players who were highly unlikely to have MDD, we were able to train Machine Learning models that showed promising performance in distinguishing positive from negative MDD cases based on gameplay behavior.

To the authors' knowledge, the proposed method is a novel concept, since no other work has approached MDD diagnosis from this perspective. It is crucial that new, more accurate and efficient techniques are developed that are better suited to current and future societal needs. This prototype can be further improved by extending the current scenario and gathering more data, which we expect to raise the diagnostic accuracy of the system. A larger amount of gameplay data will also allow us to try other Machine Learning methods (e.g. Neural Networks). Moreover, the system can be adapted to identify other types of depression, or mental health issues in general. Lastly, the video game industry is an ideal stakeholder for incorporating such diagnostic methodologies into video games and services.

Figure 3: Decision Tree classifier for gameplay data. The root node (Sanity) has a strong role in the final labeling of a player.

REFERENCES
[1] N. S. Altman. 1992. An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression. The American Statistician 46, 3 (1992), 175–185. https://doi.org/10.1080/00031305.1992.10475879
[2] American Psychiatric Association et al. 2013. Diagnostic and statistical manual of mental disorders (DSM-5®). American Psychiatric Pub.
[3] Leo Breiman. 2001. Random Forests. Machine Learning 45, 1 (01 Oct 2001), 5–32. https://doi.org/10.1023/A:1010933404324
[4] Leo Breiman, Jerome H Friedman, Richard A Olshen, and Charles J Stone. 1984. Classification and regression trees. Belmont, CA: Wadsworth. International Group 432 (1984), 151–166.
[5] Lucas Pfeiffer Salomão Dias, Jorge Luis Victória Barbosa, and Henrique Damasceno Vianna. 2018. Gamification and serious games in depression care: a systematic mapping study. Telematics and Informatics 35, 1 (2018), 213–224.
[6] Theresa M Fleming, Lynda Bavin, Karolina Stasiak, Eve Hermansson-Webb, Sally N Merry, Colleen Cheek, Mathijs Lucassen, Ho Ming Lau, Britta Pollmuller, and Sarah Hetrick. 2017. Serious games and gamification for mental health: current status and promising directions. Frontiers in psychiatry 7 (2017), 215.
[7] Theresa M Fleming, Colleen Cheek, Sally N Merry, Hiran Thabrew, Heather Bridgman, Karolina Stasiak, Matthew Shepherd, Yael Perry, and Sarah Hetrick. 2014. Serious games for the treatment or prevention of depression: a systematic review. (2014).
[8] Seymour Geisser. 1993. Predictive inference. Vol. 55. CRC press.
[9] Regan Lee Mandryk and Max Valentin Birk. 2019. The potential of game-based digital biomarkers for modeling mental health. JMIR mental health 6, 4 (2019), e13485.
[10] Rytis Maskeliūnas, Tomas Blažauskas, and Robertas Damaševičius. 2017. Depression behavior detection model based on participation in serious games. In International Joint Conference on Rough Sets. Springer, 423–434.
[11] Alexandros Roniotis and Manolis Tsiknakis. 2017. Detecting depression using voice signal extracted by Chatbots: A feasibility study. In Interactivity, game creation, design, learning, and innovation. Springer, 386–392.
[12] Robert L Spitzer, Janet BW Williams, Kurt Kroenke, Raymond Hornyak, Julia McMurray, Patient Health Questionnaire Obstetrics-Gynecology Study Group, et al. 2000. Validity and utility of the PRIME-MD patient health questionnaire in assessment of 3000 obstetric-gynecologic patients: the PRIME-MD Patient Health Questionnaire Obstetrics-Gynecology Study. American journal of obstetrics and gynecology 183, 3 (2000), 759–769.
[13] Michael E Tipping and Christopher M Bishop. 1999. Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61, 3 (1999), 611–622.
[14] Michel F. Valstar, Jonathan Gratch, Björn W. Schuller, Fabien Ringeval, Denis Lalanne, Mercedes Torres, Stefan Scherer, Giota Stratou, Roddy Cowie, and Maja Pantic. 2016. AVEC 2016 - Depression, Mood, and Emotion Recognition Workshop and Challenge. CoRR abs/1605.01600 (2016). arXiv:1605.01600 http://arxiv.org/abs/1605.01600
[15] Vladimir Naumovich Vapnik. 2000. The Nature of Statistical Learning Theory, Second Edition. Springer.
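Section 2.2 describes a hidden final score that each dialogue answer moves by a weight tied to a DSM-V criterion, and that environment interaction and sensitivity push toward a non-MDD label. The Python sketch below illustrates that bookkeeping under stated assumptions; it is not the game's implementation. The dialogue node names, answer identifiers, criterion labels, weights, INTERACTION_BONUS and the sign convention (a higher score favours a non-MDD label, following the remark that environment interaction increases the score) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical lookup table: (dialogue node, answer id) -> (DSM criterion, weight).
# Assumed sign convention: a higher final score favours a non-MDD label,
# so answers consistent with an MDD criterion carry negative weights.
ANSWER_WEIGHTS = {
    ("neighbour_chat", "cannot_sleep"): ("insomnia", -2.0),
    ("neighbour_chat", "slept_fine"): ("insomnia", 1.0),
    ("rescue_plan", "what_is_the_point"): ("diminished_interest", -1.5),
}
INTERACTION_BONUS = 0.5  # hypothetical score gain per environment interaction


class FinalScore:
    """Hidden score the player never sees; updated by answers and interactions."""

    def __init__(self) -> None:
        self.total = 0.0
        self.by_criterion = defaultdict(float)  # running contribution per criterion

    def on_answer(self, node: str, answer: str) -> None:
        # Answers not tied to any criterion leave the score unchanged.
        entry = ANSWER_WEIGHTS.get((node, answer))
        if entry is None:
            return
        criterion, weight = entry
        self.total += weight
        self.by_criterion[criterion] += weight

    def on_interaction(self) -> None:
        # Engaging with the surroundings pushes the score toward a non-MDD label.
        self.total += INTERACTION_BONUS


score = FinalScore()
score.on_answer("neighbour_chat", "cannot_sleep")
score.on_answer("rescue_plan", "small_talk")  # not mapped to any criterion
score.on_interaction()
score.on_interaction()
print(score.total)  # -1.0
```

Keeping a per-criterion breakdown alongside the total is a design choice of this sketch: it would let the recorded session expose which DSM-V criteria drove the score, rather than only a single number.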
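Section 3 evaluates every model with Leave-One-Out Cross Validation, and Section 2.4 specifies a kNN classifier with k = 6 and the Euclidean distance. A minimal, dependency-free Python sketch of that combination follows; it is not the authors' code. Each session is assumed to be a fixed-length numeric feature vector (for example, summaries of the Health, Fatigue and Sanity changes), the toy data are invented, and the tie-breaking rule for an even k is our assumption, since the paper only notes that odd values were tried to avoid ties.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=6):
    """Majority vote among the k training sessions closest to x (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k]).most_common()
    # With an even k and two classes a 3-3 split is possible; as an assumed
    # rule, the label of the single nearest neighbour breaks the tie.
    if len(votes) > 1 and votes[0][1] == votes[1][1]:
        return train_y[order[0]]
    return votes[0][0]


def loocv_predictions(X, y, k=6):
    """Leave-One-Out CV: predict each session with a model built from all the others."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]  # every session except the held-out one
        train_y = y[:i] + y[i + 1:]
        preds.append(knn_predict(train_X, train_y, X[i], k))
    return preds


# Invented toy data: two well-separated clusters of 7 sessions each.
base = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0, 0.5), (0.5, 0)]
X = [list(p) for p in base] + [[a + 10, b + 10] for a, b in base]
y = [0] * 7 + [1] * 7
preds = loocv_predictions(X, y, k=6)
print(preds == y)  # True
```

The held-out predictions can then be scored against the true labels with any metric from Table 1. Hyper-parameter grid search, which the paper also mentions, is omitted here.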
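The kNN column of Table 1 combines 77% accuracy with 38.5% precision and 50% recall, which a short arithmetic check helps interpret. Assume two things the paper does not state: that Precision, Recall and F1 are macro-averaged over the two classes, and that the 26 sessions split into 6 MDD-positive and 20 MDD-negative cases (the nearest whole-number reading of the reported 24%/76%). Under those assumptions, the kNN figures coincide with those of a model that labels every session MDD-negative. The Python sketch below computes the metrics from scratch; the function name is our own.

```python
from collections import Counter


def classification_metrics(y_true, y_pred, labels=(0, 1)):
    """Accuracy, macro-averaged precision/recall/F1 and support-weighted F1.

    A class that is never predicted gets precision 0 (and F1 0), the same
    convention as scikit-learn's zero_division=0.
    """
    n = len(y_true)
    support = Counter(y_true)
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
        per_class[c] = (precision, recall, f1)
    k = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / n,
        "precision": sum(m[0] for m in per_class.values()) / k,
        "recall": sum(m[1] for m in per_class.values()) / k,
        "f1": sum(m[2] for m in per_class.values()) / k,
        "f1_weighted": sum(per_class[c][2] * support[c] for c in labels) / n,
    }


# Assumed split: 6 MDD-positive (1) and 20 MDD-negative (0) sessions.
y_true = [1] * 6 + [0] * 20
y_pred = [0] * 26  # a classifier that labels every session MDD-negative
for name, value in classification_metrics(y_true, y_pred).items():
    print(f"{name}: {100 * value:.1f}%")
# accuracy: 76.9%
# precision: 38.5%
# recall: 50.0%
# f1: 43.5%
# f1_weighted: 66.9%
```

If these assumptions hold, the kNN accuracy would mainly reflect the class imbalance, which is why the macro-averaged F1 column is the more informative comparison on a dataset this small and skewed.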