Emotional Mario Task at MediaEval 2021

Mathias Lux¹, Michael Riegler², Steven Hicks², Duc-Tien Dang-Nguyen³˒⁴, Kristine Jorgensen³, Vajira Thambawita², Pål Halvorsen²
¹ Alpen-Adria Universität Klagenfurt, Austria
² SimulaMet, Norway
³ University of Bergen, Norway
⁴ Kristiania University College, Norway
mlux@itec.aau.at, michael@simula.no, steven@simula.no, ductien.dangnguyen@uib.no, kristine.jorgensen@uib.no, vajira@simula.no, paalh@simula.no

Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
MediaEval'21, December 13-15 2021, Online

ABSTRACT
Video games are often understood as engines of experience, and the interaction with the game lets players consume carefully constructed experiences. While it is generally agreed that a good experience makes a good game, methods for measuring or observing the impact of the gameplay on the players' experience are still an open problem. In the 2021 Emotional Mario task, we ask researchers to investigate the gameplay of ten study participants on one of the most iconic classic video games: Super Mario Bros. We provide data to learn from, including heart rate, skin conductivity, videos of the players' faces synchronized to the gameplay, the gameplay itself, and player demographics including their scores and times spent in the game. Participants of the task are asked to predict gameplay events based on the biometric and facial data of the players.

Figure 1: Collage of the first frames of all 32 Super Mario Bros. levels organized in eight worlds with four levels each.

1 INTRODUCTION
With the rise of deep learning, several large leaps in research have been achieved in recent years, such as human-level image recognition, text classification, and even content creation. Games and deep learning also have a relatively long history together, specifically in reinforcement learning. However, video games still pose a lot of challenges. Games are understood as engines of experience [9], and as such, they need to invoke human emotions. While emotion recognition has come a long way over the last decade [7], the connection between emotions and video games is still an open and interesting research question. As games are designed to evoke emotions [9], we hypothesize that emotions in the player are reflected in the visuals of the video game. Simple examples are when players are happy after having mastered a particularly complicated challenge, when they are shocked by a jump scare scene in a horror game, or when they are excited after unlocking a new resource. Questionnaires can measure these things after playing [1], but in the Emotional Mario task, we want to interconnect emotions and gameplay based on data instead of asking the players.

For the Emotional Mario challenge, we focus on the iconic Super Mario Bros. video game and provide a multimodal dataset based on a Super Mario Bros. implementation for OpenAI Gym [2]. For a population of ten players, the dataset contains their game input, demographics, biomedical sensor data from a medical-grade device, and videos of their faces while playing the game.

2 TASK DESCRIPTION
The goal in the Emotional Mario task is to relate data about the players, e.g., their heart rate, skin conductivity, or their facial expressions, to the gameplay and events like Mario losing a life, finishing a level, or gaining a power-up by consuming a mushroom. Emotional Mario is structured into two subtasks:
• In the first subtask, we asked participants to identify events of high significance in the gameplay by just analyzing the facial video and the biometric data. Such significant events include the end of a level, a power-up or extra life for Mario, or Mario's death.
• For the second subtask, which was optional, we asked participants to create a video summary of the best moments of the play. This can include gameplay scenes, facial video, data visualization, and whatever else can help such a summary.

3 DATASET
The task provides a dataset of videos and sensor readings of people playing Super Mario Bros. [8]. In total, a population of ten people was selected for data gathering, ranging from gaming veterans to novice players, with an even split between male and female participants. Each participant provided a written form of consent, allowing for their video, gameplay data, and sensor data to be shared openly for research and teaching purposes under a Creative Commons Attribution-NonCommercial 4.0 International License¹. The dataset can be accessed via https://datasets.simula.no/toadstool or https://osf.io/qrkcf/.

¹ http://creativecommons.org/licenses/by-nc/4.0/, accessed 2020-11-04

For each participant, a range of multimodal data was recorded and included in the dataset:
• a video file recording the participant's face with a 1.3-MP webcam at 30 fps and 640x480 pixels,
• the controller input performed on each frame of the game utilizing a wired USB controller from retro-bit, which is modeled after the original controller for the Nintendo Entertainment System,
• sensor data collected from an Empatica E4 wristband [4] including heart rate, temperature, skin conductivity, and accelerometer data, and
• video game action files, which are scripts to generate the video game frames.

Besides the actual data, the provided dataset includes documentation and a process description as well. Additional material includes the original questionnaire presented to the participants and their answers, the consent form signed by the participants, the license, and a README.txt file detailing the use of the dataset. A detailed description of the dataset is given in [8].

In addition to the dataset, we provide (i) ground truth data for the events in the game for 7 out of 10 participants (the remaining three are used for the test set), and (ii) results from an automated facial expression recognition package [3] including a confidence value for the basic emotions anger, disgust, fear, happiness, sadness, and surprise, as well as a neutral expression along with a bounding box for the detected face. Examples are shown in Fig. 2.

Figure 2: Sample output from the facial expression recognition algorithm with bounding box and prediction. Videos taken from the Toadstool dataset [8].

4 EVALUATION
The evaluation of the task is two-fold. For the first subtask, we collect the participants' output on finding events for the three test-set participants. We investigated the precision, recall, and F1 score of the events. We ran four different types of evaluations. We investigated if participants found the events in a range of +/- one second of the actual events and did the same for +/- five seconds. Two more evaluations were done focusing on the time of the event, discarding the type. A random baseline was created by simulating runs with randomized events. The random baseline is biased by the knowledge of how many events are expected and chooses the number of events randomly in the range of the actual number of events +/- 50%². Table 1 and Table 2 give an overview of the best results of each group as well as the averaged random baseline.

² Evaluation scripts and creation of the random baseline can be found on https://github.com/dermotte/EmotionalMarioEvaluation

Table 1: Best evaluation results per group per evaluation and the averaged random baseline for matching event timestamps in the range of +/- 5 seconds

Team               Precision  Recall  F1 score
DCU-Gamestreamer   0.0021     0.8903  0.0041
GSE-AAU            0.0242     0.0812  0.0373
Random             0.2847     0.2847  0.2947

Table 2: Best evaluation results per group per evaluation and the averaged random baseline for matching events in the range of +/- 5 seconds

Team               Precision  Recall  F1 score
DCU-Gamestreamer   0.0014     0.5709  0.0028
GSE-AAU            0.0112     0.0849  0.0197
Random             0.0667     0.0667  0.0667

We expected few submissions for the second subtask and wanted to employ a qualitative, heuristic evaluation. An expert panel with professionals and researchers from the fields of game development, game studies, e-sports, and media sciences was to examine the submissions and judge them for:
(1) Informative value (i.e., is it a good summary of the gameplay),
(2) Accuracy (i.e., does it reflect the emotional ups and downs and the skill of the play), and
(3) Innovation (i.e., surprisingly new approach, non-linearity of the story, creative use of cuts, etc.)
Unfortunately, we did not receive submissions for the second subtask.

5 DISCUSSION AND OUTLOOK
While the MediaEval Emotional Mario task is the spiritual successor of the GameStory task [5, 6], the goals are different. The work on Counter-Strike: Global Offensive and the analysis of the game streaming and e-sports phenomena have shown the substantial impact games have on culture and society. With the availability of biometric sensors and deep learning for data analysis, we re-focus on the interrelation of the game and the player's experience.

With the Emotional Mario task, we hope to outline the direction of research where player-game interaction can be extended, and games as engines of experience can be understood. Games are not only a playground for people. They are also a vast resource for research and future developments.

ACKNOWLEDGMENTS
We'd like to thank Dr. Andreas Leibetseder for his support.

REFERENCES
[1] Vero Vanden Abeele, Katta Spiel, Lennart Nacke, Daniel Johnson, and Kathrin Gerling. 2020. Development and validation of the player experience inventory: A scale to measure player experiences at the level of functional and psychosocial consequences.
International Journal of Human-Computer Studies 135 (2020), 102370.
[2] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. 2016. OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016).
[3] Justin Shenk et al. 2021. Facial Expression Recognition with a deep neural network as a PyPI package. (2021). https://github.com/justinshenk/fer
[4] Maurizio Garbarino, Matteo Lai, Dan Bender, Rosalind W Picard, and Simone Tognetti. 2014. Empatica E3—A wearable wireless multi-sensor device for real-time computerized biofeedback and data acquisition. In Proceedings of the 4th International Conference on Wireless Mobile Communication and Healthcare-Transforming Healthcare Through Innovations in Mobile and Wireless Technologies (MOBIHEALTH). IEEE, 39–42.
[5] Mathias Lux, Michael Riegler, Duc-Tien Dang-Nguyen, Marcus Larson, Martin Potthast, and Pål Halvorsen. 2018. GameStory Task at MediaEval 2018. In Proceedings of MediaEval.
[6] Mathias Lux, Michael Riegler, Duc-Tien Dang-Nguyen, Johanna Pirker, Martin Potthast, and Pål Halvorsen. 2019. GameStory Task at MediaEval 2019. In Proceedings of MediaEval.
[7] Anvita Saxena, Ashish Khanna, and Deepak Gupta. 2020. Emotion recognition and detection methods: A comprehensive survey. Journal of Artificial Intelligence and Systems 2, 1 (2020), 53–79.
[8] Henrik Svoren, Vajira Thambawita, Pål Halvorsen, Petter Jakobsen, Enrique Garcia-Ceja, Farzan Majeed Noori, Hugo L Hammer, Mathias Lux, Michael Alexander Riegler, and Steven Alexander Hicks. 2020. Toadstool: a dataset for training emotional intelligent machines playing Super Mario Bros. In Proceedings of the ACM Multimedia Systems Conference (MMSYS). 309–314.
[9] Tynan Sylvester. 2013. Designing games: A guide to engineering experiences. O'Reilly Media, Inc.
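
The tolerance-based event matching described in Section 4 can be illustrated with a minimal sketch. This is not the organizers' evaluation script (that is in the repository named in Section 4). It rests on several assumptions: events are represented as (timestamp in seconds, type) pairs, each actual event can be matched by at most one prediction, and the event type labels used here are hypothetical. The `use_type` flag switches between matching events with their type and matching timestamps only.

```python
from typing import List, Tuple

# Hypothetical representation: (timestamp in seconds, event type label).
Event = Tuple[float, str]


def count_hits(predicted: List[Event], actual: List[Event],
               tolerance: float, use_type: bool = True) -> int:
    """Count predictions that fall within +/- tolerance seconds of an
    actual event that has not been matched yet (assumed one-to-one)."""
    unmatched = sorted(actual)
    hits = 0
    for p_time, p_type in sorted(predicted):
        for i, (a_time, a_type) in enumerate(unmatched):
            close_enough = abs(p_time - a_time) <= tolerance
            same_type = (not use_type) or (p_type == a_type)
            if close_enough and same_type:
                hits += 1
                del unmatched[i]  # each actual event is used only once
                break
    return hits


def precision_recall_f1(predicted: List[Event], actual: List[Event],
                        tolerance: float, use_type: bool = True
                        ) -> Tuple[float, float, float]:
    """Precision, recall, and F1 score for one run at one tolerance."""
    hits = count_hits(predicted, actual, tolerance, use_type)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up toy data, not taken from the dataset.
    actual = [(10.0, "death"), (42.5, "power_up"), (90.0, "level_end")]
    predicted = [(12.0, "death"), (41.9, "death"),
                 (88.0, "level_end"), (200.0, "level_end")]
    for tol in (1.0, 5.0):
        for typed in (True, False):
            print(tol, typed, precision_recall_f1(predicted, actual, tol, typed))
```

Calling the function with both tolerances and both settings of `use_type` mirrors the four evaluation types. The sketch also shows why a run that predicts very many events can reach high recall while its precision stays low: every extra unmatched prediction enlarges the denominator of the precision.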