A Proposal for Adapting Robot Behaviours Using Fuzzy Q-learning in Cognitive Serious Game Scenarios

Eleonora Zedda1,*, Fabio Paternò1

1 HIIS Laboratory-Institute of Information Science and Technologies "Alessandro Faedo" (ISTI-CNR), Via Giuseppe Moruzzi, 1, 56127 Pisa PI, Italy

Workshop Robots for Humans 2024, Advanced Visual Interfaces, Arenzano, June 3rd, 2024
* Corresponding author.
Email: eleonora.zedda@isti.cnr.it (E. Zedda); fabio.paterno@isti.cnr.it (F. Paternò)
ORCID: 0000-0002-6541-5667 (E. Zedda); 0000-0001-8355-6909 (F. Paternò)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

Abstract
The repetitive and monotonous character of cognitive training may lead to waning interest and eventual disengagement among older adults with cognitive impairments. To address this issue, this study proposes an adaptive approach wherein a Socially Assistive Robot (SAR) autonomously selects optimal actions to sustain a positive emotional state in older adults while they participate in serious cognitive training games. The aim is to propose an adaptation strategy that leverages fuzzy Q-learning to prompt users to maintain a positive state.

Keywords
Human-Robot Interaction, Robot behaviour adaptation, Socially Assistive Robots, Fuzzy Q-Learning

1. Introduction

Over the past two decades, numerous research efforts have delved into innovative interaction technologies to improve older individuals' mental and physical well-being. In this context, there has been a growing interest towards integrating robots into social environments, intended to help individuals in both professional and domestic contexts, specifically in cognitive training. In particular, it has been observed that exposure to social and cognitive stimuli can significantly strengthen the psychological well-being of the elderly and, at the same time, mitigate the dangers of social isolation, a phenomenon with harmful effects on the health of the elderly, which can lead to a high susceptibility to conditions such as dementia [1]. To facilitate natural interaction, researchers in social robotics have focused on robots that can adapt to diverse conditions and different user needs [2]. Machine learning techniques for adaptable social robots have recently attracted a lot of attention [3, 4, 5, 6, 7]. Adaptive robot interactions are essential to provide comfortable, effective, and affordable interactions with humans. An adaptive behaviour system would facilitate meaningful, effective communication and interaction. Additionally, it can create a more trusting relationship between the user and the robots [6, 8]. To enable machines to interact with users naturally, the system must be able to identify or recognize the state of human behaviour and performance through the input modalities (e.g., speech recognition, emotion detection, gestures, etc.) and communicate through its output modalities (e.g., speech and gesture generation). Previous research has predominantly focused on devising adaptive strategies for exploring robotic dialogue techniques and robot communication atmosphere using Reinforcement Learning (RL) or using Fuzzy Q-learning [2, 8, 9]. Our investigation aims to identify the appropriate robot behavioural strategies, encompassing verbal and non-verbal parameters, by infusing specific personalities into a socially assistive robot (SAR) interacting with older adults suffering from mild cognitive impairment (MCI). The rationale for this emphasis lies in existing literature [10, 11, 12], which suggests that customizing and adapting the SAR system can produce more effective and engaging human-robot interaction (HRI), especially within a vulnerable demographic such as older adults with MCI. The proposed strategy combines the potential of Q-learning (QL) and that of Fuzzy Logic, which allows the management of a fuzzy number of states. In this proposal, the user state is not limited to a small and discrete set of states in the serious game scenario, as in RL. With fuzzy logic, the set of states is more generalized and fuzzy, reflecting a user's natural and nuanced state during the interaction with the robot. In addition, fuzzy logic is well known for its successful application to uncertain environments [9]. Thus, providing adaptive behaviour to the robot is crucial for this particular population, as highlighted by previous studies, in which both older adults and caregivers expressed a preference for more natural and responsive interaction with the robot during cognitive training sessions [1, 13, 14].

2. Approach and Motivation

Reinforcement Learning is a fundamental learning paradigm within machine learning [15]. It offers a standardized framework for developing agents capable of acquiring optimal behaviour within uncertain environments. Within RL, agents operate without direct access to an optimal control strategy, relying instead on instantaneous reinforcement signals. Most RL algorithms [16] commonly depict the action-value function using a look-up table, allocating a single entry for each state-action pair. While this method boasts robust theoretical underpinnings [15, 17] and proves effective across various applications, its utility is severely constrained when confronted with problems featuring extensive state spaces or continuous domains, owing to the phenomenon referred to as the curse of dimensionality [16]. To address this challenge, numerous methodologies attempt to mitigate such limitations by employing function approximation techniques such as Fuzzy Logic, thereby approximating the action-value function using a limited number of parameters [18, 19]. Consequently, the agent's exposure is confined to a subset of the state space, whereby, through the generalisation mechanism, it can yield a satisfactory approximation across a broader expanse of the state space. To address such challenges, a promising approach is to model the adaptation simulation using Fuzzy Q-learning (FQL). FQL is an extension of the original Q-learning proposed in Reference [17] that uses fuzzy parameters [20, 21]. Introducing Fuzzy Q-learning allows us to consider wider and more variable ranges of values rather than a single specific value. This adds more variability and representation in the simulated users. We propose a Fuzzy RL technique to support an adaptation strategy for the robot, composed of verbal and nonverbal parameters while exhibiting specific personalities, in the context of an application for cognitive training [22].

2.1. Key Elements in the Proposal

The adaptive strategy is applied to a cognitive training scenario in a serious cooking game. The robot manifests an extraverted or introverted personality during the interaction between the user and the cooking game. The choice of this personality, made before the training session, modifies the robot's behaviours following pre-established parameters. The adaptation algorithm determines which action, associated with the current personality, will be selected based on the maximum value associated with the pair formed by the user's current state and the action. In the following scenario, there are three potential actions that the robot can take. Suppose the user is in an extremely positive state (e.g. positive emotion, high attention and positive trend of game performance). In that case, the robot can react appropriately and enthusiastically to maintain the user's engagement. Alternatively, if the user appears to be in a negative state, the robot should use inciting behaviour to encourage the user to remain attentive and attempt to re-engage with the robot. The adaptation policy will dictate the appropriate course of action for the robot based on the user's current state during a serious game.

2.1.1. Cooking Serious Game Scenario.

The cooking game involves eight questions that challenge users to identify the correct sequence and weight of the ingredients. The serious game is divided into five stages: introduction, recipe instruction, question state, answer state and ending feedback. At the start of the application, the robot greets the user and asks if they are prepared to play. When the cooking game starts, the robot shows and vocally synthesises the ingredients for the selected recipe. The robot emphasises the ingredients' sequential order and weight during the recipe instruction. The quizzes follow, in which the user must use visual attention and working memory to identify the correct ingredients and choose them from the available options. The user interacts with the game using the voice modality. The cooking game topic was chosen considering the previous experiences of the psychologists and psychotherapist experts in cognitive training with MCI users [14, 22?] and a literature analysis [23, 24]. After a semi-structured interview with three psychologists and one neuroscience researcher of CNR Pisa, it was decided to design a cooking application connected to the user's daily activity, requiring users to recognise the ingredients' chronological sequence, weight, and typology.

2.1.2. Robot Personality.

The robot in our scenario is a Pepper robot and exhibits two personalities: an extraverted and an introverted one [25]. Typically, extraverts tend to speak in a louder, faster, and higher-pitched manner. They are also more inclined to initiate conversations and talk more about themselves than others. Regarding body language, their gestures and movements are generally more expansive and faster and occur more frequently than introverted ones. Conversely, for the introverted condition, the robot's gestures tend to be more limited, contained, and slower, so that the robot appears reserved toward the user.

2.1.3. Q-learning.

In the Q-learning algorithm, interactions consist of a sequence of user states (S), robot actions (A), and rewards (R) (see Figure 1). The RL agent observes the state S_t from the environment and chooses an action A_t to perform based on this observation. The chosen action is then executed, and the environment provides a new state, S_{t+1}, and a reward, R_{t+1}, to evaluate the transition. Given the current state, the agent's policy (Q-table) maps states to actions and determines which action to select.

Figure 1: Reinforcement learning standard framework applied to HRI scenario.

2.1.4. Fuzzy Logic System.

Fuzzy logic is a mathematical approach to emulate the human way of thinking and learning [26]. Fuzzy logic systems depend on prior expert knowledge to specify the fuzzy sets and fuzzy rule bases. A fuzzy logic system consists of three components: fuzzification, fuzzy logic controller, and defuzzification [20]. The first component converts the crisp values (i.e. not fuzzy values) of the input variables that define the user state (user's emotion, attention, and game performance) into their fuzzy form using membership functions. A membership function is a curve that defines how each point in the input space is mapped to a membership value (or degree of membership) between 0 and 1. The input variables are assigned to linguistic variables (i.e. variables whose values are linguistic concepts rather than numbers, e.g. High attention, Low attention, Intermediate attention). The fuzzy output of each input variable can take one candidate value from a set of defined values [27]. In our simulations, three user models were created (one for healthy users, one for MCI, and one for users with dementia) to tailor the adaptation. The user model varies according to the cognitive level of the user, the user's performance and the user's attention.

The second component simulates human reasoning by making fuzzy inferences based on inputs and a pre-defined set of IF-THEN rules. The rules allow us to define the relationship between the input (state) and the output (action). We define the rules using our prior knowledge based on our experience with MCI and an experiment with robot adaptation behaviour using Q-learning. The fuzzy logic component's output is an algebraic product of the degree of truth of each fuzzy input defined in the fuzzification phase. The last component converts the fuzzy output set from the linguistic variables into a crisp value. The fuzzy output of the system can take one value from a set of 7 constant values, from extremely low to extremely high, in the range [-5, +5].

In summary, Fuzzy Logic resembles the human decision-making methodology and deals with vague and imprecise information. It is an approach to computing based on "degrees of truth" rather than the usual "true or false" (1 or 0) Boolean logic. According to our goal of generating adaptive robot behaviour that maintains a positive user state through interaction with the robot, we define the key elements of the Fuzzy Q-learning algorithm, described in the following section.

Figure 2: Fuzzy Reinforcement learning model proposed.

2.2. Modelling an adaptive Fuzzy Q-learning application

In our Fuzzy Q-learning-based model proposal, the user state is considered the environment, while the robot is the agent (see Figure 2).

State. Three variables describe the user's state: the user's emotion, the user's attention, and the user's game performance. The user emotions considered are seven basic emotions: happiness, neutral, sad, surprise, fear, disgust, and anger, identified using physiological parameters collected by the Empatica E4 wristband [28]. The user's attention is estimated from the user's gaze direction, collected using the QiSDK library of the Pepper robot, and the user's performance from the trend of correct and wrong answers during the session with the robot [29]. All the environment information that identifies the user state is fuzzified. Triangular and trapezoidal membership functions are used to fuzzify the parameters and identify the linguistic variables: high, low, and intermediate user states that vary from extremely low to extremely high user-level states.

Action. The actions identified correspond to the behavioural responses exhibited by the robot, encompassing verbal feedback, vocal characteristics, animations, and motor movements tailored to the supported robot personalities. The robot can execute three distinct actions within the serious gaming context. Firstly, (a0) entails generating an enthusiastic, more spirited behaviour. For instance, the robot manifests enthusiastic feedback within the extravert personality paradigm, exemplified by phrases such as "My gosh! That is the correct one! You are trying hard!" This is accompanied by a slight augmentation in speech rate, volume, and pitch, alongside dynamic and expansive animations featuring pronounced motor movements. Secondly, (a1) pertains to generating a more inciting behaviour, typified by phrases like "Right answer! Let us continue with this focus!" and animations characterized by closer proximity. Finally, (a2) involves the generation of a more neutral robotic comportment. An instance is provided by the utterance "Good! That is the right answer!" delivered with vocal parameters reflecting neutrality as per the robot's personality configuration, alongside basic animations consonant with its inherent persona.

Reward. The robot must avoid the user's descent into negative states (e.g. negative emotion, negative performance and low attention). The selection of the reward weights aims to optimise the robot behaviour policy, thereby maintaining the user in a positive state while facilitating a positive trajectory in performance. This becomes particularly salient in repetitive cognitive training, where task monotony prevails. Emphasizing a more authentic and captivating robot behaviour centred around the user's state and performance is important in sustaining user engagement and active participation within this context. Tailoring more adaptive robot behaviours for this demographic assumes significance, as it enhances user acceptability. Consequently, users are afforded a heightened sense of comfort and focus during training sessions, fostering a more enjoyable and effective interaction facilitated by incorporating human-like behaviour or actions within the robot's repertoire.

The system illustrated in Figure 2 will be tested in a simulated environment using Python with the three user models defined and then implemented in the Pepper robot for a real-user test.

3. Robot Adaptive Application

The humanoid robot that will be employed for the real user test is the Pepper model, developed by SoftBank Robotics. Pepper is 1.2 meters tall and has 17 joints to facilitate expressive body language, alongside three omnidirectional wheels to facilitate mobility. With a suite of multimodal interfaces, including touchscreen, speech, tactile head, hands, bumper, LEDs, and 20 degrees of freedom for whole-body motion, Pepper offers a versatile platform for interaction. In this proposal, the camera sensors are leveraged to detect various user attention states, in particular, by categorizing the gaze direction. Gaze direction is categorized into five values: (1) direct eye contact with the robot, (2) looking upward, (3) focusing on the tablet, (4) looking left, and (5) looking right. The user is deemed to be in an "attention state" when directing their gaze toward the robot or the tablet, while other directions denote a "distracted state". These parameters are captured during interactions with the robot using modules provided by the QiSDK robot framework. Data are collected every second to ensure an adequate dataset for assessing attentive states [29]. Regarding emotion recognition, the Empatica E4 wristband will be used. The Empatica E4 has four sensors, including an electrodermal activity (EDA) sensor (μS), a heart rate sensor (beats per minute), skin temperature (°C), a 3-axis accelerometer on three orthogonal axes (X, Y, Z), and an optical thermometer [28]. The E4 wristband collects the heart rate at a frequency of 1 Hz, the skin temperature and EDA at 4 Hz, and the acceleration at 32 Hz [28]. Regarding emotion classification, various machine learning algorithms have been used to recognize emotion using physiological data. A possible suitable algorithm for emotion recognition identified in the literature is the Support Vector Machine (SVM), a supervised learning algorithm that offers promising results with high emotion recognition accuracy [30, 31]. Regarding the trend of user performance during the gameplay, the serious game collects different values such as the number of correct answers, the number of attempts at the right answer, reaction time, session time, and other parameters related to the game. Following the European regulation about privacy and data, these are saved into a CNR database. The robot and wristband will determine the user's current state, combining these values with the game data collected by the serious game implemented in Android. In parallel, the Android application manages the robot's data sensing and the wristband data collection. In contrast, the Fuzzy logic module and the emotion recognition algorithm will be implemented in Python.

After the training phase, which simulates the fuzzy logic system with the three user profiles defined, a real user test will be held in a laboratory setting at the CNR of Pisa. The users will test the robot in a within-subject study design comparing a random condition with an adaptive condition. The goal of the test will be to identify if the user can perceive an adaptation of the robot's behaviour compared with the random condition and if the User Engagement in the adaptive condition is higher than in the random one. To evaluate these hypotheses, the user will complete the User Engagement Scale [32] and Godspeed questionnaires [33] at the end of each robot condition interaction.

4. Conclusions

This paper describes a proposal of a Fuzzy RL algorithm to learn the best policy strategy in a SAR exhibiting two personalities, in the context of cognitive training for healthy individuals and individuals with MCI or dementia. The proposed model allows the robot's behavioural system to select an appropriate action based on the fuzzified user state. Fuzzy logic enables the circumvention of the constraints inherent in Q-learning by facilitating the selection of a set of overarching and generic states. In the proposed model, the possible actions that the robot performs to stimulate the user have been identified with the aim of increasing user engagement during a serious gaming scenario. The designated actions follow the parameters defined for the extraverted and introverted personalities introduced in section 2.1.2. In future work, we want to implement the proposed Fuzzy Q-learning model and validate it by evaluating metrics such as average reward, average number of steps to complete the task, and average computation time in the simulation phase [18], and then test the system with real users. Another step is identifying the best machine-learning algorithm for classifying user emotions collected by the Empatica wristband. In conclusion, this article describes a proposal to provide an adaptation policy for robot actions that adapt to fuzzy user states.

References

[1] F. Carros, J. Meurer, D. Löffler, D. Unbehaun, S. Matthies, I. Koch, R. Wieching, D. Randall, M. Hassenzahl, V. Wulf, Exploring human-robot interaction with the elderly: results from a ten-week case study in a care home, in: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–12.
[2] N. Akalin, A. Loutfi, Reinforcement learning approaches in social robotics, Sensors 21 (2021) 1292.
[3] S. Keizer, M. Ellen Foster, Z. Wang, O. Lemon, Machine learning for social multiparty human–robot interaction, ACM transactions on interactive intelligent systems (TIIS) 4 (2014) 1–32.
[4] J. De Greeff, T. Belpaeme, Why robots should be social: Enhancing machine learning through social human-robot interaction, PLoS one 10 (2015) e0138061.
[5] J. Hemminghaus, S. Kopp, Towards adaptive social behavior generation for assistive robots using reinforcement learning, in: Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction, 2017, pp. 332–340.
[6] K. Tsiakas, M. Dagioglou, V. Karkaletsis, F. Makedon, Adaptive robot assisted therapy using interactive reinforcement learning, in: International Conference on Social Robotics, Springer, 2016, pp. 11–21.
[7] H. Ritschel, A. Seiderer, K. Janowski, S. Wagner, E. André, Adaptive linguistic style for an assistive robotic health companion based on explicit human feedback, in: Proceedings of the 12th ACM international conference on PErvasive technologies related to assistive environments, 2019, pp. 247–255.
[8] C. Pou-Prom, S. Raimondo, F. Rudzicz, A conversational robot for older adults with alzheimer's disease, ACM Transactions on Human-Robot Interaction (THRI) 9 (2020) 1–25.
[9] L.-F. Chen, Z.-T. Liu, M. Wu, F.-Y. Dong, Y. Yamazaki, K. Hirota, Multi-robot behavior adaptation to local and global communication atmosphere in humans-robots interaction, Journal on Multimodal User Interfaces 8 (2014) 289–303.
[10] A. Tapus, Improving the quality of life of people with dementia through the use of socially assistive robots, in: 2009 Advanced Technologies for Enhanced Quality of Life, IEEE, 2009, pp. 81–86.
[11] J. Magyar, M. Kobayashi, S. Nishio, P. Sinčák, H. Ishiguro, Autonomous robotic dialogue system with reinforcement learning for elderlies with dementia, in: 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), IEEE, 2019, pp. 3416–3421.
[12] H. Modares, I. Ranatunga, F. L. Lewis, D. O. Popa, Optimized assistive human–robot interaction using reinforcement learning, IEEE transactions on cybernetics 46 (2015) 655–667.
[13] E. Zedda, M. Manca, F. Paterno, C. Santoro, Mci older adults' user experience with introverted and extraverted humanoid robot personalities, in: Proceedings of the 15th Biannual Conference of the Italian SIGCHI Chapter, 2023, pp. 1–12.
[14] M. Manca, F. Paternò, C. Santoro, E. Zedda, C. Braschi, R. Franco, A. Sale, The impact of serious games with humanoid robots on mild cognitive impairment older adults, International Journal of Human-Computer Studies 145 (2021) 102509.
[15] R. S. Sutton, Learning to predict by the methods of temporal differences, Machine learning 3 (1988) 9–44.
[16] R. S. Sutton, A. G. Barto, Reinforcement learning: An introduction, MIT press, 2018.
[17] C. J. Watkins, P. Dayan, Q-learning, Machine learning 8 (1992) 279–292.
[18] A. Bonarini, Evolutionary learning, reinforcement learning, and fuzzy rules for knowledge acquisition in agent-based systems, Proceedings of the IEEE 89 (2001) 1334–1346.
[19] R. E. Bellman, L. A. Zadeh, Decision-making in a fuzzy environment, Management science 17 (1970) B–141.
[20] A. Bonarini, A. Lazaric, F. Montrone, M. Restelli, Reinforcement distribution in fuzzy q-learning, Fuzzy sets and systems 160 (2009) 1420–1443.
[21] L. Zadeh, Fuzzy sets, Inform Control 8 (1965) 338–353.
[22] E. Zedda, M. Manca, F. Paternò, A cooking game for cognitive training of older adults interacting with a humanoid robot., in: CHIRA, 2021, pp. 271–282.
[23] V. Manera, P.-D. Petit, A. Derreumaux, I. Orvieto, M. Romagnoli, G. Lyttle, R. David, P. H. Robert, 'kitchen and cooking,' a serious game for mild cognitive impairment and alzheimer's disease: a pilot study, Frontiers in aging neuroscience 7 (2015) 24.
[24] S. McCallum, C. Boletsis, Dementia games: A literature review of dementia-related serious games, in: Serious Games Development and Applications: 4th International Conference, SGDA 2013, Trondheim, Norway, September 25-27, 2013. Proceedings 4, Springer, 2013, pp. 15–27.
[25] E. Zedda, M. Manca, F. Paternò, C. Santoro, Older adults' user experience with introvert and extravert humanoid robot personalities, Universal Access in the Information Society (2023) 1–17.
[26] R. Sharma, M. Gopal, A markov game-adaptive fuzzy controller for robot manipulators, IEEE Transactions on Fuzzy Systems 16 (2008) 171–186.
[27] R. D. Hegazy, O. A. Nasr, A user behavior based handover optimization algorithm for lte networks, in: 2015 IEEE Wireless Communications and Networking Conference (WCNC), IEEE, 2015, pp. 1255–1260.
[28] E. Inc, E4wristband: Real-time physiological signals: Wearable ppg, eda, temperature, motion sensors., Empatica.com (last visit: 12/14/2024). URL: https://www.empatica.com/en-eu/research/e4/.
[29] Softbank Robotics, Human, QiSDK (2019). URL: https://qisdk.softbankrobotics.com/sdk/doc/pepper-sdk/ch4_api/perception/reference/human.html#retrieving-characteristics.
[30] M. Ragot, N. Martin, S. Em, N. Pallamin, J.-M. Diverrez, Emotion recognition using physiological signals: laboratory vs. wearable sensors, in: Advances in Human Factors in Wearable Technologies and Game Design: Proceedings of the AHFE 2017 International Conference on Advances in Human Factors and Wearable Technologies, July 17-21, 2017, The Westin Bonaventure Hotel, Los Angeles, California, USA 8, Springer, 2018, pp. 15–22.
[31] L. Shu, J. Xie, M. Yang, Z. Li, Z. Li, D. Liao, X. Xu, X. Yang, A review of emotion recognition using physiological signals, Sensors 18 (2018) 2074.
[32] H. L. O'Brien, P. Cairns, M. Hall, A practical approach to measuring user engagement with the refined user engagement scale (ues) and new ues short form, International Journal of Human-Computer Studies 112 (2018) 28–39.
[33] C. Bartneck, D. Kulić, E. Croft, S. Zoghbi, Measurement instruments for the anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety of robots, International journal of social robotics 1 (2009) 71–81.
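Illustrative sketch for Section 2.1.3 (Q-learning). The interaction loop of states S_t, actions A_t and rewards R_{t+1} can be made concrete with the standard tabular update of Watkins and Dayan [17]. The sketch below is a minimal illustration only: the state labels, the action names, the learning rate `alpha`, the discount factor `gamma`, the exploration rate `epsilon`, and the reward value are assumptions introduced here and are not specified in the proposal.

```python
import random
from collections import defaultdict

# The three robot actions of the proposal; the identifiers are illustrative.
ACTIONS = ["a0_enthusiastic", "a1_inciting", "a2_neutral"]


def make_q_table():
    # One row per user state, one entry per action, initialised to zero.
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def choose_action(q_table, state, epsilon=0.1, rng=random):
    # Epsilon-greedy policy: explore with probability epsilon,
    # otherwise select the action with the maximum Q-value.
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    row = q_table[state]
    return max(row, key=row.get)


def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    # Q(S_t, A_t) += alpha * (R_{t+1} + gamma * max_a Q(S_{t+1}, a) - Q(S_t, A_t))
    best_next = max(q_table[next_state].values())
    td_error = reward + gamma * best_next - q_table[state][action]
    q_table[state][action] += alpha * td_error
    return q_table[state][action]


# Example transition: the user was in a (hypothetical) "negative" state,
# the robot used the inciting action and the user moved to a "positive" state.
q = make_q_table()
q_update(q, "negative", "a1_inciting", reward=1.0, next_state="positive")
print(choose_action(q, "negative", epsilon=0.0))
```

With a discrete look-up table of this kind every distinct user state needs its own row, which is the limitation that motivates the fuzzy extension discussed in Section 2.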
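Illustrative sketch for Section 2.1.4 (fuzzification). The triangular and trapezoidal membership functions mentioned in the text can be written in a few lines. The sketch assumes a crisp attention score normalised to [0, 1]; that normalisation and all breakpoints (0.2, 0.5, 0.8) are hypothetical choices made for the example, since the proposal does not give the actual parameters.

```python
def triangular(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c].

    Setting a == b or c == d yields a left or right shoulder.
    """
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def fuzzify_attention(score):
    # Maps a crisp score in [0, 1] to degrees of membership of three
    # linguistic values. Breakpoints are assumptions for illustration.
    return {
        "low": trapezoidal(score, 0.0, 0.0, 0.2, 0.5),
        "intermediate": triangular(score, 0.2, 0.5, 0.8),
        "high": trapezoidal(score, 0.5, 0.8, 1.0, 1.0),
    }


print(fuzzify_attention(0.35))
```

A score between two breakpoints belongs partly to two linguistic values at once, which is how the fuzzified state avoids the hard boundaries of a discrete state set.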
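Illustrative sketch for Section 2.1.4 (inference and defuzzification). The text states that rule firing strengths are the algebraic product of the input degrees of truth and that the crisp output takes one of seven constant values in [-5, +5]. The sketch below shows one way these two statements could fit together. The rule consequents used here (the sum of per-input label scores), the even spacing of the seven constants, and the weighted-average defuzzification are all assumptions standing in for the expert-defined rule base, which the proposal does not list.

```python
from itertools import product

# Hypothetical scores used only to derive placeholder rule consequents.
LABEL_SCORE = {"low": -1, "intermediate": 0, "high": 1}

# Seven constants from extremely low (-5) to extremely high (+5);
# the even spacing is an assumption.
LEVELS = [-5.0 + 10.0 * i / 6.0 for i in range(7)]


def infer_user_level(emotion, attention, performance):
    """Return a crisp user-state level in [-5, +5].

    Each argument maps the labels 'low', 'intermediate', 'high'
    to a degree of membership in [0, 1].
    """
    numerator = 0.0
    denominator = 0.0
    for e, a, p in product(LABEL_SCORE, repeat=3):
        # Firing strength: algebraic product of the three degrees of truth.
        strength = emotion.get(e, 0.0) * attention.get(a, 0.0) * performance.get(p, 0.0)
        # Placeholder consequent: the summed scores (-3..+3) index the 7 levels.
        level = LEVELS[LABEL_SCORE[e] + LABEL_SCORE[a] + LABEL_SCORE[p] + 3]
        numerator += strength * level
        denominator += strength
    # Weighted-average defuzzification; 0.0 if no rule fires.
    return numerator / denominator if denominator > 0 else 0.0


all_high = {"low": 0.0, "intermediate": 0.0, "high": 1.0}
print(infer_user_level(all_high, all_high, all_high))
```

In a real rule base each of the 27 label combinations would carry a consequent chosen by the domain experts rather than the arithmetic placeholder used above.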
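Illustrative sketch for Section 2.2 (Fuzzy Q-learning). One common simplified form of fuzzy Q-learning keeps a small vector of q-values per fuzzy rule and blends them with the normalised firing strengths, so that a single discrete action is chosen for the whole fuzzy state. The sketch below follows that simplification; it is not claimed to be the variant of [20] or the model the authors will implement. The class name, `alpha`, `gamma`, the number of rules and the reward are assumptions.

```python
ACTIONS = ("a0", "a1", "a2")  # the three robot actions of the proposal


class FuzzyQ:
    def __init__(self, n_rules, alpha=0.1, gamma=0.9):
        # q[i][k]: value of action k under fuzzy rule i.
        self.q = [[0.0] * len(ACTIONS) for _ in range(n_rules)]
        self.alpha = alpha
        self.gamma = gamma

    @staticmethod
    def normalise(strengths):
        # Turns raw firing strengths into weights that sum to 1.
        total = sum(strengths)
        if total <= 0:
            raise ValueError("at least one rule must fire")
        return [s / total for s in strengths]

    def q_values(self, phi):
        # Q(s, a) = sum_i phi_i(s) * q[i][a]
        return [sum(p * row[k] for p, row in zip(phi, self.q))
                for k in range(len(ACTIONS))]

    def best_action(self, phi):
        values = self.q_values(phi)
        return max(range(len(ACTIONS)), key=values.__getitem__)

    def update(self, phi, action, reward, phi_next):
        # Temporal-difference error on the blended Q-values.
        td = (reward + self.gamma * max(self.q_values(phi_next))
              - self.q_values(phi)[action])
        # Each rule is credited in proportion to how strongly it fired.
        for i, p in enumerate(phi):
            self.q[i][action] += self.alpha * p * td
        return td


agent = FuzzyQ(n_rules=2, alpha=0.5)
phi = FuzzyQ.normalise([0.3, 0.3])
agent.update(phi, action=1, reward=1.0, phi_next=[1.0, 0.0])
print(ACTIONS[agent.best_action(phi)])
```

Because the update is shared across all rules that fired, experience gathered in one user state generalises to neighbouring states, which is the property the proposal relies on to avoid enumerating every discrete state.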
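Illustrative sketch for Section 3 (attention from gaze). The five gaze categories and the rule that gaze toward the robot or the tablet counts as attentive can be expressed directly. The sketch uses plain Python labels rather than any QiSDK call, and the `attention_ratio` helper, which summarises a window of one-second samples as a fraction, is an assumption about how a crisp attention value might be derived before fuzzification.

```python
from enum import Enum


class Gaze(Enum):
    ROBOT = 1   # direct eye contact with the robot
    UP = 2      # looking upward
    TABLET = 3  # focusing on the tablet
    LEFT = 4    # looking left
    RIGHT = 5   # looking right


# Gaze toward the robot or the tablet is the "attention state";
# every other direction is the "distracted state".
ATTENTIVE = {Gaze.ROBOT, Gaze.TABLET}


def is_attentive(gaze):
    return gaze in ATTENTIVE


def attention_ratio(samples):
    """Fraction of gaze samples (one per second) that are attentive.

    Returns 0.0 for an empty window.
    """
    samples = list(samples)
    if not samples:
        return 0.0
    return sum(is_attentive(g) for g in samples) / len(samples)


window = [Gaze.ROBOT, Gaze.TABLET, Gaze.LEFT, Gaze.UP]
print(attention_ratio(window))
```

The resulting ratio lies in [0, 1] and could serve as the crisp attention input that the membership functions then map to low, intermediate and high attention.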