Assessing Emotion Mitigation through Robot Facial Expressions for Human-Robot Interaction

Luigi D’Arco1,*, Alessandra Rossi1 and Silvia Rossi1
1 Department of Electrical Engineering and Information Technologies, University of Naples Federico II, Via Claudio 21, 80125 Naples, Italy

Abstract
Affective responses are among the primary and clearest signals agents use to communicate their internal state. These internal states can reflect a positive or negative acceptance of a robotic agent’s behavior during a human-robot interaction (HRI). In such scenarios, it is fundamental for robots to be able to interpret people’s emotional responses and to adjust their behavior accordingly, both to appease users and to provoke an emotional change in them. This research investigates the impact of robot facial expressions on human emotional experiences within HRI, focusing specifically on whether a robot’s expressions can amplify or mitigate users’ emotional responses when viewing emotion-eliciting videos. To evaluate participants’ emotional states, an AI-based multimodal emotion recognition approach was employed, combining analysis of facial expressions and physiological signals, complemented by a self-assessment questionnaire. Findings indicate that participants responded more positively when the robot’s facial expressions aligned with the emotional tone of the videos, suggesting that emotion-coherent displays could enhance user experience and strengthen engagement. These results underscore the potential of expressive social robots to influence human emotions effectively, offering promising applications in therapy, education, and entertainment. By incorporating emotional facial expressions, socially assistive robots could foster behavior change and emotional engagement in HRI, broadening their role in supporting human emotional well-being.

Keywords
Emotion elicitation, Socially Assistive Robotics, Human-Robot Interaction, Emotion Recognition

1.
Introduction

Socially Assistive Robotics (SAR) is an emerging field of robotics focused on developing robots that assist users through hands-off interaction strategies, providing emotional and cognitive support [1, 2]. To improve the Human-Robot Interaction (HRI) experience, SARs must be capable of interpreting, mimicking, and responding to emotional cues, with facial expressions being a primary mode of emotional communication. This ability is essential when robots are used in contexts where emotional engagement can facilitate positive outcomes, such as therapy, learning, and behavior change.

In human-human communication, facial expressions are critical for conveying emotions, improving understanding, and guiding social interactions. Several studies have shown that facial expressions not only reflect how a person is feeling but also influence how others feel [3]. This phenomenon, known as emotional contagion [4], suggests that emotions can spread from one person to another through non-verbal cues, influencing the emotional state of the observer. If SARs are to engage users emotionally, they must be able to use facial expressions in ways that shape the user’s emotional experience, particularly in situations where emotional states can influence behavior and decision-making.

Staffa et al. [5] investigated whether positive or negative robot personalities can affect the mental state of users during HRI by analyzing participants’ Electroencephalogram (EEG) signals. They used an anthropomorphic robot with two personalities, one more prone to engage the user and one less so, modeled through voice, dialogue, and head and body movements. The results showed that participants perceived the robot’s personality, which affected their emotional state and engagement.

Workshop on Advanced AI Methods and Interfaces for Human-Centered Assistive and Rehabilitation Robotics (a Fit4MedRob event) - AIxIA 2024
* Corresponding author.
luigi.darco@unina.it (L. D’Arco); alessandra.rossi@unina.it (A. Rossi); silvia.rossi@unina.it (S. Rossi)
ORCID: 0000-0001-7179-8281 (L. D’Arco); 0000-0003-1362-8799 (A. Rossi); 0000-0002-3379-1756 (S. Rossi)

Fiorini et al. [6] explored the impact of a robot’s behavior on users’ emotional state during exposure to emotion-eliciting images. The robot displayed emotions that were either coherent or incoherent with those experienced by the user, to assess how strongly it could influence the user. The robot recognized three emotional states (positive, negative, and neutral) with accuracy up to 98%, and these states were better identified when the robot performed coherent or incoherent behaviors rather than remaining neutral. Rossi et al. [7] evaluated the impact of the non-verbal behaviors of an anthropomorphic robot on users’ emotional responses. The robot’s non-verbal cues were modeled through emotional gestures under coherent, incoherent, and neutral conditions. Their findings revealed that eliciting high-arousal emotional reactions is challenging using emotional gestures alone, and that additional interaction strategies are needed.

In light of these results, the impact of robot facial expressions on human emotions during HRI has yet to be fully investigated. Hence, the present study assesses whether the facial expressions of a robot can affect the mood of users while they watch emotion-eliciting videos. By displaying facial expressions that either match or contrast with the user’s emotional state, the robot could promote the general effect of mirroring or emotional contagion, whereby an observer tends to covertly and unconsciously mimic the behavior of the person being observed [8].
The study design is based on the approach outlined by Rossi et al. [7], modified to include only two conditions: the robot’s facial expressions either align with the emotional content of the videos or display opposing emotions. To evaluate the emotion felt by the user, an Artificial Intelligence (AI) approach was developed that predicts the user’s emotional state from a fusion of facial expressions and physiological signals. Furthermore, participants completed a questionnaire at the beginning of the study to ascertain their empathetic capacity and another at the end to evaluate their perception of the robot’s emotional display. By demonstrating the potential of robots to influence human emotions through facial expressions, this study can contribute to the development of SARs that are more emotionally intelligent and capable of supporting users in emotionally meaningful ways, broadening their application to scenarios where emotional engagement can facilitate positive outcomes, such as therapy, learning, and behavior change.

2. Materials and Methods

This study evaluates a robot’s ability to mitigate the emotions of participants watching emotion-eliciting videos. A multimodal emotion recognition system assessed participants’ emotional states through facial expression and physiological signal analysis. Pre-experiment questionnaires evaluated participants’ empathy levels, while post-experiment questionnaires assessed their perceptions of the robot’s emotional displays. The study was conducted in a controlled environment to ensure reliable results.

2.1. Robotic Agent and Sensing Elements

The robotic agent involved in this study is a Furhat robot [9], a human-like, rear-projected robotic head that uses computer animations and neck movements to produce facial expressions [10]. The robot is equipped with a camera and a microphone to capture information from the surrounding environment.
However, since the emotion-eliciting videos are shown on a laptop, the laptop camera is used instead, as it provides a frontal view of the user’s face that is better suited to identifying the emotion felt. Although facial expression may be the most significant non-verbal form of emotional expression [3], some people can mask their facial emotions by adopting a neutral expression and using non-intuitive body language, which can lead to misinterpretation [11]. Therefore, a multimodal solution was pursued to produce a more reliable emotion recognition system. Alongside facial expressions, physiological signals were considered, namely the Electrocardiogram (ECG) and Galvanic Skin Response (GSR), which can be regarded as more reliable indicators of emotion because they are more difficult to mask or alter deliberately [12]. These signals are acquired with the BITalino biosignal platform by PLUX Biosignals [13]. The experimental setup is shown in Figure 1.

Figure 1: Experimental settings. Example of a user wearing the BITalino biosignal platform while watching emotion-eliciting videos and interacting with the Furhat robot.

2.2. Emotion Recognition Model

The emotion recognition model employed in this study builds upon a baseline architecture previously established in [7], selected for its proven effectiveness in recognizing emotions from facial expressions and physiological signals. The model was trained on the AMIGOS dataset [14], which contains multimodal data, including EEG, ECG, GSR, and facial video recordings. The dataset provides affective annotations based on the Self-Assessment Manikin (SAM) scale [15] and on evaluations made by the dataset’s authors. The annotations include valence, arousal, dominance, and basic emotions (Neutral, Disgust, Happiness, Surprise, Anger, Fear, and Sadness) for each participant and video.
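The decision-level fusion of the two modalities can be sketched as follows. This is a minimal illustration, assuming hypothetical per-class probabilities from a face-based classifier and a physiological-signal classifier and a simple weighted average as the fusion rule; it is not the authors’ implementation:

```python
# Decision-level (late) fusion of two emotion classifiers: a sketch.
# The class list follows the AMIGOS basic-emotion annotations; the
# probabilities and the 0.5 fusion weight are illustrative assumptions.
EMOTIONS = ["Neutral", "Disgust", "Happiness", "Surprise", "Anger", "Fear", "Sadness"]

def fuse_predictions(face_probs, physio_probs, w_face=0.5):
    """Average the per-class probabilities of the face model and the
    physiological-signal model, then pick the most likely emotion."""
    fused = [w_face * f + (1.0 - w_face) * p
             for f, p in zip(face_probs, physio_probs)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best], fused

# Example: the face model leans towards Happiness and the physiological
# model agrees weakly, so the fused prediction remains Happiness.
face = [0.05, 0.02, 0.60, 0.10, 0.08, 0.05, 0.10]
physio = [0.20, 0.05, 0.35, 0.15, 0.10, 0.05, 0.10]
label, fused = fuse_predictions(face, physio)  # label == "Happiness"
```

Fusing at the decision level keeps the two per-modality models independent, so either one can be retrained or replaced without touching the other.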
The model follows a multimodal approach that combines facial expressions and physiological signals to predict the user’s emotional state. Each modality is processed by a dedicated model: facial images by a Convolutional Neural Network (CNN)-based architecture, and physiological signals by a Support Vector Machine (SVM). The per-modality predictions are then fused to produce a final prediction. The model was trained on the AMIGOS dataset with a 70/30 train-test split.

2.3. Emotion Elicitation Videos

The videos for emotion elicitation were selected from the DECAF database [16], a multimodal dataset for decoding users’ physiological responses to affective multimedia content. Videos no longer than 120 seconds were chosen to avoid fatigue, to maintain the participants’ attention, and to ensure that only one emotion is elicited at a time. Three videos were selected for each of the four emotional categories, Low Arousal - Negative Valence (LALV), Low Arousal - Positive Valence (LAHV), High Arousal - Negative Valence (HALV), and High Arousal - Positive Valence (HAHV), based on the annotations provided in the DECAF dataset, for a total of 12 videos. For instance, the scene from Bambi in which “Bambi’s mother gets killed” was categorized as LALV due to its emotionally distressing content, while the scene from Wall-E in which “Wall-E and Eve spend a romantic night together” was classified as LAHV to evoke positive but calm emotions. The videos were presented to the participants in random order to avoid ordering bias.

2.4. Questionnaires

Two questionnaires were prepared for the study, one to be completed before the experiment and one after.
The pre-experiment questionnaire collected demographic information about the participants, such as age and gender, as well as information about their previous experience with robots and their empathetic capacity. Empathetic capacity was assessed using the Empathy Quotient test [17], a self-report questionnaire designed to measure empathy in adults. The short version of the test was chosen, which consists of 40 questions, each scored on a scale from 0 to 2, with higher scores indicating higher levels of empathy; the total score therefore ranges from 0 to 80. To distinguish participants’ levels of empathy, four categories were defined: low empathy (0-20), medium-low empathy (21-40), medium-high empathy (41-60), and high empathy (61-80).

The post-experiment questionnaire evaluated the participants’ perception of the robot’s emotional display during the experiment. It included questions about the robot’s facial expressions, the perceived emotions, and the impact of the robot’s expressions on the participants’ emotional state. The post-questionnaire follows the System Usability Scale (SUS) scoring principles [18]: each item is scored on a scale from 1 (completely disagree) to 5 (completely agree), and the overall result is scaled to a range from 0 to 100, with higher scores indicating a more positive perception of the robot’s emotional display.

3. Results

A total of 60 subjects, aged 18 to 34, voluntarily participated in the study. Participants included 34 males, 17 females, and 1 non-binary individual. Of these, 24 participants reported no prior experience with robots. Two participants withdrew from the study before completing the session due to personal commitments. The participants were randomly assigned to two groups: coherent (𝑛 = 29) and incoherent (𝑛 = 29).
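The between-group comparisons reported in this section use Student’s t-test for independent samples, and the statistic can be reproduced from the published summary statistics alone. A minimal sketch follows; the group size of 29 per group is taken from the assignment above, and the example numbers are the naturalness ratings reported below:

```python
import math

def students_t_from_summary(m1, s1, n1, m2, s2, n2):
    """Two-sample Student's t statistic with pooled variance,
    computed from group means, standard deviations, and sizes."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (m1 - m2) / se, df

# Naturalness ratings: coherent group vs. incoherent group.
t, df = students_t_from_summary(3.393, 1.197, 29, 2.448, 1.213, 29)
# t ≈ 2.99 with df = 56, above the two-tailed 5% critical value (≈ 2.00),
# consistent with the p < 0.05 reported for this comparison.
```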
This preliminary analysis assesses participants’ experiences and perceptions of the robot by comparing post-experiment responses between the two groups. Statistical analyses were conducted using Student’s t-test for independent samples. Participants in the coherent group rated the robot’s behavior as more natural (𝜇 = 3.393, 𝜎 = 1.197) than those in the incoherent group (𝜇 = 2.448, 𝜎 = 1.213), with a statistically significant difference (𝑝 < 0.05). Similarly, participants in the incoherent group reported a higher level of discomfort created by the robot (𝜇 = 3.000, 𝜎 = 1.363) than those in the coherent group (𝜇 = 1.786, 𝜎 = 0.994, 𝑝 < 0.05). However, neither group perceived a significant influence from the robot while watching the videos (𝜇 = 2.579, 𝜎 = 1.224, 𝑝 > 0.05). Furthermore, participants in the coherent group perceived the robot as more aware of the video content (𝜇 = 4.143, 𝜎 = 1.02) than those in the incoherent group (𝜇 = 2.414, 𝜎 = 1.21). They also rated the robot as less incoherent with respect to the video scenes presented (𝜇 = 1.964 for the coherent group vs. 𝜇 = 3.793 for the incoherent group). The robot’s expressions were more distracting for participants in the incoherent group (𝜇 = 3.000, 𝜎 = 1.15) than for those in the coherent group (𝜇 = 2.357, 𝜎 = 1.07). For other aspects, such as the social acceptability of the robot and its potential utility in communicating emotions, no significant differences were observed between the two groups (𝑝 > 0.05), with both groups agreeing on the robot’s usefulness. Overall, these findings suggest that coherent emotional expressions lead the robot to be perceived as more natural, aware, and non-intrusive, whereas incoherent expressions increase perceptions of discomfort, distraction, and incoherence. Although participants did not report feeling influenced by the robot, future studies will explore potential unconscious emotional changes in participants using emotion recognition models.

4.
Conclusion

This preliminary study explores how robot facial expressions influence human emotional experiences in HRI. In the experiment, participants watched emotion-eliciting videos while interacting with a robot that displayed facial expressions either aligned or misaligned with the emotional content of the videos. The findings indicate that participants generally responded positively to interactions where the robot’s expressions matched the emotional content of the videos, underscoring the potential of facial expressions in SARs to enhance user engagement. This study lays a foundation for incorporating emotionally expressive robots into SAR applications across therapeutic, educational, and entertainment settings. Future research will delve further into the collected data to determine whether participants experienced unconscious emotional changes, advancing the understanding of how emotionally aware robots might foster behavioral change and enhance emotional connection in HRI.

Acknowledgments

This research is supported by the Italian MUR and the EU under the project ADVISOR (ADaptiVe legIble robotS for trustwORthy health coaching) - PRIN PNRR 2022 PE6 - Cod. P202277EJ2 and under the complementary actions to the NRRP “Fit4MedRob - Fit for Medical Robotics” Grant (# PNC0000007).

References

[1] M. J. Matarić, B. Scassellati, Socially assistive robotics, Springer Handbook of Robotics (2016) 1973–1994.
[2] L. D’Arco, H. Zheng, H. Wang, SenseBot: A wearable sensor enabled robotic system to support health and well-being, in: 6th Collaborative European Research Conference, 2021, pp. 30–45.
[3] C. Frith, Role of facial expressions in social interactions, Philosophical Transactions of the Royal Society B: Biological Sciences 364 (2009) 3453–3458.
[4] C.-E. Yu, Emotional contagion in human-robot interaction, E-review of Tourism Research 17 (2020).
[5] M. Staffa, S. Rossi, Enhancing affective robotics via human internal state monitoring, in: 31st IEEE Intern. Conf.
on Robot and Human Interactive Communication (RO-MAN), 2022, pp. 884–890.
[6] L. Fiorini, F. G. Loizzo, G. D’Onofrio, A. Sorrentino, F. Ciccone, S. Russo, F. Giuliani, D. Sancarlo, F. Cavallo, Can I feel you? Recognizing human’s emotions during human-robot interaction, in: International Conference on Social Robotics, Springer, 2022, pp. 511–521.
[7] S. Rossi, A. Rossi, S. Sangiovanni, Towards the evaluation of the role of embodiment in emotions elicitation, in: 2023 11th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), IEEE, 2023, pp. 1–8.
[8] U. Dimberg, M. Thunberg, K. Elmehed, Unconscious facial reactions to emotional facial expressions, Psychological Science 11 (2000) 86–89.
[9] Furhat Robotics, Furhat robot, www.furhatrobotics.com/, 2024. Accessed: 2024-03-01.
[10] S. Al Moubayed, J. Beskow, G. Skantze, B. Granström, Furhat: A back-projected human-like robot head for multiparty human-machine interaction, in: Cognitive Behavioural Systems, Springer Berlin Heidelberg, Berlin, Heidelberg, 2012, pp. 114–130.
[11] M. Rescigno, M. Spezialetti, S. Rossi, Personalized models for facial emotion recognition through transfer learning, Multimedia Tools and Applications 79 (2020) 35811–35828.
[12] Z. Yu, X. Li, G. Zhao, Facial-video-based physiological signal measurement: Recent advances and affective applications, IEEE Signal Processing Magazine 38 (2021) 50–58.
[13] PLUX Biosignals, BITalino, www.pluxbiosignals.com/collections/bitalino, 2024. Accessed: 2024-03-01.
[14] J. A. Miranda-Correa, M. K. Abadi, N. Sebe, I. Patras, AMIGOS: A dataset for affect, personality and mood research on individuals and groups, IEEE Transactions on Affective Computing 12 (2018) 479–493.
[15] J. D. Morris, Observations: SAM: the Self-Assessment Manikin; an efficient cross-cultural measurement of emotional response, Journal of Advertising Research 35 (1995) 63–68.
[16] M. K. Abadi, R. Subramanian, S. M. Kia, P. Avesani, I. Patras, N.
Sebe, DECAF: MEG-based multimodal database for decoding affective physiological responses, IEEE Transactions on Affective Computing 6 (2015) 209–222.
[17] E. J. Lawrence, P. Shaw, D. Baker, S. Baron-Cohen, A. S. David, Measuring empathy: reliability and validity of the Empathy Quotient, Psychological Medicine 34 (2004) 911–920.
[18] A. Bangor, P. Kortum, J. Miller, Determining what individual SUS scores mean: Adding an adjective rating scale, Journal of Usability Studies 4 (2009) 114–123.