<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Toward Empathetic Human-Robot Interaction: A Multimodal Framework Integrating Psychological Profiling and Emotion Recognition</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alberto Borboni</string-name>
          <email>alberto.borboni@unibs.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Ragno</string-name>
          <email>luca.ragno@unibs.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Zanoletti</string-name>
          <email>fabio.zanoletti@unibs.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università degli Studi di Brescia, Department of Mechanical and Industrial Engineering</institution>
          ,
          <addr-line>Via Branze 38 25123 Brescia</addr-line>
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper introduces a multimodal framework designed to enhance empathetic communication in human-robot interactions. Our approach integrates psychological profiling, which leverages publicly available social network data and the DISC personality framework, with real-time emotion recognition across facial, audio, and textual modalities. Facial expressions are analyzed using convolutional neural networks and MTCNN-based face detection, while audio signals are processed through Mel-Frequency Cepstral Coefficients and support vector machines. Textual inputs are evaluated via sentiment analysis using advanced language models. These individual emotional assessments are fused through a fuzzy aggregation method, emphasizing non-verbal cues to derive a comprehensive and adaptive emotional profile. The resulting digital agent tailors its verbal responses and interaction style to align with the user's personality traits and current emotional state, offering a more natural and supportive communication experience.</p>
      </abstract>
      <kwd-group>
        <kwd>Empathetic communication</kwd>
        <kwd>Human-robot interaction</kwd>
        <kwd>Psychological profiling</kwd>
        <kwd>DISC personality framework</kwd>
        <kwd>Multimodal emotion recognition</kwd>
        <kwd>Facial expression analysis</kwd>
        <kwd>Audio emotion recognition</kwd>
        <kwd>Text sentiment analysis</kwd>
        <kwd>Fuzzy aggregation</kwd>
        <kwd>Adaptive dialogue systems</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Empathetic communication is a fundamental aspect of human interaction, fostering social
bonding, trust, and cooperation. In recent years, advancements in artificial intelligence and robotics
have enabled the development of socially interactive robots that can engage in empathetic
communication with humans. Such robots have the potential to revolutionize various domains—
including healthcare, education, and social companionship—by providing emotional support and
enhancing human well-being [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ].
      </p>
      <p>
        In human–robot interaction (HRI), empathy is typically defined as the robot’s ability to
recognize, interpret, and appropriately respond to human emotions through both verbal and
nonverbal cues [
        <xref ref-type="bibr" rid="ref3 ref4">3,4</xref>
        ]. Prior studies have shown that robots employing empathy-related gestures (such
as nodding, gazing, and leaning) can positively influence human–human interactions by improving
interpersonal evaluations and increasing emotional support [
        <xref ref-type="bibr" rid="ref5 ref6">5,6</xref>
        ]. These findings suggest that,
rather than replacing human interaction, robots may act as effective mediators in emotionally
charged conversations [
        <xref ref-type="bibr" rid="ref7 ref8">7,8</xref>
        ].
      </p>
      <p>
        Despite these promising outcomes, significant challenges remain in designing robots that
can truly engage in empathetic communication. One key issue is determining the appropriate
degree of anthropomorphism required for effective empathy expression. While some research
suggests that human-like facial expressions are essential for conveying empathy [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], other studies
indicate that empathetic responses can be perceived through contextual verbal communication
alone [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Moreover, different age groups appear to have varying expectations regarding robotic
empathy—older adults tend to prioritize emotional adaptation, whereas younger individuals
emphasize functional interaction aspects such as gaze and response timing [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Beyond cognitive
and expressive factors, a robot’s movement and actuation are also crucial in creating a comfortable
and engaging interaction for different mechatronic systems [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The ability to generate subtle
facial expressions and dynamically adjust postural changes, e.g., with micro or smart actuators [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
enhances the perception of empathy. Studies have demonstrated that optimized motion profiles
[
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ] and reduced residual vibrations [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] contribute to a more natural and fluid interaction,
ensuring both user comfort and system efficiency. In medical applications, for instance,
engagement through empathetic communication has been shown to improve therapeutic outcomes
by fostering adherence to rehabilitation programs and increasing patient motivation [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
Furthermore, recent developments in artificial intelligence and speech synthesis have enhanced
robots’ ability to generate natural empathetic responses [
        <xref ref-type="bibr" rid="ref18 ref19 ref20">18-20</xref>
        ]. Advances in deep learning models
—such as bidirectional LSTMs and HMM-based speech synthesis—have significantly improved the
expressiveness of robotic speech, making it more aligned with human expectations [
        <xref ref-type="bibr" rid="ref21 ref22">21,22</xref>
        ].
Additionally, integrating multimodal emotional prediction systems now allows robots to anticipate
user emotions and adjust their communication styles accordingly [
        <xref ref-type="bibr" rid="ref23 ref24">23,24</xref>
        ]. These improvements
contribute to more effective human–robot interactions by increasing the robot’s capacity to
recognize and respond appropriately to human emotional states [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. The application of machine
learning in empathetic dialogue generation has also seen significant progress, with methods that
leverage intention recognition from social network messages to enhance contextual awareness
[
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. Similarly, studies of discourse relations and speech synthesis have enabled better
comprehension of adversative and emphatic speech patterns, further improving human–robot
dialogue interactions [
        <xref ref-type="bibr" rid="ref27 ref28">27,28</xref>
        ]. Moreover, recent research on emotional enhancement and cognitive
adaptation has been instrumental in developing robots that can personalize their interactions to
better accommodate individual user needs [
        <xref ref-type="bibr" rid="ref29 ref30">29,30</xref>
        ]. This paper proposes a preliminary
engineering approach to facilitate empathetic communication in human–robot emotional
interaction, with the aim of enhancing human well-being through empathetic verbal responses and
adaptive behaviors that foster meaningful connections.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Materials and Methods</title>
      <p>This study proposes a method for identifying the human agent interacting with the digital agent
using visual data and verbal inquiries. Thereafter, by examining publicly available information
online, the digital agent constructs a psychological profile of the human agent. The psychological
profile is essential for a more advanced contextualization of communication. The digital agent
evaluates visual, auditory, and textual data to comprehensively ascertain the emotional condition
of the human agent. The emotional state facilitates the refinement of communication context to
enhance empathetic alignment with the human agent. The ultimate stage of contextualization
entails employing psychological methods to actualize this empathy in the human agent, thereby
reinforcing the empathetic connection. The accurate definition of the context enables the digital
agent to completely adjust to the human agent by selecting an appropriate communicative register
and enhancing vocal expression to convey the correct emotions, preserving every nuance of
meaning and formality within the empathetic communication established by the human agent. The
following subsections address the tracking of the time-invariant psychological profile and its
applications, the analysis of emotions through facial, vocal, and textual expressions, and the
identification of the instantaneous time-variant emotional profile.</p>
      <sec id="sec-2-1">
        <title>2.1. Psychological profiling of the human agent</title>
        <p>Traditional methods of personality assessment often require self-reported questionnaires, which
can be time-consuming and subject to biases. Advances in natural language processing (NLP) and
personality analytics have led to automated profiling systems that infer personality traits from
online data. One such tool is Crystal Knows®, a personality prediction engine that utilizes the
DISC framework to categorize individuals based on their publicly available professional
information.</p>
        <p>This study presents a Python-based approach to interfacing with Crystal’s API for automated
psychological profiling. The script operates by collecting the LinkedIn profile URL associated with the
human agent and using it to query the Crystal API, which then returns a structured personality
profile based on the DISC classification. To achieve this, the methodology follows a series of steps.
First, the human agent provides information to obtain a LinkedIn profile URL along with a valid
Crystal API key for authentication. The script then processes an HTTP POST request containing
the URL and sends it to the API using the `requests` library. Upon successful retrieval, the API
returns a JSON response containing the subject’s personality classification, which is then parsed
and displayed in a structured format for easy interpretation. To ensure robustness, the script
includes error handling mechanisms that display appropriate messages in case of failures, such as
invalid API credentials, insufficient data, or server errors.</p>
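        <p>A minimal sketch of this querying step is given below; the endpoint URL, payload field names, and response fields are illustrative assumptions, since the exact Crystal API schema is not reported here.</p>
        <preformat>
import requests

# Illustrative endpoint; the actual Crystal API path and payload schema may differ.
CRYSTAL_ENDPOINT = "https://api.crystalknows.com/v1/profiles"

def fetch_disc_profile(linkedin_url, api_key):
    """Query the Crystal API for a DISC profile from a LinkedIn URL (sketch)."""
    response = requests.post(
        CRYSTAL_ENDPOINT,
        headers={"Authorization": f"Bearer {api_key}"},
        json={"linkedin_url": linkedin_url},
        timeout=30,
    )
    response.raise_for_status()  # surfaces invalid credentials or server errors
    return response.json()

# Example usage (hypothetical values):
# profile = fetch_disc_profile("https://www.linkedin.com/in/example", "MY_API_KEY")
# print(profile.get("personality", {}).get("disc_type"))
        </preformat>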
        <p>Crystal's DISC framework categorizes personality into 16 distinct types, each defined by varying
degrees of Dominance (D), Influence (I), Steadiness (S), and Conscientiousness (C). These types are
represented by specific archetypes that offer insights into individual behaviors and communication
styles. The Dominance (D) category includes the Captain (assertive and ambitious), Driver (decisive
and persuasive), Initiator (charismatic and resourceful), and Influencer (energetic and adventurous).
The Influence (I) category features the Motivator (enthusiastic and outgoing), Encourager (warm
and light-hearted), Harmonizer (patient and accommodating), and Counselor (empathetic and
supportive). The Steadiness (S) category comprises the Supporter (calm and respectful), Planner
(predictable and detail-oriented), Stabilizer (reserved and cautious), and Editor (meticulous and
independent). Lastly, the Conscientiousness (C) category includes the Analyst (methodical and
private), Skeptic (logical and efficient), Questioner (competitive and strategic), and Architect
(strong-willed and purposeful). By understanding these personality types, the digital agent can
enhance self-awareness, improve team dynamics, and refine interpersonal relationships.</p>
        <p>To effectively adapt to DISC profiles, communication should be tailored to each personality
type. When interacting with individuals high in Dominance (D), it is best to be direct, concise, and
focused on results, avoiding unnecessary details that may be seen as time-wasting. For those with
Influence (I) traits, engaging with enthusiasm, recognizing their contributions, and allowing space
for social interaction fosters better communication. Individuals with Steadiness (S) prefer a patient
approach, reassurance, and stability, so sudden changes should be minimized to avoid discomfort.
Finally, those high in Conscientiousness (C) value detailed and accurate information, requiring
time to process and respond thoughtfully.</p>
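        <p>As an illustration, the tailoring rules above can be encoded as a simple lookup used to condition the digital agent's prompts; the wording of the hints below is an assumption for demonstration purposes, not output of the Crystal API.</p>
        <preformat>
# Illustrative mapping from DISC category to communication-style hints used to
# condition the digital agent's prompts; the wording is an assumption.
DISC_COMMUNICATION_HINTS = {
    "D": "Be direct and concise; focus on results and avoid unnecessary detail.",
    "I": "Be enthusiastic; acknowledge contributions and leave room for social exchange.",
    "S": "Be patient and reassuring; avoid sudden changes of topic or pace.",
    "C": "Provide detailed, accurate information and allow time to process and respond.",
}

def style_hint(disc_type):
    """Return the hint for the dominant DISC letter of a profile, e.g. 'Di' maps to 'D'."""
    return DISC_COMMUNICATION_HINTS.get(disc_type[:1].upper(), "")
        </preformat>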
        <p>The psychological profile of the human agent is associated with a reference composed of the name,
surname, and face of the human agent, so that it can be stored for future use.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Human agent identification</title>
        <p>The proposed face identification system utilizes the `face_recognition` library to detect and
recognize human faces in real-time video streams. Initially, a dataset of known faces is loaded from
a predefined directory, where each image is processed to extract facial encodings using deep
learning-based feature extraction. During execution, frames from a video source are captured and
preprocessed by resizing and converting them to RGB format. The system detects facial landmarks,
extracts encodings, and compares them against the stored database using a distance-based
similarity metric. If a match is found, the system labels the detected face accordingly; otherwise, it
assigns an "Unknown" label and prompts the human agent to provide their name and surname.
This information, along with the extracted facial encoding, is then stored in the database for future
recognition. The results, including bounding boxes and names, are overlaid onto the video stream
and displayed in real time. The system operates continuously, allowing for dynamic face
recognition, and can be terminated by the user via a key press.</p>
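        <p>A compact sketch of this recognition loop, assuming the `face_recognition` and OpenCV libraries, is shown below; loading of the known-face directory is omitted, and the matching tolerance and window handling are illustrative.</p>
        <preformat>
import cv2
import face_recognition

# known_encodings / known_names are loaded beforehand from the directory of known faces (omitted here)
known_encodings, known_names = [], []

video = cv2.VideoCapture(0)
while True:
    ok, frame = video.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    locations = face_recognition.face_locations(rgb)
    encodings = face_recognition.face_encodings(rgb, locations)
    for (top, right, bottom, left), encoding in zip(locations, encodings):
        matches = face_recognition.compare_faces(known_encodings, encoding, tolerance=0.6)
        name = known_names[matches.index(True)] if True in matches else "Unknown"
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 2)
        cv2.putText(frame, name, (left, top - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow("Identification", frame)
    if cv2.waitKey(1) == ord("q"):  # terminate on key press
        break
video.release()
cv2.destroyAllWindows()
        </preformat>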
        <p>Leveraging emotional insights in conjunction with a subject's psychological profile, such as one
derived from the DISC model, can significantly enhance the effectiveness of the digital agent's interactions.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Face emotion recognition</title>
        <p>The proposed system is built upon the Facial Expression Recognition
(FER) framework, leveraging convolutional neural networks (CNNs) for the classification of
emotions from facial images. The implementation follows a structured pipeline comprising image
acquisition, face detection, emotion classification, and real-time visualization. A modular
architecture is adopted, wherein face detection is performed using the Multi-Task Cascaded
Convolutional Networks (MTCNN) detector to ensure robust localization, while the FER model
classifies emotions into seven predefined categories: Angry, Disgust, Fear, Happy, Sad, Surprise,
and Neutral. Real-time processing is facilitated through the OpenCV library, enabling frame
capture via a webcam and subsequent display of annotated results. The system is implemented in
Python, employing OpenCV for video acquisition and the FER library for emotion classification.
The workflow consists of initializing the FER detector with MTCNN, capturing frames, detecting
faces and extracting bounding boxes, applying the FER classifier for emotion prediction, overlaying
the detected emotion onto the video frame, and displaying the annotated frame in real time, with
termination triggered by user input.</p>
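        <p>A compact sketch of this pipeline, assuming the `fer` and OpenCV libraries named above, is given below; the window handling and label formatting are illustrative.</p>
        <preformat>
import cv2
from fer import FER

detector = FER(mtcnn=True)          # MTCNN-based face localization
video = cv2.VideoCapture(0)
while True:
    ok, frame = video.read()
    if not ok:
        break
    for face in detector.detect_emotions(frame):
        (x, y, w, h) = face["box"]
        emotion, score = max(face["emotions"].items(), key=lambda item: item[1])
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
        cv2.putText(frame, f"{emotion}: {score:.2f}", (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (255, 0, 0), 2)
    cv2.imshow("Face emotion", frame)
    if cv2.waitKey(1) == ord("q"):  # terminate on key press
        break
video.release()
cv2.destroyAllWindows()
        </preformat>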
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Audio emotion recognition</title>
        <p>The proposed approach consists of three main steps: feature extraction, classification, and
prediction, following a supervised learning paradigm in which a machine learning model is trained
to recognize emotions based on extracted speech features. The initial phase involves feature
extraction from the raw speech signal, utilizing Mel-Frequency Cepstral Coefficients (MFCCs),
which are widely adopted in speech processing due to their effectiveness in capturing the
perceptual characteristics of human hearing. The extraction process comprises audio
preprocessing, where the input audio file is loaded using the Librosa library while maintaining its
original sampling rate, followed by the computation of 40 MFCC coefficients from the signal, and
statistical aggregation through the calculation of mean MFCC values across time frames to
generate a fixed-length feature vector. The extracted features are then used to train a supervised
classifier, specifically a Support Vector Machine (SVM) with a radial basis function (RBF) kernel,
chosen for its robustness in speech emotion recognition tasks. The approach is evaluated using the
Toronto Emotional Speech Set (TESS) dataset, which contains 2800 speech samples spoken by two
actresses and classified using the same emotional scale applied in facial emotion analysis, ensuring
consistency in multimodal emotion recognition.</p>
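        <p>A minimal sketch of the feature extraction and training steps is shown below; it assumes the Librosa library for MFCC extraction and, as an assumption not named in the text, scikit-learn for the RBF-kernel SVM. Dataset handling and file names are illustrative.</p>
        <preformat>
import librosa
import numpy as np
from sklearn.svm import SVC

def extract_features(path):
    """Mean of 40 MFCC coefficients over time, giving a fixed-length feature vector."""
    signal, sr = librosa.load(path, sr=None)          # keep the original sampling rate
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=40)
    return np.mean(mfcc, axis=1)

def train_audio_classifier(paths, labels):
    """Train an RBF-kernel SVM on labeled utterances (e.g. from the TESS dataset)."""
    features = np.array([extract_features(p) for p in paths])
    clf = SVC(kernel="rbf", probability=True)         # probabilistic output per emotion
    clf.fit(features, labels)
    return clf

# Prediction for a new utterance (hypothetical file name):
# clf = train_audio_classifier(train_paths, train_labels)
# probabilities = clf.predict_proba(extract_features("utterance.wav").reshape(1, -1))
        </preformat>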
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Text emotion recognition</title>
        <p>In the proposed system, emotion analysis is conducted by leveraging OpenAI’s ChatGPT-4o API,
which processes textual input to generate a statistical evaluation of emotional content. The
methodology involves sending a structured request containing the target text to the API, which
then analyzes linguistic features, semantic context, and sentiment polarity to classify the
underlying emotions. The model provides a probabilistic distribution of detected emotions,
allowing for a quantitative assessment rather than a binary classification.</p>
        <p>The vector of emotional states is composed of seven probability values ranging from 0 to 1
associated with the seven emotions Fear, Angry, Disgust, Sad, Neutral, Happy, and (positive) Surprise,
as shown in (2), where the time dependence is omitted for the sake of simplicity:</p>
        <p>e = [p(Fear), p(Angry), p(Disgust), p(Sad), p(Neutral), p(Happy), p(Surprise)], where each component lies in [0, 1]. (2)</p>
        <p>It would be possible to defuzzify the result by identifying a single most probable emotion, but it was
preferred to maintain the emotion defined in a fuzzy manner. In general, the first four emotions are
considered indicative of negative feedback, while the last three emotions are considered indicative
of positive feedback.</p>
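        <p>A minimal sketch of this request, assuming the official OpenAI Python client, is given below; the prompt wording, the model identifier string, and the JSON handling are illustrative assumptions rather than the exact implementation.</p>
        <preformat>
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

EMOTIONS = ["Fear", "Angry", "Disgust", "Sad", "Neutral", "Happy", "Surprise"]

def text_emotion_vector(text):
    """Ask the model for a probability per emotion and return them as a dict (sketch)."""
    prompt = (
        "Estimate the probability (0 to 1) of each emotion in the following text. "
        f"Emotions: {', '.join(EMOTIONS)}. Reply only with a JSON object mapping "
        f"each emotion to a number.\n\nText: {text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)

# e_t = text_emotion_vector("I really did not expect this, and I am quite worried about it.")
        </preformat>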
      </sec>
      <sec id="sec-2-6">
        <title>2.6. Global emotional state</title>
        <p>The emotional state of the human agent is evaluated in a fuzzy manner as an overlap of the
assessments obtained from facial, audio, and text emotional analyses with appropriate weights. To
appropriately define these weights, it is assumed that the human agent may attempt to mask their
emotional state; therefore, greater weight is given to the communication that is more difficult to
mask, namely non-verbal communication, then to the tone of voice, and finally to the text chosen
for communication, according to the formula indicated in (1), where e(t), ef(t), ea(t), and et(t) are,
respectively, the vectors of the overall emotional state, the one obtained from facial analysis, the
one obtained from audio analysis, and the one obtained from the natural language analysis of the
text at time t for the human agent:</p>
        <p>e(t) = wf · ef(t) + wa · ea(t) + wt · et(t), with wf > wa > wt. (1)</p>
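        <p>The aggregation can be sketched as follows; the numerical weights are illustrative assumptions that only respect the ordering described above (face, then audio, then text), since exact values are not reported here.</p>
        <preformat>
import numpy as np

EMOTIONS = ["Fear", "Angry", "Disgust", "Sad", "Neutral", "Happy", "Surprise"]

# Illustrative weights respecting the ordering w_face > w_audio > w_text;
# the exact values are an assumption, not taken from the paper.
W_FACE, W_AUDIO, W_TEXT = 0.5, 0.3, 0.2

def global_emotional_state(e_face, e_audio, e_text):
    """Weighted fuzzy overlap of the three per-modality emotion vectors, as in (1)."""
    e = W_FACE * np.asarray(e_face) + W_AUDIO * np.asarray(e_audio) + W_TEXT * np.asarray(e_text)
    return dict(zip(EMOTIONS, e))

def valence(state):
    """Negative feedback = first four emotions, positive feedback = last three."""
    values = list(state.values())
    return sum(values[4:]) - sum(values[:4])
        </preformat>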
      </sec>
      <sec id="sec-2-7">
        <title>2.7. Empathetic prompt modifications due to emotional state</title>
        <p>The communicative form of the digital agent is guided by the psychological profile of the human
agent, but some behavioral changes can be expected to vary with the emotional state of the human
agent.</p>
        <p>
          A first category of strategies consists of acknowledging emotions. In particular, two approaches are
used: verbal validation and reflective listening [
          <xref ref-type="bibr" rid="ref31 ref32">31, 32</xref>
          ]. With verbal validation, when the digital
agent detects signs of stress or frustration in the emotional state of the human agent (a negative
emotional state), it explicitly acknowledges these feelings. For instance, it might say, “I can see that
this situation is really overwhelming for you,” or “It sounds like you’re feeling quite frustrated
right now.” This not only validates the human agent’s emotional experience but also creates an
opening for them to elaborate if they wish. With reflective language, the digital agent mirrors back
what it observes. For example, it can say “It seems like the recent changes have been really
challenging,” or “I understand that this topic is difficult for you.” This technique shows that the
digital agent is actively listening and empathizing with human agent state of mind.
A second category of strategies consist in modifying tone and pace according to the emotional
state of the human agent [
          <xref ref-type="bibr" rid="ref33 ref34">33, 34</xref>
          ]. If the digital agent notices that the human agent is under stress
(negative emotional state), it can slow down the speech to induce a calming effect. A slower
communication helps ensure that the message is clear and gives the human agent time to process
the response. Similarly, if the human agent’s tone is tense or rapid, the digital agent might
consciously lower its voice, and speak in a measured tone, using softer intonations. If the human
agent is expressing distress, a deliberate pause after acknowledging their feelings gives them space
to process their thoughts and may encourage further sharing. This also demonstrates that the digital
agent is not rushing the conversation and is attuned to the human agent's emotional needs.
        </p>
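        <p>As an illustration, these strategies can be mapped onto prompt modifiers driven by the aggregated emotional state; the threshold and the wording below are assumptions for demonstration purposes.</p>
        <preformat>
def empathetic_prompt_modifiers(emotional_state):
    """Build illustrative system-prompt additions from the fuzzy emotional state.

    `emotional_state` is the dict returned by global_emotional_state(); the threshold
    and the wording are assumptions for demonstration, not values from the paper.
    """
    negative = sum(emotional_state[e] for e in ("Fear", "Angry", "Disgust", "Sad"))
    modifiers = []
    if negative > 0.5:  # predominantly negative emotional state
        modifiers.append("Explicitly acknowledge the user's feelings before answering "
                         "(verbal validation) and mirror back what you observe (reflective listening).")
        modifiers.append("Use a calm register, short sentences, and a slower speaking pace; "
                         "allow a brief pause after acknowledging the user's feelings.")
    return " ".join(modifiers)
        </preformat>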
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <sec id="sec-3-1">
        <title>3.1. General decision algorithm</title>
        <p>The flowchart in Figure 1 illustrates the proposed automated system for personalized human-robot
interaction based on facial recognition, emotional analysis, and psychological profiling. The
process begins with face detection and recognition; if the person is unidentified, their name and
surname are collected, stored in a database, and their LinkedIn profile is retrieved. If a LinkedIn
profile exists, the system uses the Crystal API to generate a psychological profile, which is also
stored. Subsequently, face and audio inputs are analyzed for emotional content using multiple
modalities: acoustic features, textual context, and facial expressions. A fuzzy aggregation method
integrates these emotional cues with the psychological profile to determine the user's state. This
state is then used to generate a context-aware response by querying ChatGPT, which is converted
into a vocal message using a text-to-speech module and delivered to the user. The flowchart
outlines a structured pipeline for adaptive and emotionally intelligent human-computer dialogue.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Pseudocodes of principal modules</title>
        <p>Figure 2 presents the pseudocode for face recognition. Figure 3 presents the pseudocode for
psychological profile extraction. Figure 4 presents the pseudocode for face emotion recognition.
Figure 5 presents the pseudocode for audio emotion recognition. Figure 6 presents the pseudocode
for text emotion recognition.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion</title>
      <p>
        Based on the proposed methodology, a digital agent capable of empathetic interaction in the
verbal domain with a human agent has been developed. To achieve this result, the human agent is
analyzed and decomposed into a time-invariant component during the single interaction, the
psychological profile, and a time-variant component during the single interaction, the emotional
state. There are works in the literature on profiling through social networks [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ]; the present work
is based on Crystal Knows® profiling techniques, which rely on the LinkedIn database.
A primary issue may be related to the fact that an individual may not have a LinkedIn profile or
there may be multiple profiles associated with the same name and surname. In the first case, the
digital agent behaves neutrally, whereas in the second case, it poses additional questions to
differentiate between the various profiles.
      </p>
      <p>
        A second critical issue is related to the fact that individuals present an altered image of
themselves on social networks. It would be appropriate to implement a profile-building method
based solely on direct interaction; however, this approach might also prove ineffective if the
human subject exhibits different psychological aspects when interacting with a digital agent
compared to another human agent. On the other hand, various other social-network-based psychological
profiling tools exist, including, for example, IBM Watson Personality
Insights, Apply Magic Sauce API, Portrait, and Humantic AI. Moreover, Kosinski et al. [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ] demonstrated
that easily accessible digital records of behavior, such as social network activity, can be used to predict
personal traits with accuracy comparable to that of standard psychological tests.
      </p>
      <p>
        There are several works in the literature on multimodal emotion analysis, both hierarchical [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ]
and non-hierarchical [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ]; the present work also uses various libraries and commercial systems,
integrating them hierarchically. An example of an advanced application of this type is Google's
PaliGemma Mix, which presents an excellent level of maturity compared to the present work.
However, no work was found in the literature that simultaneously uses both psychological profiling
and emotion analysis to promote empathetic interaction.
      </p>
      <p>As indicated in Figure 1, an empathetic Text-to-Speech (TTS) was used to communicate
emotions through an appropriate tone and timbre of voice, speed, cadence, and rhythm. In
particular, the Hume.AI service was used, as it is one of the most advanced empathetic TTS
systems currently available. This technology has reached a high level of maturity.</p>
      <p>This study has several limitations that should be acknowledged. First, the use of LinkedIn and
the Crystal API for creating psychological profiles may introduce inaccuracies, as the information
provided in these profiles can be selective or distorted. Users often curate their professional profiles
to present a particular image, which may not fully reflect their actual personality traits or
behavioral tendencies. Future work should consider integrating additional data sources or
validation methods to improve the reliability of personality assessments.</p>
      <p>Another limitation is the emphasis placed on non-verbal signals in emotional expression. While
non-verbal cues play a significant role in human communication, their interpretation varies across
cultures and individuals. This variability may lead to inconsistencies in how emotions are
perceived and responded to by the system. Further studies should investigate the influence of
cultural and personal differences on the effectiveness of non-verbal cues in human-robot
interaction. Moreover, the response is slow due to the hardware used and the cloud-based
implementation; it might be interesting to develop an edge computing application to reduce
latency.</p>
      <p>Finally, although the system is designed to adapt the tone and pace of speech according to the
user's emotional state, the impact of these modifications on interaction quality remains unclear.
The study does not provide an evaluation of whether these adjustments enhance engagement or
lead to unnatural behavior in the robot. Future research should assess user perceptions and the
overall effectiveness of vocal adaptation in improving interaction dynamics.</p>
      <p>The effectiveness of the approach in real cases was not explored in this preliminary work,
which should be addressed in future developments.</p>
      <p>It is also important to note that psychological profiling, although widely used and applicable in
various professional and non-professional contexts, raises delicate issues related to privacy and to the
decisions both made by the subject interacting with the digital agent and experienced by the subject
themselves. To avoid improper behavior by the digital agent, it is possible to refer to various ethical
guidelines or to advanced regulations valid in certain geographical areas, such as the European
General Data Protection Regulation (GDPR), which applies to all subjects but is especially relevant
for cognitively weaker subjects.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In this work, we presented a novel multimodal framework for enhancing empathetic
communication in human–robot interaction. By integrating psychological profiling derived from
social network data with real-time emotion recognition from facial, audio, and textual inputs, our
system offers a comprehensive understanding of a user's emotional state. This integration allows
the digital agent to adapt its communication style dynamically, delivering responses that are not
only context-aware but also emotionally resonant. Our architecture leverages advanced face
recognition techniques, robust audio and text analysis, and a fuzzy aggregation method to
synthesize these diverse modalities. The resulting framework has shown promising potential in
improving interaction quality across various applications—from healthcare and education to social
companionship. By prioritizing non-verbal cues, which are often more difficult to mask, the system
effectively tailors its behavior to meet the emotional needs of users. Despite these advances, several
challenges remain. Issues such as processing latency, the reliability of social network-based
profiling, and the optimization of multimodal fusion require further exploration. Future work will
focus on refining these components, incorporating more sophisticated machine learning
techniques, and conducting comprehensive user studies to validate the system's effectiveness in
real-world scenarios. In conclusion, our framework lays the groundwork for next-generation
human–robot interactions that are not only functionally efficient but also emotionally intelligent,
paving the way for more natural, supportive, and adaptive digital communication interfaces.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>M. De Gennaro</surname>
          </string-name>
          , E. Krumhuber, G. Lucas,
          <source>Effectiveness of an Empathic Chatbot in Combating Adverse Effects of Social Exclusion on Mood. Frontiers in Psychology 10</source>
          ,
          <year>2020</year>
          . doi:
          <volume>10</volume>
          .3389/fpsyg.
          <year>2019</year>
          .
          <volume>03061</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gao</surname>
          </string-name>
          <article-title>, Multi-Modal Hierarchical Empathetic Framework for Social Robots With Affective Body Control</article-title>
          .
          <source>IEEE Transactions on Affective Computing</source>
          <volume>15</volume>
          ,
          <year>2024</year>
          , pp.
          <fpage>1621</fpage>
          -
          <lpage>1633</lpage>
          . doi:
          <volume>10</volume>
          .1109/TAFFC.
          <year>2024</year>
          .
          <volume>3356511</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Whang</surname>
          </string-name>
          ,
          <article-title>Empathy in Human-Robot Interaction: Designing for Social Robots</article-title>
          .
          <source>International Journal of Environmental Research and Public Health</source>
          <volume>19</volume>
          ,
          <year>2022</year>
          . doi:
          <volume>10</volume>
          .3390/ijerph19031889.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>R. De Kervenoael</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Schwob</surname>
          </string-name>
          , E. Goh,
          <article-title>Leveraging human-robot interaction in hospitality services: Incorporating the role of perceived value, empathy, and information sharing into visitors' intentions to use social robots</article-title>
          .
          <source>Tourism Management</source>
          <volume>78</volume>
          ,
          <year>2020</year>
          . doi:
          <volume>10</volume>
          .1016/j.tourman.
          <year>2019</year>
          .
          <volume>104042</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Tuyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elibol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chong</surname>
          </string-name>
          ,
          <article-title>Learning Bodily Expression of Emotion for Social Robots Through Human Interaction</article-title>
          .
          <source>IEEE Transactions on Cognitive and Developmental Systems</source>
          ,
          <volume>13</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>16</fpage>
          -
          <lpage>30</lpage>
          . doi:
          <volume>10</volume>
          .1109/TCDS.
          <year>2020</year>
          .
          <volume>3005907</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Human-robot interaction based on gesture and movement recognition</article-title>
          .
          <source>Signal Process. Image Commun</source>
          .
          <volume>81</volume>
          ,
          <year>2020</year>
          . doi:
          <volume>10</volume>
          .1016/j.image.
          <year>2019</year>
          .
          <volume>115686</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Noguchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kamide</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tanaka</surname>
          </string-name>
          ,
          <article-title>Weight Shift Movements of a Social Mediator Robot Make It Being Recognized as Serious and Suppress Anger, Revenge and Avoidance Motivation of the User</article-title>
          .
          <source>Frontiers in Robotics and AI 9</source>
          ,
          <year>2022</year>
          . doi:
          <volume>10</volume>
          .3389/frobt.
          <year>2022</year>
          .
          <volume>790209</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mahzoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ishiguro</surname>
          </string-name>
          ,
          <article-title>A Preliminary Study on Realizing Human-Robot Mental Comforting Dialogue via Sharing Experience Emotionally</article-title>
          .
          <source>Sensors (Basel, Switzerland)</source>
          <volume>22</volume>
          ,
          <year>2022</year>
          . doi:
          <volume>10</volume>
          .3390/s22030991.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          , H. Liu,
          <article-title>Toward Children's Empathy Ability Analysis: Joint Facial Expression Recognition and Intensity Estimation Using Label Distribution Learning</article-title>
          .
          <source>IEEE Transactions on Industrial Informatics 18</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>16</fpage>
          -
          <lpage>25</lpage>
          . doi: 
          <volume>10</volume>
          .1109/TII.
          <year>2021</year>
          .
          <volume>3075989</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Regenbogen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Habel</surname>
          </string-name>
          , T. Kellermann,
          <article-title>Multimodal human communication - Targeting facial expressions, speech content and prosody</article-title>
          .
          <source>NeuroImage 60</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>2346</fpage>
          -
          <lpage>2356</lpage>
          . doi:
          <volume>10</volume>
          .1016/j.neuroimage.
          <year>2012</year>
          .
          <volume>02</volume>
          .043.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tong</surname>
          </string-name>
          ,
          <article-title>Effects of Robot Animacy and Emotional Expressions on Perspective-Taking Abilities: A Comparative Study across Age Groups</article-title>
          .
          <source>Behavioral Sciences 13</source>
          ,
          <year>2023</year>
          . doi:
          <volume>10</volume>
          .3390/bs13090728.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>F.</given-names>
            <surname>Aggogeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Borboni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Faglia</surname>
          </string-name>
          ,
          <source>Reliability Roadmap for Mechatronic Systems. Applied Mechanics and Materials</source>
          , volume
          <volume>373</volume>
          -
          <fpage>375</fpage>
          ,
          <year>2022</year>
          , pp.
          <fpage>130</fpage>
          -
          <lpage>133</lpage>
          . doi:
          <volume>10</volume>
          .4028/www.scientific.net/amm.373-
          <fpage>375</fpage>
          .
          <fpage>130</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Borboni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Aggogeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Pellegrini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Faglia</surname>
          </string-name>
          ,
          <article-title>Innovative Modular SMA Actuator</article-title>
          .
          <source>In Advanced Materials Research</source>
          , volume
          <volume>590</volume>
          ,
          <string-name>
            <surname>Trans Tech Publications Ltd</surname>
          </string-name>
          ,
          <year>2012</year>
          , pp.
          <fpage>405</fpage>
          -
          <lpage>410</lpage>
          . doi:
          <volume>10</volume>
          .4028/www.scientific.net/amr.590.405.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Borboni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Faglia</surname>
          </string-name>
          ,
          <article-title>Stochastic evaluation and analysis of free vibrations in simply supported piezoelectric bimorphs</article-title>
          .
          <source>Journal of Applied Mechanics</source>
          ,
          <string-name>
            <surname>Transactions</surname>
            <given-names>ASME</given-names>
          </string-name>
          , volume
          <volume>80</volume>
          ,
          <article-title>issue 2, art</article-title>
          . no.
          <issue>021003</issue>
          ,
          <year>2013</year>
          . doi:
          <volume>10</volume>
          .1115/1.4007721.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>F.</given-names>
            <surname>Aggogeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Borboni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Faglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Merlo</surname>
          </string-name>
          , S. de Cristofaro,
          <source>Precision Positioning Systems: An Overview of the State of Art. Applied Mechanics and Materials</source>
          , volume
          <volume>336</volume>
          -
          <fpage>338</fpage>
          ,
          <year>2013</year>
          pp.
          <fpage>1170</fpage>
          -
          <lpage>1173</lpage>
          . doi:
          <volume>10</volume>
          .4028/www.scientific.net/amm.336-
          <fpage>338</fpage>
          .
          <fpage>1170</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Borboni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lancini</surname>
          </string-name>
          ,
          <article-title>Commanded motion optimization to reduce residual vibration</article-title>
          .
          <source>Journal of Vibration and Acoustics</source>
          , volume
          <volume>137</volume>
          , issue 3, article no.
          <source>A1</source>
          ,
          <year>2015</year>
          . doi:
          <volume>10</volume>
          .1115/1.4029575.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>C.</given-names>
            <surname>Sconza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Negrini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Di</given-names>
            <surname>Matteo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Borboni</surname>
          </string-name>
          , G. Boccia,
          <string-name>
            <given-names>I.</given-names>
            <surname>Petrikonis</surname>
          </string-name>
          , E. Stankevičius,
          <string-name>
            <given-names>R.</given-names>
            <surname>Casale</surname>
          </string-name>
          ,
          <article-title>Robot-Assisted Gait Training in Patients with Multiple Sclerosis: A Randomized Controlled Crossover Trial</article-title>
          . Medicina, volume
          <volume>57</volume>
          , issue 7, article no.
          <issue>713</issue>
          ,
          <year>2021</year>
          . doi:
          <volume>10</volume>
          .3390/medicina57070713.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>James</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Balamurali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Watson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>MacDonald</surname>
          </string-name>
          ,
          <article-title>Empathetic Speech Synthesis and Testing for Healthcare Robots</article-title>
          .
          <source>International Journal of Social Robotics</source>
          <volume>13</volume>
          ,
          <year>2020</year>
          pp.
          <fpage>2119</fpage>
          -
          <lpage>2137</lpage>
          . doi:
          <volume>10</volume>
          .1007/s12369-020-00691-4.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>F.</given-names>
            <surname>Efthymiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hildebrand</surname>
          </string-name>
          ,
          <article-title>Empathy by Design: The Influence of Trembling AI Voices on Prosocial Behavior</article-title>
          .
          <source>IEEE Transactions on Affective Computing</source>
          , volume
          <volume>15</volume>
          ,
          <year>2024</year>
          , pp.
          <fpage>1253</fpage>
          -
          <lpage>1263</lpage>
          . doi:
          <volume>10</volume>
          .1109/TAFFC.
          <year>2023</year>
          .
          <volume>3332742</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Botteldooren</surname>
          </string-name>
          , T. Belpaeme, No More Mumbles:
          <article-title>Enhancing Robot Intelligibility Through Speech Adaptation</article-title>
          .
          <source>IEEE Robotics and Automation Letters</source>
          ,
          <volume>9</volume>
          ,
          <year>2024</year>
          , pp.
          <fpage>6162</fpage>
          -
          <lpage>6169</lpage>
          . doi:
          <volume>10</volume>
          .1109/LRA.
          <year>2024</year>
          .
          <volume>3401117</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rathor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <article-title>A robust model for domain recognition of acoustic communication using Bidirectional LSTM and deep neural network</article-title>
          .
          <source>Neural Computing and Applications</source>
          ,
          <volume>33</volume>
          ,
          <year>2021</year>
          , pp.
          <fpage>11223</fpage>
          -
          <lpage>11232</lpage>
          . doi:
          <volume>10</volume>
          .1007/s00521-020-05569-0.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Zen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schuster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Meng</surname>
          </string-name>
          , L. Deng,
          <article-title>Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends</article-title>
          .
          <source>IEEE Signal Processing Magazine</source>
          ,
          <volume>32</volume>
          ,
          <year>2012</year>
          , pp
          <fpage>35</fpage>
          -
          <lpage>52</lpage>
          . doi:
          <volume>10</volume>
          .1109/MSP.
          <year>2014</year>
          .
          <volume>2359987</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lunscher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tsuboi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Alves,
          <string-name>
            <given-names>G.</given-names>
            <surname>Nejat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Benhabib</surname>
          </string-name>
          ,
          <article-title>A Multimodal Emotional Human-Robot Interaction Architecture for Social Robots Engaged in Bidirectional Communication</article-title>
          .
          <source>IEEE Transactions on Cybernetics</source>
          ,
          <volume>51</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>5954</fpage>
          -
          <lpage>5968</lpage>
          . doi:
          <volume>10</volume>
          .1109/TCYB.
          <year>2020</year>
          .
          <volume>2974688</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>A multimodal emotional communication based humans-robots interaction system</article-title>
          .
          <source>2016 35th Chinese Control Conference (CCC)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>6363</fpage>
          -
          <lpage>6368</lpage>
          . doi: 10.1109/CHICC.2016.7554357.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>T.</given-names>
            <surname>Applewhite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dornberger</surname>
          </string-name>
          ,
          <article-title>Novel Bidirectional Multimodal System for Affective Human-Robot Engagement</article-title>
          .
          <source>2021 IEEE Symposium Series on Computational Intelligence (SSCI)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          . doi: 10.1109/SSCI50451.2021.9659935.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>G.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Firdaus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ekbal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharyya</surname>
          </string-name>
          ,
          <article-title>EmoInt-Trans: A Multimodal Transformer for Identifying Emotions and Intents in Social Conversations</article-title>
          .
          <source>IEEE/ACM Transactions on Audio, Speech, and Language Processing</source>
          ,
          <volume>31</volume>
          ,
          <year>2023</year>
          , pp.
          <fpage>290</fpage>
          -
          <lpage>300</lpage>
          . doi: 10.1109/TASLP.2022.3224287.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>J.</given-names>
            <surname>Crumpton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bethel</surname>
          </string-name>
          ,
          <article-title>A Survey of Using Vocal Prosody to Convey Emotion in Robot Speech</article-title>
          .
          <source>International Journal of Social Robotics</source>
          ,
          <volume>8</volume>
          ,
          <year>2015</year>
          , pp.
          <fpage>271</fpage>
          -
          <lpage>285</lpage>
          . doi: 10.1007/s12369-015-0329-4.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chrysostomou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>ToD4IR: A Humanised Task-Oriented Dialogue System for Industrial Robots</article-title>
          .
          <source>IEEE Access</source>
          ,
          <volume>10</volume>
          ,
          <year>2022</year>
          , pp.
          <fpage>91631</fpage>
          -
          <lpage>91649</lpage>
          . doi: 10.1109/ACCESS.2022.3202554.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shenoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lynch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Manuel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Doryab</surname>
          </string-name>
          ,
          <article-title>A Self Learning System for Emotion Awareness and Adaptation in Humanoid Robots</article-title>
          .
          <source>2022 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>912</fpage>
          -
          <lpage>919</lpage>
          . doi: 10.1109/ROMAN53752.2022.9900581.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>O.</given-names>
            <surname>Nocentini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fiorini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Acerbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sorrentino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mancioppi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cavallo</surname>
          </string-name>
          ,
          <article-title>A Survey of Behavioral Models for Social Robots</article-title>
          .
          <source>Robotics</source>
          , volume
          <volume>8</volume>
          , article no.
          <issue>54</issue>
          ,
          <year>2019</year>
          . doi: 10.3390/ROBOTICS8030054.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>E.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Comparative effects of empathic verbal responses: reflection versus validation</article-title>
          .
          <source>Journal of Counseling Psychology</source>
          , volume
          <volume>60</volume>
          , issue 3,
          <year>2013</year>
          , pp.
          <fpage>439</fpage>
          -
          <lpage>44</lpage>
          . doi: 10.1037/a0032786.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>E.</given-names>
            <surname>Lavee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Itzchakov</surname>
          </string-name>
          ,
          <article-title>Good listening: A key element in establishing quality in qualitative research</article-title>
          .
          <source>Qualitative Research</source>
          , volume
          <volume>23</volume>
          , issue 3,
          <year>2023</year>
          , pp.
          <fpage>614</fpage>
          -
          <lpage>631</lpage>
          . doi: 10.1177/14687941211039402.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>E.</given-names>
            <surname>Rodero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cores-Sarría</surname>
          </string-name>
          ,
          <article-title>Best Prosody for News: A Psychophysiological Study Comparing a Broadcast to a Narrative Speaking Style</article-title>
          .
          <source>Communication Research</source>
          ,
          <volume>50</volume>
          (
          <issue>3</issue>
          ),
          <year>2023</year>
          , pp.
          <fpage>361</fpage>
          -
          <lpage>384</lpage>
          . doi: 10.1177/00936502211059360.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>J.</given-names>
            <surname>Rodd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bosker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ernestus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Alday</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Meyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bosch</surname>
          </string-name>
          ,
          <article-title>Control of speaking rate is achieved by switching between qualitatively distinct cognitive "gaits": Evidence from simulation</article-title>
          .
          <source>Psychological Review</source>
          ,
          <year>2019</year>
          . doi: 10.1037/rev0000172.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>E.</given-names>
            <surname>Ishukova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Salmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kalyabin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Antonenko</surname>
          </string-name>
          ,
          <article-title>Approaches to Construct a Psychological Portrait of Users Based on Analysis of Data in Open Profiles of Social Networks</article-title>
          .
          <source>Proceedings - 2019 1st International Conference on Control Systems, Mathematical Modelling, Automation and Energy Efficiency (SUMMA 2019)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>537</fpage>
          -
          <lpage>539</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kosinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Stillwell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Graepel</surname>
          </string-name>
          ,
          <article-title>Private traits and attributes are predictable from digital records of human behavior</article-title>
          .
          <source>Proceedings of the National Academy of Sciences of the United States of America</source>
          , volume
          <volume>110</volume>
          , issue 15,
          <year>2013</year>
          , pp.
          <fpage>5802</fpage>
          -
          <lpage>5805</lpage>
          . doi: 10.1073/pnas.1218772110.
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>A twin disentanglement Transformer Network with Hierarchical-Level Feature Reconstruction for robust multimodal emotion recognition</article-title>
          .
          <source>Expert Systems with Applications</source>
          , volume
          <volume>264</volume>
          , article no.
          <issue>125822</issue>
          ,
          <year>2025</year>
          . doi: 10.1016/j.eswa.2024.125822.
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>E.</given-names>
            <surname>Boitel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohasseb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Haig</surname>
          </string-name>
          ,
          <article-title>MIST: Multimodal emotion recognition using DeBERTa for text, Semi-CNN for speech, ResNet-50 for facial, and 3D-CNN for motion analysis</article-title>
          .
          <source>Expert Systems with Applications</source>
          ,
          <volume>270</volume>
          , article no.
          <issue>126236</issue>
          ,
          <year>2025</year>
          . doi: 10.1016/j.eswa.2024.126236.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>