<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Modulation via Reinforcement Learning and Prompted Language Generation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christian Tamantini</string-name>
          <email>christian.tamantini@cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gloria Beraldo</string-name>
          <email>gloria.beraldo@cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Umbrico</string-name>
          <email>alessandro.umbrico@cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Orlandini</string-name>
          <email>andrea.orlandini@cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Workshop</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Human-Robot Interaction, Reinforcement Learning, Large Language Models, Prompt Engineering</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Cognitive Sciences and Technologies, National Research Council of Italy</institution>
          ,
          <addr-line>00196 Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Workshop on Social Robotics for Human-Centered Assistive and Rehabilitation AI (a Fit4MedRob event) - ICSR 2025</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In the context of socially assistive robotics, there is a growing need for interaction strategies that can adapt to users' emotional states in real time, as fixed or generic communication styles often fail to sustain user engagement or meet individual motivational needs, especially in long-term human-robot interaction. To address this challenge, this paper presents a novel framework for adaptive interaction style modulation in socially assistive agents, combining large language models (LLMs) with reinforcement learning based on real-time emotion recognition. The proposed architecture leverages multimodal sensing to monitor the user's affective state and dynamically selects among predefined communicative styles using Thompson Sampling. At each interaction turn, the user's emotional feedback is converted into a scalar reward, allowing the system to reinforce styles that yield more positive affective outcomes. Style conditioning is operationalized through prompting strategies that guide the LLM to generate responses aligned with the selected tone. A preliminary evaluation using VADER sentiment analysis demonstrates that stylistic prompts successfully induce measurable differences in sentiment polarity, neutrality, and verbosity. These findings suggest the viability of our approach to style-aware dialogue generation and support its potential for long-term adaptation in personalized human-agent interaction.</p>
      </abstract>
      <kwd-group>
        <kwd>Human-Robot Interaction</kwd>
        <kwd>Reinforcement Learning</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Prompt Engineering</kwd>
        <kwd>Generation</kwd>
      </kwd-group>
      <conference>
        <conf-name>Workshop on Social Robotics for Human-Centered Assistive and Rehabilitation AI (a Fit4MedRob event) - ICSR 2025</conf-name>
      </conference>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In human-robot interaction, the quality of the communicative exchange is a crucial determinant of user
engagement, trust, and adherence over time [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ]. While delivering the correct content is necessary,
growing evidence suggests that how an artificial agent communicates, i.e., its interaction style, can
significantly influence the user’s experience and willingness to continue interacting [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ].
      </p>
      <p>
        Interaction style refers to the expressive modality through which a system delivers its output,
encompassing both verbal and non-verbal aspects. This includes linguistic tone, prosody, affective cues,
as well as physical parameters such as movement expressiveness or compliance in embodied agents
that convey physical interaction [
        <xref ref-type="bibr" rid="ref6">6</xref>
]. While different styles may convey the same task content, they can
have divergent emotional impacts. A communication style that fails to align with the user’s preferences
or emotional state may result in discomfort, reduced trust, or even disengagement [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Consider, for example, a virtual assistant delivering motivational or instructional feedback. The
same message may be conveyed in a calm, neutral manner or with greater emotional warmth and
enthusiasm. Although the semantic content is preserved, the user may respond differently depending
on the emotional framing. In long-term interactions, maintaining engagement and emotional resonance
is essential, particularly when the agent operates in support-oriented roles [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ].
      </p>
      <p>Therefore, to endow artificial agents with the capability of autonomously learning and implementing
different communication styles, this work introduces a modular architecture integrating a Reinforcement
Learning (RL) module with a Large Language Model (LLM)-based utterance generation pipeline,
enabling the agent to adjust its verbal behavior based on the user’s estimated affective state.</p>
      <p>In addition to the architectural contribution, this paper presents a preliminary evaluation aimed at
validating the generative capabilities of the dialogue module. Specifically, a set of stylistically constrained
prompts was used to generate responses across different interaction styles, and the resulting utterances
were analyzed using sentiment analysis metrics.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Proposed Framework</title>
      <sec id="sec-2-1">
        <title>2.1. Speech-To-Text</title>
        <p>
          The Speech-To-Text module is responsible for transcribing the user’s spoken input into written text
in real time [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. This transcription serves as the primary input for the linguistic understanding and
response generation processes. Given the importance of accurately interpreting user utterances in
emotionally sensitive and personalized interactions, the module must ensure both lexical accuracy and
robustness to variations in speech patterns, accents, and background noise.
        </p>
        <p>The transcribed utterance is passed to the Dialogue Generation module, where it is used to inform the
response generation. By enabling seamless capture of user input, the Speech-To-Text component plays a
critical role in grounding the interaction in natural, intuitive communication.</p>
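        <p>The framework does not commit to a specific transcription engine. As a purely illustrative sketch, a
turn-level capture loop could be built on the open-source SpeechRecognition package; the engine choice
and error handling below are our assumptions, not part of the proposed architecture.</p>
        <preformat># Illustrative speech-to-text capture; any ASR backend could be substituted.
from typing import Optional

import speech_recognition as sr

def transcribe_turn(recognizer: sr.Recognizer, mic: sr.Microphone) -> Optional[str]:
    """Capture one spoken utterance and return its transcription (None on failure)."""
    with mic as source:
        # Calibrate against background noise before listening.
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return None  # utterance was unintelligible</preformat>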
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Emotion Recognition</title>
        <p>The Emotion Recognition module estimates the user’s affective state in real time, enabling the system to
adapt its interaction strategy based on the perceived emotional response. Accurate emotion recognition
is essential to support personalized and empathetic interaction, particularly in behavior change scenarios
where emotional engagement is closely tied to adherence and motivation.</p>
        <p>
          Different sensing modalities can be employed to infer the user’s emotional state, each with distinct
advantages and limitations. First of all, facial expression recognition is one of the most common
techniques in affective computing, leveraging computer vision models to classify discrete emotions
or compute continuous affective dimensions such as valence and arousal [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. While it is effective in
controlled settings, this method is sensitive to occlusions, head pose variations, and lighting conditions
[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Its reliability assumes that the user is positioned frontally and remains visually accessible to the
camera, which may not always hold in naturalistic environments.
        </p>
        <p>
          Physiological sensing offers an alternative that bypasses visual constraints by analyzing biosignals
such as heart rate variability, skin conductance, or respiration patterns [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. These signals provide
rich information about autonomic nervous system activity, allowing for continuous estimation [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
However, this approach typically requires the user to wear dedicated sensors (e.g., smartwatches, chest
straps), which may reduce practicality and user acceptance in long-term scenarios.
        </p>
        <p>
          Lastly, posture-based emotion recognition represents a more recent direction, exploiting skeletal
tracking or full-body pose estimation from RGB or depth cameras [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. These methods enable contactless
affect sensing based on body configuration and movement dynamics, without requiring frontal face
visibility. Posture-based emotion recognition is particularly suited to scenarios in which the user may
not be facing the camera but remains physically expressive through gestures or body orientation.
        </p>
        <p>Each of these modalities can be used independently or in combination to enhance the robustness of
emotion recognition. In this framework, the emotional signal, regardless of how it is acquired, is mapped
to a scalar reward that reflects the affective quality of the interaction and informs the reinforcement
learning process.</p>
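        <p>As a minimal illustration of this mapping, a discrete emotion label, however it is obtained, can be
converted into a scalar valence score. The numeric values below are hypothetical placeholders; in
practice, they should be replaced with literature-derived scores such as those discussed above.</p>
        <preformat># Hypothetical mapping from detected emotion labels to valence scores in [-1, 1].
# Actual values should be derived from the affective-computing literature.
VALENCE = {
    "happiness": 0.9,
    "surprise": 0.4,
    "neutral": 0.0,
    "sadness": -0.6,
    "fear": -0.7,
    "anger": -0.8,
    "disgust": -0.8,
}

def valence(emotion_label: str) -> float:
    """Return the valence V(e) of a detected emotion, defaulting to neutral."""
    return VALENCE.get(emotion_label, 0.0)</preformat>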
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Empathic Style Learning</title>
        <p>The Empathic Style Learning module governs the agent’s ability to personalize its communicative
behavior in real-time. Its goal is to select, at each conversational step, the most appropriate interaction
style to promote a positive and engaging user experience. By incorporating implicit emotional cues
from the user, the system continually refines its strategy to sustain emotional resonance and foster
long-term involvement.</p>
        <p>This adaptive mechanism is built upon a reinforcement learning framework that operates at the
level of style modulation. The module receives as input the user’s detected emotional responses from
the current interaction window and computes a reward signal reflecting the affective impact of the
last interaction style used. Based on this feedback, the system updates its internal belief about the
effectiveness of each style and probabilistically selects the style to be used in the next interaction cycle.</p>
        <sec id="sec-2-3-1">
          <title>2.3.1. Reward Computation</title>
          <p>The first component of the Empathic Style Learning module is the Reward Computation. At each
iteration, the quality of the interaction must be scored in order to assign a reward to the interaction
style implemented at the previous turn.</p>
          <p>
            Given a multimodal monitoring of the user, the valence of the detected emotion, $V(e)$, should be
taken into account, reflecting the positive or negative emotional charge of the user. These scores can be
derived from prior literature and quantify how each emotion contributes to the perceived quality of the
interaction [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ].
          </p>
          <p>To capture the affective outcome of a full interaction turn, the system computes the mean valence
score across all detected expressions during that window. Formally, the reward at time $t$ is defined as:</p>
          <p>$r_t = \frac{1}{N} \sum_{i=1}^{N} V(e_i) \qquad (1)$</p>
          <p>where $N$ is the total number of time instants processed and $e_i$ denotes the emotion detected at
frame $i$. This average reward provides a scalar measure of the user’s overall affective state in response to
the most recent style employed by the agent. By focusing on continuous, real-time feedback rather
than one-time evaluations, the system is capable of tracking affective trends and adjusting its behavior
accordingly.</p>
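          <p>A minimal sketch of Eq. (1), assuming each detection in the interaction window has already been
mapped to a valence score as described in Section 2.2:</p>
          <preformat>from typing import Sequence

def compute_reward(valence_scores: Sequence[float]) -> float:
    """Mean valence over the N frames of the last window: r_t = (1/N) * sum_i V(e_i)."""
    if not valence_scores:
        return 0.0  # no emotion detections in this window
    return sum(valence_scores) / len(valence_scores)

# Example: three frames detected during one interaction turn.
r_t = compute_reward([0.9, 0.0, 0.4])  # 0.433...</preformat>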
        </sec>
        <sec id="sec-2-3-2">
          <title>2.3.2. Style Selection</title>
          <p>To dynamically adapt its style of interaction, the system employs Thompson Sampling, a Bayesian
reinforcement learning algorithm designed for efficient exploration and exploitation in uncertain
environments [<xref ref-type="bibr" rid="ref17">17</xref>]. Each interaction style is treated as an independent arm in a bandit formulation,
where the agent maintains a Beta distribution over the probability that each style yields a positive
change in user affect.</p>
          <p>At each step, the algorithm samples from these distributions and selects the style with the highest
sampled value. After executing the selected style, the resulting emotional response is quantified via the
reward signal, and the success or failure of the action is evaluated by computing the reward difference:</p>
          <p>$\Delta r_t = r_t - r_{t-1} \qquad (2)$</p>
          <p>If this difference is positive or zero, the action is interpreted as beneficial, and the success count for
that style is incremented. Otherwise, the failure count is increased. This formulation ensures that the
system rewards not just positive emotional valences but also improvements relative to prior interaction
states, encouraging strategies that maintain or enhance affective engagement over time.</p>
          <p>The probabilistic nature of Thompson Sampling enables the agent to remain responsive to changing
user preferences, avoid premature convergence, and maintain sufficient exploration to adapt to evolving
interaction dynamics, features especially desirable in long-term interaction settings [<xref ref-type="bibr" rid="ref18">18</xref>].</p>
          <p>A graphical representation of the functioning of the Thompson Sampling algorithm implemented in
the Style Selection module is reported in Figure 2.</p>
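          <p>A compact sketch of the selection and update loop described above; the uniform Beta(1, 1) priors
are an assumption, as the initialization is not specified here.</p>
          <preformat>import random

class StyleBandit:
    """Thompson Sampling over styles with Beta(successes + 1, failures + 1) arms."""

    def __init__(self, styles):
        self.styles = list(styles)
        self.successes = {s: 0 for s in self.styles}
        self.failures = {s: 0 for s in self.styles}

    def select_style(self) -> str:
        # Sample one value per arm from its Beta posterior; play the argmax.
        samples = {
            s: random.betavariate(self.successes[s] + 1, self.failures[s] + 1)
            for s in self.styles
        }
        return max(samples, key=samples.get)

    def update(self, style: str, reward: float, prev_reward: float) -> None:
        # Eq. (2): a non-negative reward difference counts as a success.
        if reward - prev_reward >= 0:
            self.successes[style] += 1
        else:
            self.failures[style] += 1

bandit = StyleBandit(["Neutral", "Enthusiastic"])</preformat>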
        </sec>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Dialogue Generation</title>
        <p>The Dialogue Generation module governs the verbal interaction between the user and the system,
translating incoming user input into semantically coherent and stylistically appropriate system
responses. Unlike systems that rely on pre-scripted interaction flows, our approach is designed to respond
dynamically to user utterances, enabling open-ended yet style-aware dialogue generation.</p>
        <p>At each interaction turn, the Dialogue Generation module receives two inputs: (i) the latest user
utterance, transcribed via the Speech-to-Text module, and (ii) the current interaction style selected by
the Empathic Style Learning module. These inputs are used to compose a structured prompt that guides
a large language model in producing a contextually appropriate and stylistically aligned response.</p>
        <p>In this framework, we operationalize two communicative styles as a representative case study:
• Neutral, characterized by direct, factual, and emotionally neutral phrasing, suited for users who
prefer efficiency and minimal affective stimulation;
• Enthusiastic, marked by positively expressive, motivational language, aimed at encouraging
engagement and creating a socially supportive experience.</p>
        <p>
          These styles were selected to instantiate a contrast along the affective expressiveness dimension,
which is frequently discussed in the literature on empathic and persuasive communication. Prior studies
suggest that user preferences regarding affective intensity may vary significantly across individuals
and contexts [
          <xref ref-type="bibr" rid="ref2 ref4">2, 4</xref>
          ]. While some users may feel more comfortable with emotionally neutral and
to-the-point communication, others respond more positively to expressive and socially engaging behavior.
The proposed framework, however, is not tied to any specific pair of styles. It is generalizable to
any set of well-defined communicative behaviors that differ in tone, formality, emotional warmth, or
other stylistic dimensions. The selection of styles can be informed by theoretical models (e.g., social
presence, communication accommodation theory) or derived empirically through design and user
testing, depending on the target application.
        </p>
        <p>The Dialogue Manager generates system utterances using GPT-4 via the OpenAI ChatGPT API. For
each user input, a prompt is composed that instructs the model to reformulate the response following
the selected style. The prompt template is:
“Respond to the following user utterance in a [STYLE] manner, as defined below.</p>
        <p>Style definition: [STYLE DEFINITION]</p>
        <p>User utterance: ’[USER INPUT]’”</p>
        <p>Here, [STYLE] is replaced by the current style, while [STYLE DEFINITION] provides a textual
description to condition the model appropriately. The [USER INPUT] field contains the transcribed user
utterance. This design allows the system to flexibly generate consistent, stylistically adapted responses
to a wide range of inputs while maintaining semantic coherence and task relevance.</p>
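        <p>A sketch of how the prompt composition and model call could be implemented with the OpenAI
Python client; the client usage, model identifier, and one-line style definitions are illustrative
assumptions rather than the exact implementation.</p>
        <preformat>from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative one-line definitions paraphrasing the two styles above.
STYLE_DEFINITIONS = {
    "Neutral": "Direct, factual, and emotionally neutral phrasing.",
    "Enthusiastic": "Positively expressive, motivational language that encourages engagement.",
}

def generate_response(user_input: str, style: str) -> str:
    """Fill the prompt template described above and query the LLM."""
    prompt = (
        f"Respond to the following user utterance in a {style} manner, as defined below.\n"
        f"Style definition: {STYLE_DEFINITIONS[style]}\n"
        f"User utterance: '{user_input}'"
    )
    completion = client.chat.completions.create(
        model="gpt-4",  # model identifier is an assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content</preformat>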
        <p>By decoupling content planning from style selection and leveraging a generative language model with
style conditioning, the Dialogue Generation module supports naturalistic and adaptive conversations,
reinforcing the capability of the system to sustain engagement throughout the interaction.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Preliminary Evaluation</title>
      <p>To explore whether large language models (LLMs) are capable of consistently producing utterances
that reflect distinct communicative styles, we conducted a preliminary evaluation based on sentiment
analysis. Specifically, we aimed to assess whether stylistic prompts can elicit systematic variations in
the affective content of generated responses.</p>
      <p>We selected a set of 10 representative user utterances that may occur during an interaction with a
socially assistive agent. For each utterance, we generated two responses using the prompting strategy
described in this paper, instructing the LLM (GPT-4) to rephrase the system’s reply in two stylistic variants.
These styles were chosen to reflect qualitatively different approaches to empathy and encouragement
in assistive dialogue.</p>
      <p>To assess the affective and expressive characteristics of the generated responses, we applied sentiment
analysis using the VADER (Valence Aware Dictionary and sEntiment Reasoner) module from NLTK
[<xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>]. These metrics provide complementary perspectives on the emotional and stylistic properties of
language. In particular, the following VADER items were computed:
• Polarity: a normalized polarity value between −1 and +1, summarizing the overall sentiment of
the sentence based on lexical features and intensifiers;
• Positive, Neutral, and Negative: the proportion of text perceived as expressing positive, neutral,
or negative sentiment, respectively, with values ranging from 0 to 1 and summing to 1.</p>
      <p>In addition to these sentiment metrics, we also calculated the number of words for each response to
evaluate differences in verbosity between styles.</p>
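      <p>A sketch of the metric extraction, assuming NLTK’s bundled VADER implementation
[<xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>]; the word count is used as a simple verbosity proxy.</p>
      <preformat>import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
analyzer = SentimentIntensityAnalyzer()

def score_response(text: str) -> dict:
    """Return the VADER metrics used in the evaluation, plus word count."""
    scores = analyzer.polarity_scores(text)
    return {
        "polarity": scores["compound"],  # normalized polarity in [-1, +1]
        "positive": scores["pos"],
        "neutral": scores["neu"],
        "negative": scores["neg"],
        "words": len(text.split()),      # verbosity proxy
    }</preformat>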
      <p>By analyzing these metrics across the two stylistic conditions (Neutral and Enthusiastic), we aim to
determine whether the stylistic constraints embedded in the prompt lead to consistent and measurable
differences in the generated responses. This analysis provides a preliminary assessment of the ability of
the Dialogue Generation module to implement interaction styles in a controlled and interpretable manner.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>While the positive component did not differ significantly between styles, Enthusiastic responses were
rated less Neutral (p &lt; 0.01), reflecting their more expressive nature. Moreover, the negative sentiment
remained close to zero for both styles. Additionally, Enthusiastic responses were significantly longer
in terms of word count (p &lt; 0.0001), suggesting that the Enthusiastic phrasing tends to produce more
verbose text.</p>
      <p>These findings suggest that stylistic instructions embedded in the prompt successfully induced
measurable and coherent variations in sentiment and expressiveness, supporting the use of prompting
as a viable mechanism for modulating interaction style in adaptive agents.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This work introduced a modular framework for the real-time modulation of interaction style in
assistive human-agent communication. The proposed system integrates multimodal emotion recognition,
reinforcement learning, and prompting strategies to enable adaptive, affect-sensitive behavior in large
language model (LLM)-based dialogue agents.</p>
      <p>Interaction styles are selected through a Thompson Sampling algorithm, which optimizes the selection
policy based on continuous user affect monitoring. Each style is operationalized via prompt conditioning
of an LLM, ensuring that the generated responses are both contextually appropriate and stylistically
consistent.</p>
      <p>Preliminary evaluation focused primarily on the generative capabilities of the dialogue module,
assessing the extent to which prompt-based conditioning can modulate style in LLM-driven responses.
While the results confirmed significant and coherent stylistic variations, the study did not include
real-time trials with end-users in assistive scenarios. As such, the effectiveness of the full adaptive
framework, including the closed-loop integration of emotion recognition, style selection, and dialogue
generation, remains to be validated in long-term, ecologically valid interactions.</p>
      <p>Future work will therefore address these limitations by: (i) expanding the repertoire of communicative
styles and affective adaptation strategies; (ii) implementing a multimodal emotion recognition pipeline;
and (iii) conducting controlled and longitudinal user studies in real-world assistive contexts to evaluate
the impact of adaptive style modulation on user engagement, trust, and task performance. These
steps will enable a more comprehensive validation of the proposed framework and its potential for
deployment in practical assistive human–agent interaction scenarios.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was partially supported by the Italian Ministry of Research, under the complementary actions
to the NRRP “Fit4MedRob - Fit for Medical Robotics” Grant PNC0000007, (CUP: B53C22006990001) and
partially by Next Generation EU – “Age-It – Ageing Well in an Ageing Society” project (PE0000015),
National Recovery and Resilience Plan (NRRP).</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used generative AI tools (specifically, OpenAI’s GPT-4)
to assist with grammar and spelling checks.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Fogg</surname>
          </string-name>
          ,
          <article-title>A behavior model for persuasive design</article-title>
          ,
          <source>in: Proceedings of the 4th international Conference on Persuasive Technology</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Orji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. F.</given-names>
            <surname>Tondello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. E.</given-names>
            <surname>Nacke</surname>
          </string-name>
          ,
          <article-title>Personalizing persuasive strategies in gameful systems to gamification user types</article-title>
          ,
          <source>in: Proceedings of the 2018 CHI conference on human factors in computing systems</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Masthof</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vassileva</surname>
          </string-name>
          ,
          <article-title>Personalized persuasion for behaviour change</article-title>
          ,
          <source>Personalized Human-Computer Interaction</source>
          , Walter de Gruyter GmbH Co KG (Ed.) (
          <year>2023</year>
          )
          <fpage>205</fpage>
          -
          <lpage>235</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T. W.</given-names>
            <surname>Bickmore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. W.</given-names>
            <surname>Picard</surname>
          </string-name>
          ,
          <article-title>Establishing and maintaining long-term human-computer relationships</article-title>
          ,
          <source>ACM Transactions on Computer-Human Interaction (TOCHI)</source>
          <volume>12</volume>
          (
          <year>2005</year>
          )
          <fpage>293</fpage>
          -
          <lpage>327</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Ö. N.</given-names>
            <surname>Yalçın</surname>
          </string-name>
          ,
          <article-title>Empathy framework for embodied conversational agents</article-title>
          ,
          <source>Cognitive Systems Research</source>
          <volume>59</volume>
          (
          <year>2020</year>
          )
          <fpage>123</fpage>
          -
          <lpage>132</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Tamantini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. P.</given-names>
            <surname>Langlois</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>De Winter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. H. A.</given-names>
            <surname>Mohamadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Beckwée</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Swinnen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Verstraten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Vanderborght</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zollo</surname>
          </string-name>
          ,
          <article-title>Promoting active participation in robot-aided rehabilitation via machine learning and impedance control</article-title>
          ,
          <source>Frontiers in Digital Health</source>
          <volume>7</volume>
          (
          <year>2025</year>
          )
          <fpage>1559796</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Purington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Taft</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sannon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. N.</given-names>
            <surname>Bazarova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. H.</given-names>
            <surname>Taylor</surname>
          </string-name>
          ,
          <article-title>“Alexa is my new BFF”: social roles, user satisfaction, and personification of the Amazon Echo</article-title>
          ,
          <source>in: Proceedings of the 2017 CHI conference extended abstracts on human factors in computing systems</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>2853</fpage>
          -
          <lpage>2859</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Calvo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. D</given-names>
            <surname>'Mello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Gratch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kappas</surname>
          </string-name>
          ,
          <source>The Oxford handbook of affective computing</source>
          , Oxford University Press,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Beraldo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tamantini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Umbrico</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Orlandini</surname>
          </string-name>
          ,
          <article-title>Fostering behavior change through cognitive social robotics</article-title>
          ,
          <source>in: International Conference on Social Robotics</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>279</fpage>
          -
          <lpage>291</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Trivedi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Pant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sonik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <article-title>Speech to text and text to speech recognition systems - a review</article-title>
          ,
          <source>IOSR J. Comput. Eng</source>
          <volume>20</volume>
          (
          <year>2018</year>
          )
          <fpage>36</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Ko</surname>
          </string-name>
          ,
          <article-title>A brief review of facial emotion recognition based on visual information</article-title>
          ,
          <source>Sensors</source>
          <volume>18</volume>
          (
          <year>2018</year>
          )
          <fpage>401</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>F. Z.</given-names>
            <surname>Canal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Matias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. G.</given-names>
            <surname>Scotton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>de Sa Junior</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pozzebon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Sobieranski</surname>
          </string-name>
          ,
          <article-title>A survey on facial emotion recognition techniques: A state-of-the-art literature review</article-title>
          ,
          <source>Information Sciences</source>
          <volume>582</volume>
          (
          <year>2022</year>
          )
          <fpage>593</fpage>
          -
          <lpage>617</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Cittadini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tamantini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Scotto di Luzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lauretti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zollo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cordella</surname>
          </string-name>
          ,
          <article-title>Affective state estimation based on Russell's model and physiological measurements</article-title>
          ,
          <source>Scientific Reports</source>
          <volume>13</volume>
          (
          <year>2023</year>
          )
          <fpage>9786</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>C.</given-names>
            <surname>Tamantini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Cristofanelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fracasso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Umbrico</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Cortellessa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Orlandini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cordella</surname>
          </string-name>
          ,
          <article-title>Physiological sensor technologies in workload estimation: A review</article-title>
          ,
          <source>IEEE Sensors Journal</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P. V.</given-names>
            <surname>Paiva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Ramos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gavrilova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Carvalho</surname>
          </string-name>
          ,
          <article-title>SkelETT: skeleton-to-emotion transfer transformer</article-title>
          , IEEE Access (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>E. V.</given-names>
            <surname>Sampaio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lévêque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. P.</given-names>
            <surname>da Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Le Callet</surname>
          </string-name>
          ,
          <article-title>Are facial expression recognition algorithms reliable in the context of interactive media? A new metric to analyse their performance</article-title>
          ,
          <source>in: EmotionIMX: Considering Emotions in Multimedia Experience (ACM IMX 2022 Workshop)</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Feel-good Thompson sampling for contextual bandits and reinforcement learning</article-title>
          ,
          <source>SIAM Journal on Mathematics of Data Science</source>
          <volume>4</volume>
          (
          <year>2022</year>
          )
          <fpage>834</fpage>
          -
          <lpage>857</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>R.</given-names>
            <surname>Molle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tamantini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lauretti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Romano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zollo</surname>
          </string-name>
          ,
          <article-title>An online reinforcement learning method to improve control adaptability in robot-aided rehabilitation</article-title>
          ,
          <source>Engineering Applications of Artificial Intelligence</source>
          <volume>161</volume>
          (
          <year>2025</year>
          )
          <fpage>112248</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>C.</given-names>
            <surname>Hutto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gilbert</surname>
          </string-name>
          ,
          <article-title>VADER: A parsimonious rule-based model for sentiment analysis of social media text</article-title>
          ,
          <source>in: Proceedings of the International AAAI Conference on Web and Social Media</source>
          , volume 8,
          <year>2014</year>
          , pp.
          <fpage>216</fpage>
          -
          <lpage>225</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Borg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Boldt</surname>
          </string-name>
          ,
          <article-title>Using VADER sentiment and SVM for predicting customer response sentiment</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>162</volume>
          (
          <year>2020</year>
          )
          <fpage>113746</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>