<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Supporting mental-health monitoring of patients with Social Robots</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rita Francese</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edmondo Nicolò De Simone</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department, University of Salerno</institution>
          ,
          <addr-line>Via Giovanni Paolo II, Fisciano (SA)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This study presents a preliminary evaluation of a system for mental health support based on the Furhat social robot integrated with a conversational AI model. The system enables natural-language interactions in Italian and automatically structures dialogues into JSON format for potential clinical use. A user study with 19 participants (ages 18-70), none with diagnosed mental health conditions, was conducted using standardized tools such as the System Usability Scale (SUS), Godspeed Questionnaire, and Net Promoter Score (NPS). Results showed a statistically significant improvement in participants' self-reported mood after the interaction (p = 0.015, with a medium Cliff's Delta effect size), along with favorable perceptions of clarity, neutrality, and engagement. The Net Promoter Score (47.4) indicates positive acceptance, although the overall SUS score (54.6) highlights areas for usability improvement. Qualitative feedback emphasized the value of the robot's non-judgmental stance but also suggested enhancements in expressiveness, personalization, and emotional recognition. Future work will focus on integrating standardized clinical interview protocols into the dialogue process, enabling structured data collection to support continuous monitoring and early relapse detection. This work lays the foundation for responsibly introducing social robots as complementary tools in clinical mental health care.</p>
      </abstract>
      <kwd-group>
        <kwd>Social Robot</kwd>
        <kwd>Mental Health</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>User Study</kwd>
        <kwd>Furhat</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The growing prevalence of mental health disorders and the limitations of traditional healthcare systems
in providing continuous, personalized care have prompted the exploration of innovative technological
solutions, such as Deep Learning [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] techniques or gait analysis. Among these, social robots
offer a promising avenue for enhancing mental health monitoring by enabling natural, empathetic
interactions and scalable data collection [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ].
      </p>
      <p>This paper presents a preliminary study that explores the use of the social robot Furhat, integrated
with conversational AI (ChatGPT 4.5), to support the continuous monitoring of patients’ mental
well-being. The research addresses the following question:</p>
      <p>RQ: Is it possible to use a social robot as a tool for supporting the evaluation and monitoring
of mental health?</p>
      <p>
        To answer this question, the study introduces a system capable of conducting natural conversations
with users in Italian, automatically generating structured JSON outputs that segment dialogues for
clinical review. The system was evaluated with 19 participants, using standardized tools such as the
System Usability Scale (SUS), the Godspeed scale for robot perception, and the Net Promoter Score
(NPS). These methods have been widely validated for assessing the usability and perception of socially
assistive robots in healthcare [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ].
      </p>
      <p>
        This contribution is in alignment with the goals of the Research Projects of Significant National
Interest (PRIN) 2022 PNRR SPECTRA project [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], focused on using AI for supporting patients with
schizophrenia through Natural Language Processing [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], gait analysis [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and emotion detection
techniques [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This preliminary study lays the foundation for future developments in continuous,
explainable, and ethically sound mental health monitoring solutions.
      </p>
      <p>The paper is structured as follows. In Section 2 we discuss the background of this study; Section 3
presents the proposed system; Section 4 describes the evaluation planning while Section 5 reports and
discusses the results. Finally, Section 6 concludes the paper with final remarks and future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>The integration of social robots into mental health support is based on prior research in human–robot
interaction, socially assistive technologies, and conversational AI. This section outlines the technical
features of the Furhat platform and reviews related work that highlights both the potential and current
limitations of social robots in healthcare and mental well-being contexts.</p>
      <sec id="sec-2-1">
        <title>2.1. The Furhat Social Robot</title>
        <p>
          Furhat is a tabletop humanoid robot developed by Furhat Robotics in Sweden (est. 2014), characterized
by a back-projected face capable of expressive facial animation, eye gaze, and head movements, enabling
human-like conversational interaction across multiple languages and modalities [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. It is programmable
via a modular SDK (including Kotlin support), integrates with speech recognition/synthesis engines,
and can embed large language models to deliver adaptive, multimodal dialogue [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>
          Furhat has been deployed in a variety of domains including healthcare, education, customer service,
and mental health support. As an example, applications such as the MIA-PROM project use Furhat to
guide patients through health questionnaires in clinical settings, improving accessibility and engagement
[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Other initiatives, such as the Translational Research Centre for Digital Mental Health (TRC-DMH),
are exploring Furhat’s expressivity and conversational presence in emotionally sensitive contexts
involving special needs and mental health support [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>
          Within academic research, Furhat enables embodied conversational agents like FurChat, which
combines LLM-based dialogue generation with real-time facial expression control, demonstrating its
suitability for naturalistic, socially rich interaction [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. These features position Furhat as a compelling
platform for investigating social robotics in mental health contexts.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Related Work</title>
        <p>
          Socially assistive robots (SARs), social robots designed to aid users through emotional and social
support, have been widely investigated as interventions in mental health and well-being contexts [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
A systematic review of SAR deployments in healthcare settings, including therapy, monitoring, and
patient engagement, documents over 279 real-world applications across 33 countries, highlighting their
growing adoption and wide-ranging use cases [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <p>Another systematic review focusing on mental health outcomes reported that most studies involved
older adults with dementia and relied on small usability trials, revealing a need for more diverse
populations and rigorous measurement frameworks [16]. These findings underscore limitations in
generalizability and the necessity of broader, longitudinal studies.</p>
        <p>Empirical studies with conversational SARs provide evidence for their potential. For instance, the
Ryan robot, which integrates multimodal emotion recognition, was shown to improve mood and
engagement among older adults in pilot studies, with the empathic variant deemed more likable and
effective [17]. Likewise, research deploying a conversational robot (the Ryan platform) to deliver
internet-based Cognitive Behavioral Therapy (iCBT) demonstrated outcomes comparable to traditional
human-led interventions over multiple weeks [18].</p>
        <p>Recent studies emphasize the value of social robots for community-oriented mental health tools,
including mood visualization and peer support frameworks [19]. Furhat-based systems have specifically
been employed to assess stress, anxiety, and depression via conversational AI in non-judgmental
environments [20]. However, user studies such as one conducted with older adults revealed that
perceptions of anthropomorphism and intimacy vary, and acceptance depends on design features and
context [21].</p>
        <p>In summary, the literature indicates strong promise for SARs in mental-health contexts, but also
highlights the importance of explainability, cultural sensitivity, and rigorous evaluation, especially in
clinically relevant populations.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. The proposed system</title>
      <sec id="sec-3-1">
        <title>The system is composed of the following three distinct layers (Figure 2).</title>
        <p>• Presentation Layer: The physical and expressive interface is provided by the Furhat robot, which
delivers face projection, gaze control, and audio interaction capabilities. This layer manages the
audio and text exchange between the user and the system.</p>
        <p>• Dialogue-Processing Layer: Incoming speech is converted into text via a Speech-to-Text service.
The textual input is then processed by ChatGPT 4.5 (via API), which is guided by the structured
prompt reported in Listing 1 to ensure safe, empathetic, and contextually relevant dialogue.
The generated text is sent back to the Presentation Layer for speech synthesis, optionally
accompanied by audio cues or emotional expression.</p>
        <p>• Data Logging Layer: All messages and the corresponding audio are captured, segmented, and
structured in JSON format. This dual-channel logging (User/Robot) enables precise turn segmentation,
preserves the conversation context, and facilitates later analysis.</p>
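        <p>As an illustration of the Data Logging Layer, a turn-segmented log could be built as in the following Python sketch. The paper does not specify the JSON schema, so the class names and the fields session_id, index, speaker, text, and audio_file are hypothetical; only the two channels (User/Robot) and the use of JSON come from the description above.</p>
        <preformat><![CDATA[
```python
import json
from dataclasses import dataclass, asdict, field
from typing import List, Optional


@dataclass
class Turn:
    index: int                        # 1-based position in the dialogue (assumed field)
    speaker: str                      # "User" or "Robot", the two logging channels
    text: str                         # transcribed or generated utterance
    audio_file: Optional[str] = None  # hypothetical path to the turn's audio clip


@dataclass
class ConversationLog:
    session_id: str                   # hypothetical session identifier
    language: str = "it"              # conversations are held in Italian
    turns: List[Turn] = field(default_factory=list)

    def add_turn(self, speaker: str, text: str, audio_file: Optional[str] = None) -> None:
        """Append one utterance, keeping User and Robot turns in dialogue order."""
        if speaker not in ("User", "Robot"):
            raise ValueError("speaker must be 'User' or 'Robot'")
        self.turns.append(Turn(len(self.turns) + 1, speaker, text, audio_file))

    def to_json(self) -> str:
        """Serialize the session; ensure_ascii=False keeps accented Italian text readable."""
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


# Example session with one Robot turn and one User turn.
log = ConversationLog(session_id="demo-001")
log.add_turn("Robot", "Come stai?")
log.add_turn("User", "Abbastanza bene, grazie.")
print(log.to_json())
```
]]></preformat>
        <p>Storing one record per turn, instead of a single transcript string, is what makes later per-speaker analysis straightforward, for example selecting only the User turns for clinical review.</p>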
        <p>The prompt plays a crucial role in guiding the conversational agent’s behaviour. It defines the role
of the robot (a friendly and understanding psychologist), the type of questions to be asked, and the
boundaries of the conversation (avoiding sensitive topics such as schizophrenia). The prompt instructs
the agent to begin each interaction with an empathetic greeting, adapt follow-up questions based on
user interest, and maintain a natural, fluid dialogue.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation planning</title>
      <p>The evaluation combines quantitative and qualitative measures to explore user experience, acceptance,
and affective impact. The following research question guided the study:</p>
      <p>RQ: Is it possible to use a social robot as a tool for supporting the evaluation and monitoring
of mental health?</p>
      <p>The goal was to examine participants’ attitudes towards the use of a social robot for mental health
monitoring, assessing both functional and affective responses.</p>
      <p>The evaluation design considered the following dimensions:</p>
      <p>• Affective change: Measuring shifts in self-reported emotional state before and after the interaction.</p>
      <p>• Usability: Assessing ease of use, learnability, and overall user-friendliness.</p>
      <p>• Perception of the robot: Evaluating anthropomorphism, animacy, likeability, perceived intelligence,
and perceived safety.</p>
      <p>• Satisfaction and acceptance: Measuring willingness to recommend the system.</p>
      <p>• Qualitative impressions: Collecting open-ended feedback to capture nuances not covered by
standardized scales.</p>
      <sec id="sec-4-1">
        <title>To assess these dimensions, the following tools were adopted:</title>
        <p>
          • System Usability Scale (SUS) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]: A widely used 10-item questionnaire that yields a score from 0
to 100, where higher scores indicate better perceived usability.
        </p>
        <p>• Godspeed Questionnaire Series [22]: Five subscales (Anthropomorphism, Animacy, Likeability,
Perceived Intelligence, Perceived Safety) rated on 5-point Likert items.</p>
        <p>• Net Promoter Score (NPS): A single-item metric asking participants how likely they are to
recommend the system to others, producing a score from -100 to +100.</p>
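        <p>For readers unfamiliar with how the 0 to 100 SUS score is derived, the following Python sketch applies the standard SUS scoring rule: odd-numbered (positively worded) items contribute the response minus 1, even-numbered (negatively worded) items contribute 5 minus the response, and the sum is multiplied by 2.5. The function name sus_score is illustrative, and the paper does not state whether its adapted items were scored exactly this way.</p>
        <preformat><![CDATA[
```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Return the 0-100 SUS score for one respondent's ten 1-5 ratings."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if not 1 <= rating <= 5:
            raise ValueError("each response must be between 1 and 5")
        if item_number % 2 == 1:
            total += rating - 1   # odd items are positively worded
        else:
            total += 5 - rating   # even items are negatively worded
    return total * 2.5            # rescale the 0-40 sum to 0-100


# A respondent who picks the neutral midpoint on every item.
neutral = [3] * 10
print(sus_score(neutral))
```
]]></preformat>
        <p>Under this rule, a respondent who answers 3 on every item obtains exactly 50, which is why a high share of neutral answers pulls the overall score towards the middle of the scale.</p>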
      </sec>
      <sec id="sec-4-2">
        <title>Procedure. The evaluation was structured into three sequential phases:</title>
        <p>1. Pre-interaction: Collection of demographic information and baseline affective self-assessment.</p>
        <p>2. Interaction: Voice-based conversation with the Furhat robot in Italian, guided by the supportive
prompt described in Section 3. The session had no fixed duration to allow natural dialogue flow.</p>
        <p>3. Post-interaction: Completion of the SUS, Godspeed, and NPS questionnaires, as well as the post-test
affective self-assessment and open-ended feedback.</p>
        <p>Listing 1: The conversation prompt.</p>
        <preformat>// Prompt passed to the chatbot. `name` is assumed to be defined elsewhere
// in the skill; Kotlin interpolates $name inside the raw string.
val chatbot = OpenAIChatbot(
"""
You are a friendly and understanding psychologist named $name.

The following is a conversation in Italian between $name and a Person.
$name will ask a series of general questions to get to know the Person
better and encourage them to talk.

The questions will aim to explore the Person's interests, experiences,
and thoughts. $name will carefully avoid any mention of topics related
to schizophrenia.

The format of the conversation will be: question, answer, question,
answer, etc.
$name must respond in context to the answers, deepening topics that
elicit more emphasis or interest from the Person, without repeating the
same questions.

The conversation will always start with "How are you?" to create empathy
and make the Person feel at ease.

The goal is to create a smooth and natural dialogue that allows the
Person to express themselves freely and share their thoughts and feelings.
If the Person shows particular interest or emphasis in a specific topic,
$name will follow up and deepen that topic with further relevant questions.
Otherwise, $name will continue with new questions to cover a wide range
of topics.

Examples of follow-up questions to deepen answers include:
- If the Person talks about a book, $name could ask:
"What did you like most about that book?"
or "Do you have other books by the same author that you enjoy?"
- If the Person mentions a favorite film, $name could ask:
"What is your favorite scene in that film?"
or "Are there other similar films you enjoy?"
- If the Person talks about a hobby, $name could ask:
"How long have you been practicing this hobby?"
or "What made you start this hobby?"
"""
)</preformat>
        <p>This mixed-methods approach ensured that we could triangulate standardized quantitative scores
with qualitative user narratives, providing a comprehensive understanding of how participants perceived
and responded to the robot in the context of mental health evaluation and monitoring.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>Participants. The study involved 19 participants, recruited via convenience sampling. The sample had
a balanced gender distribution: 9 males, 9 females, and 1 participant who preferred not to disclose. Most
participants were aged 18–25 years (11 participants), followed by 26–35 years (4 participants), 51 years
and older (3 participants), and 36–50 years (1 participant); see Figure 3. No specific exclusion criteria were
applied, allowing anyone interested to participate in the experiment. The sample included university
students (not from computer science or related fields), working adults from various professions, and
older adults with diverse levels of technological experience. We also asked participants about their
prior knowledge of social robots: 8 reported having no prior knowledge,
7 reported superficial knowledge, and 4 reported good knowledge of the topic.</p>
      <sec id="sec-5-1">
        <title>5.1. Quantitative results</title>
        <p>Mood Changes Before and After Interaction. Table 1 summarizes the descriptive statistics of
participants’ mood ratings related to the question “How do you feel about talking to a robot?” before
(PRE) and after (POST) interacting with the Furhat robot. The mean mood score increased from 3.58
(SD = 0.77) before the interaction to 4.11 (SD = 0.57) after the interaction, while the median remained
stable at 4 for both time points. The minimum score also increased from 2 to 3, indicating that the least
positive responses improved following the interaction. The maximum score remained at 5, suggesting
that the most positive responses were already at the top of the scale.</p>
        <p>Figure 4 presents the boxplots of the mood scores, showing a general upward shift after interaction.
Notably, the interquartile range (IQR) decreased post-interaction, suggesting that participants’ responses
became more homogeneous and clustered towards higher positive ratings.</p>
        <p>The Wilcoxon signed-rank test (Table 2) indicated a statistically significant increase in mood scores
after interacting with the robot (test statistic = 4.5, p = 0.015), confirming that the shift observed in the
descriptive statistics is unlikely to be due to chance. Cliff’s Delta, a non-parametric effect size measure suited
to ordinal ratings, was -0.39, a medium-sized effect. The negative sign presumably reflects the direction of the
comparison (PRE versus POST scores) and not a worsening of mood, since POST ratings were higher. The 95% confidence
interval reported for the effect size (0.048–0.651) excludes zero, suggesting that the positive impact is robust across the sample.</p>
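        <p>To make the two statistics above concrete, the following Python sketch computes a Wilcoxon signed-rank statistic and Cliff’s Delta on a small set of hypothetical PRE/POST ratings (the study’s raw data are not reproduced here). Two assumptions are made: the statistic is taken as the smaller of the two signed-rank sums, a convention many packages report, and the magnitude labels use the commonly cited thresholds 0.147, 0.33, and 0.474, which the paper does not explicitly confirm. With PRE as the first argument, higher POST ratings yield a negative delta, which is one way a value such as -0.39 can accompany an improvement.</p>
        <preformat><![CDATA[
```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_statistic(pre: Sequence[int], post: Sequence[int]) -> float:
    """Smaller signed-rank sum for paired data; zero differences are dropped."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return float(min(w_plus, w_minus))


def cliffs_delta(x: Sequence[int], y: Sequence[int]) -> float:
    """(#pairs with x > y minus #pairs with x < y) / (len(x) * len(y))."""
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))


def magnitude(delta: float) -> str:
    """Label |delta| with commonly cited thresholds (an assumption here)."""
    size = abs(delta)
    if size < 0.147:
        return "negligible"
    if size < 0.33:
        return "small"
    if size < 0.474:
        return "medium"
    return "large"


# Hypothetical 1-5 mood ratings for seven participants (not the study data).
pre = [3, 4, 3, 2, 4, 5, 3]
post = [4, 4, 4, 3, 5, 4, 5]

delta = cliffs_delta(pre, post)
print(wilcoxon_statistic(pre, post), round(delta, 3), magnitude(delta))
```
]]></preformat>
        <p>Swapping the arguments, cliffs_delta(post, pre), flips the sign but not the magnitude, so the direction of the comparison should always be reported alongside the value.</p>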
        <p>These results indicate that the interaction with Furhat not only increased participants’ mean mood
but also led to more consistent positive responses, with a statistically significant,
medium-sized effect. Even participants who initially reported lower mood ratings showed improvement after
the interaction, highlighting the potential of social robots to positively influence affective states in
short-term engagements.</p>
        <p>Godspeed. The Godspeed Questionnaire assessed participants’ perceptions of the Furhat robot across
five dimensions: Perceived Intelligence, Likeability, Animacy, Anthropomorphism, and Safety. Overall,
results indicate a positive reception, with particularly strong evaluations in cognitive and affective
aspects.</p>
        <p>Figure 5 illustrates the distribution of scores across the five dimensions of the Godspeed questionnaire.
Participants rated the system particularly high on Intelligence and Likeability, with median values
close to the upper end of the scale, suggesting that the conversational AI and robot were perceived as
both competent and engaging. Anthropomorphism and Animacy scored moderately, reflecting that
while the system was viewed as somewhat lifelike, participants did not strongly anthropomorphize it,
consistent with the design focus on clarity and neutrality rather than human mimicry. Safety ratings
showed wider variability, with some low outliers, indicating that although most participants perceived
the interaction as safe, a minority expressed reservations. The red dotted line at score 3 marks the
neutral threshold, showing that all median scores remained above neutrality, highlighting generally
favorable impressions.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. System Usability Scale (SUS) Results</title>
        <p>The System Usability Scale (SUS) was administered to all 19 participants to assess the perceived usability
of the Furhat-based monitoring system. The overall SUS score averaged 54.61 (SD = 4.88; range
47.5–67.5), with a median of 55. This value falls below the conventional acceptability threshold of 68,
suggesting that while participants generally managed to use the system effectively, its usability remains
at a prototype stage and requires further refinement.</p>
        <p>At the item level, several strengths emerged. Participants rated the clarity of the robot’s responses
very highly (M = 4.74/5), followed by overall ease of use (M = 4.63/5), integration of functionalities (M
= 4.42/5), willingness to recommend the system (M = 4.37/5), and confidence in using it (M = 4.21/5).
Reversed items also showed positive evaluations, with satisfactory ratings for absence of unnecessary
complexity (M = 4.21/5), system not cumbersome (M = 4.16/5), and autonomy in use (M = 3.58/5). No
item scored critically low, and the overall mean across items was 4.2/5, pointing to a generally favorable
user experience.</p>
        <p>The discrepancy between the relatively low global SUS score and the positive ratings of individual
items can be partly explained by the prevalence of neutral responses (score = 3 on a 1–5 scale), a pattern
consistent with the well-known central tendency bias. Rather than signaling dissatisfaction, this suggests
cautious or moderate evaluations from participants, possibly reflecting the novelty of interacting with a
social robot for mental health purposes.</p>
        <p>In summary, despite a SUS score below the standard benchmark, the detailed analysis indicates good
basic usability, particularly with respect to clarity of communication and ease of use. These findings
highlight opportunities for targeted improvements aimed at reducing user hesitation and enhancing
overall acceptance in future iterations of the system.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Net Promoter Score (NPS) Results</title>
        <p>To assess participants’ willingness to recommend the system to others, we collected ratings on a scale
from 0 (not at all likely) to 10 (extremely likely). According to the standard NPS methodology, scores of
9–10 are classified as Promoters, scores of 7–8 as Passives, and scores of 0–6 as Detractors.</p>
        <p>Out of 19 participants, none were detractors, while 9 (47.4%) were promoters and 10 (52.6%) were
passives. This distribution resulted in a Net Promoter Score (NPS) of 47.4, which falls into the positive
range typically interpreted as “good” for technology acceptance studies. The absence of detractors,
with every respondent falling into either the promoter or passive category, indicates an overall
favorable perception of the system, with strong potential for recommendation in mental health contexts.</p>
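        <p>The NPS arithmetic can be checked with a few lines of Python. The sketch below applies the classification described above (9–10 promoters, 7–8 passives, 0–6 detractors) and subtracts the percentage of detractors from the percentage of promoters. Only the category counts (9 promoters, 10 passives, 0 detractors) come from the study; the individual ratings used here are placeholders chosen to fall in the right categories.</p>
        <preformat><![CDATA[
```python
from typing import Sequence


def net_promoter_score(ratings: Sequence[int]) -> float:
    """Percentage of promoters (9-10) minus percentage of detractors (0-6)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


# Placeholder ratings reproducing the reported category counts:
# 9 promoters, 10 passives, 0 detractors (the actual ratings are not given).
ratings = [9] * 9 + [7] * 10
print(round(net_promoter_score(ratings), 1))
```
]]></preformat>
        <p>Because passives count in the denominator but not in the numerator, the score equals the promoter share (9 of 19) when there are no detractors.</p>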
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Qualitative results</title>
        <p>In addition to quantitative measures, participants provided open-ended feedback on their interaction
with the Furhat robot. Their comments were analyzed thematically across three guiding questions:
strengths of the interaction, potential usefulness of Furhat for other people with mental health conditions,
and suggestions for improvement.</p>
        <p>Strengths of the Interaction. A recurring theme was the clarity and fluidity of communication.
Several participants commented that “the communication was simple and fast” or that “the dialogue
felt natural and fluid, as if talking to a person”. The non-judgmental and neutral stance of Furhat
was also appreciated: one participant noted that its “constant and neutral expressiveness was reassuring”.
This aligns with findings from a systematic review indicating that social robots’ non-judgmental nature
can be particularly important for individuals who may initially avoid therapy.</p>
        <p>Moreover, participants emphasized the robot’s ability to create a low-pressure environment for
self-expression: “Some people fear being judged when talking about their mental health. Furhat, as an
artificial entity, reduces the possibility of feeling inadequate, stimulating greater openness and sincerity.”</p>
        <p>Emotional stability emerged as another strength: interacting with Furhat avoided emotional
destabilization that might occur with a human professional—“With a professional, exposing emotions could
destabilize me, which does not happen with Furhat for obvious reasons.” This echoes insights from studies
on socially assistive robots used in mental health contexts, where empathic presence supports stable
emotional engagement.</p>
      </sec>
      <sec id="sec-5-5">
        <title>Usefulness for People with Mental Health Conditions</title>
        <p>Many participants believed Furhat could serve as a valuable tool. Some highlighted its role in stimulating communication: “it could give great
help in stimulating communication”. Others envisioned clinical applications: “if used alongside a doctor,
it could be an excellent tool for evaluation or patient stimulation”. This is consistent with systematic
survey findings that social robots like NAO, CRECA, or Paro have been explored for counseling and
motivational interviewing tasks.</p>
        <p>Furhat’s consistent, non-judgmental demeanor was seen as supportive: “With Furhat I established a
constant, non-judgmental dialogue, which can help people feel listened to and understood.” Additionally,
practical benefits were noted, such as easing clinical load and improving care access: “It could lighten
the clinical burden and facilitate access to care, especially where staff is limited.” These observations
are aligned with research showing companion-like robots foster therapeutic alliance and improve
psychological well-being in longer-term deployments.</p>
        <p>Suggestions for Improvement. The most frequently proposed improvement was to make Furhat
appear more human-like. Suggestions ranged from cosmetic enhancements (“I would improve its
appearance by making it more human, maybe with hair”; “its humanity, because it is still too mechanical”) to
expressiveness: “it should be able to perform other expressions with face and voice, so that conversations
do not become repetitive”. Research in HRI emphasizes that recognizable emotional expressions and
anthropomorphism significantly impact social engagement.</p>
        <p>Another recurrent theme concerned enhanced emotional and non-verbal recognition.
Participants suggested features such as “implementing recognition of body language and micro-expressions” or
“detecting the patient’s emotions through their tone of voice”. These align closely with affective computing
advances and the demonstrated benefits of empathic robots in recognizing and responding to emotions
to increase engagement and likability.</p>
        <p>Finally, some participants highlighted the value of memory and personalization, proposing “it
would be useful if Furhat could remember key elements of previous conversations” to support continuity and
richer long-term interaction. Cognitive robotics research supports the idea that memory systems—such
as narrative or episodic memory architectures—can help robots ground interactions over time and
respond more appropriately based on past exchanges.</p>
        <p>Summary. Participants found key strengths in Furhat’s clarity, neutral stance, and accessibility. They
envisioned its utility for mental health settings, especially in contexts where non-judgmental, consistent
companionship is beneficial. To enhance Furhat’s impact, participants recommended improvements
in anthropomorphism (appearance and expressiveness), emotional perception, and memory-driven
personalization. These suggestions align well with ongoing research in socially assistive robotics and
affective human-robot interaction.</p>
      </sec>
      <sec id="sec-5-6">
        <title>5.5. Threats to Validity</title>
        <p>While the evaluation provided valuable insights into participants’ affective responses and perceptions
of the Furhat robot, several limitations should be considered when interpreting the results.</p>
        <p>Internal Validity. The study relied on self-reported measures of mood and subjective questionnaires,
which are susceptible to response bias and social desirability effects. Participants may have reported
more positive experiences due to perceived expectations or the novelty of interacting with a robot.
Additionally, the lack of a control condition prevents ruling out alternative explanations for mood
improvements, such as placebo effects or general engagement with any interactive system.</p>
        <p>Construct Validity. Although validated instruments (SUS, Godspeed, NPS) were used, some
constructs, particularly affective change, may not be fully captured by single-item mood ratings. Open-ended
feedback partially mitigates this limitation, but qualitative responses are inherently subjective and
depend on participants’ willingness to articulate their experiences.</p>
        <p>External Validity. The sample consisted of 19 participants recruited via convenience sampling,
with a predominance of young adults and relatively balanced gender representation. This limits the
generalizability of findings to broader populations, including older adults, clinical populations, or
individuals with different cultural backgrounds. Furthermore, participants’ prior knowledge of social
robots varied, which could influence both engagement and reported perceptions.</p>
        <p>Ecological Validity. The interaction took place in a controlled environment without long-term
follow-up. Therefore, it remains unclear whether the observed positive effects on mood or user acceptance
would persist over repeated sessions or in real-world settings, such as clinical practice.</p>
        <p>Statistical Conclusion Validity. Given the small sample size, the study is underpowered for detecting
small effect sizes. While the Wilcoxon signed-rank test indicated a significant mood increase and Cliff’s
Delta suggested a medium effect, results should be interpreted cautiously. Larger sample studies are
necessary to confirm these preliminary findings.</p>
        <p>To address these threats, the study combined quantitative and qualitative measures, allowing for
triangulation of self-reported scores with participants’ narrative feedback. Future work should
incorporate larger and more diverse samples, longitudinal follow-ups, and control conditions to strengthen the
validity and reliability of conclusions regarding the use of social robots in mental health evaluation and
monitoring.</p>
      </sec>
      <sec id="sec-5-7">
        <title>5.6. Discussion</title>
        <sec id="sec-5-7-1">
          <title>The present study tried to answer the following research question:</title>
          <p>RQ: Is it possible to use a social robot as a tool for supporting the evaluation and monitoring
of mental health?</p>
          <p>Based on the results:
• Participants’ self-reported mood significantly improved after interacting with Furhat, as shown
by the Wilcoxon signed-rank test (W = 4.5, p = 0.015) and a medium-sized effect according to
Cliff’s Delta (δ = −0.39).
• Participants reported favorable perceptions of anthropomorphism, animacy, likeability, perceived
intelligence, and safety (Godspeed), and a positive willingness to recommend the system (NPS =
47.4). Usability, however, remains at a prototype stage (overall SUS score of 54.6) and requires further refinement.
• Qualitative feedback indicated that the robot provided a neutral, non-judgmental interaction,
encouraging participants to express their emotions freely and suggesting potential applicability
in mental health contexts.</p>
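          <p>The Net Promoter Score in the second point can be illustrated with a short sketch. It assumes the standard NPS convention (a 0-10 likelihood-to-recommend rating, with promoters scoring 9-10 and detractors 0-6), which may differ from the exact questionnaire used in the study. The ratings and the function name are hypothetical, chosen only so that 19 respondents yield a score of 47.4.</p>
          <preformat>
```python
def net_promoter_score(ratings):
    """NPS on a 0-10 scale: percent promoters (9-10) minus percent detractors (0-6)."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if 6 >= r)
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical ratings for 19 respondents (not the study data):
# 11 promoters, 6 passives (7-8), and 2 detractors.
ratings = [10] * 6 + [9] * 5 + [8] * 4 + [7] * 2 + [6, 4]

print(round(net_promoter_score(ratings), 1))  # prints 47.4
```
          </preformat>
          <p>Under this convention, a score of 47.4 with 19 respondents corresponds to promoters outnumbering detractors by nine (9/19 is approximately 47.4%).</p>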
          <p>Together, these findings support the conclusion that social robots like Furhat can serve as effective tools
for supporting the evaluation and monitoring of mental health, at least in short-term, controlled settings.
The combination of quantitative and qualitative evidence indicates that such robots can capture affective
states, provide a safe and engaging environment, and be positively received by users, addressing both
functional and affective aspects of mental health assessment. Nevertheless, as the usability results
indicate, the quality of the interaction itself still needs improvement.</p>
          <p>Beyond the methodological limitations outlined in Section 5.7, this study raises broader issues related
to the responsible use of social robots in mental health contexts. While results show promise, deploying
such systems in real clinical practice requires careful attention to additional risks and safeguards.</p>
          <p>A first challenge concerns the transferability to clinical populations. Our participants had no
diagnosed mental health conditions, and their generally positive reception may differ substantially from
that of patients facing distress, stigma, or cognitive impairments. For clinical adoption, the system must
therefore be validated with diverse patient groups, accounting for specific needs and vulnerabilities.</p>
          <p>A second issue is the risk of misinterpretation and over-reliance. Although the robot can engage
in empathetic dialogue, it is not a substitute for professional judgment. If patients perceive it as a
therapeutic agent, there is a danger of reducing contact with human clinicians. Clear communication of
the robot’s supportive but non-therapeutic role is essential to avoid unrealistic expectations.</p>
          <p>A third issue concerns ethics and privacy. Conversations with the robot may involve sensitive disclosures. These
data must be managed under strict security protocols and anonymization strategies, while ensuring
transparency about storage, usage, and clinician access. Patients should be empowered to control their
data and consent to its use, minimizing the risk of exploitation or loss of trust.</p>
          <p>Finally, the design of the system must balance engagement and authenticity. While participants
appreciated Furhat’s neutrality, some expressed a desire for greater emotional expressiveness and
personalization. However, excessive anthropomorphism could raise concerns about manipulation or
attachment to non-human agents. Future design should carefully calibrate the robot’s behavior to
support rapport without undermining authenticity or autonomy.</p>
          <p>In summary, the potential of social robots for mental health monitoring extends beyond technical
feasibility. Their real impact will depend on clinical validation, ethical safeguards, and a design philosophy
that positions them as complementary, transparent, and trustworthy partners in care.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>This work presented a preliminary study on the use of the Furhat social robot, integrated with
conversational AI, to support mental health evaluation and monitoring. The findings indicate that participants
experienced significant mood improvements and valued the non-judgmental, supportive stance of the
robot, suggesting its promise as a tool for fostering open communication and engagement in mental
health contexts.</p>
      <p>However, the limited sample size and the absence of participants with diagnosed mental health
conditions mean that the results must be interpreted cautiously. The next step is to validate the system
in real-world clinical settings with diverse patient populations and over longitudinal deployments, in
order to better assess its efficacy and acceptability.</p>
      <p>Future work will focus on incorporating standardized clinical interview protocols into the
conversational process. This enhancement will enable systematic and clinically meaningful data collection,
which can then be analyzed to support continuous monitoring, early detection of relapse, and more
personalized mental health care. By aligning with established clinical practices and ensuring rigorous
ethical safeguards, the system has the potential to evolve into a practical, responsible, and effective tool
for supporting both patients and clinicians in mental health monitoring.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <sec id="sec-7-1">
        <p>The authors thank the anonymous participants.</p>
        <p>This research has been financially supported by the European Union NEXTGenerationEU project and
by the Italian Ministry of the University and Research (MUR), under the Research Projects of Significant National
Interest (PRIN) 2022 PNRR programme, project n. D53D23017290001, entitled "Supporting schizophrenia PatiEnts
Care wiTh aRtificiAl intelligence (SPECTRA)", Principal Investigator: Rita Francese.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly to perform a grammar and spelling
check.</p>
    </sec>
  </body>
  <back>
  </back>
</article>