<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>User Engagement in a Triadic Human-Robot Interaction Setup: Incorporating Gaze, Head Pose, and Affective Cues</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bahram Salamat Ravandi</string-name>
          <email>bahramsalamat@ait.gu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>John Currie</string-name>
          <email>john.currie@ait.gu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierre Gander</string-name>
          <email>pierre.gander@ait.gu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert Lowe</string-name>
          <email>robertlowe@ait.gu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Engagement, Socially Assistive Robots, Afective Engagement, Social Engagement</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Applied IT, University of Gothenburg, Forskningsgången 6, 417 56 Gothenburg, Sweden</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recent research in Human-Robot Interaction (HRI) has increasingly focused on understanding user engagement to enhance the overall user experience. This paper aims to develop a predictive model of user engagement within a triadic interaction loop involving three key entities: a human, a robot, and a task. To achieve this, we created a new dataset incorporating multimodal features, including facial landmarks, facial action units, head posture, and gaze. Engagement annotations were performed by two human annotators using a structured approach to ensure high-quality labeling. Building upon this dataset, we developed a deep learning-based predictive model of user engagement. The results demonstrate that the model effectively captures user engagement in the task-oriented HRI scenario, achieving a Mean Squared Error (MSE) of 0.0111 and an R² score of 0.8195, highlighting its accuracy and robustness. Additionally, a permutation feature importance analysis revealed that gaze, head pose, and facial expressions significantly contributed to the model’s predictions across various levels of user engagement.</p>
      </abstract>
      <kwd-group>
        <kwd>Engagement</kwd>
        <kwd>Socially Assistive Robots</kwd>
        <kwd>Affective Engagement</kwd>
        <kwd>Social Engagement</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        The study of engagement has emerged as a response to the desire to create services, products,
and content that are tailored to user experience in order to engage users [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In the field of
Human-Robot Interaction (HRI), engagement is a multifaceted concept with diverse definitions
in the literature [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Due to the diverse ways engagement is understood in the HRI field,
researchers have employed a wide range of metrics and features to measure it [
        <xref ref-type="bibr" rid="ref1 ref43">1, 43</xref>
        ]. In this
paper, we define ‘engagement’ as:
      </p>
      <p>
        a quality of user experiences with technology that is characterized by challenge,
aesthetic and sensory appeal, feedback, novelty, interactivity, perceived control
and time, awareness, motivation, interest, and affect ([
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], p. 949).
      </p>
      <p>
        One prominent area where engagement plays a critical role is Socially Assistive Robots
(SARs), which have been increasingly deployed in fields such as education and healthcare. For
instance, [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] developed two Machine Learning (ML) models to monitor long-term engagement
with SARs, particularly for children with autism spectrum disorder. By utilizing audio-visual
and performance data, they trained several ML algorithms and implemented re-engagement
strategies when engagement levels dropped below a certain threshold. Similarly, [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] proposed
an assistive robot designed to support Alzheimer’s patients through memory training exercises,
leveraging verbal and nonverbal communication to sustain user engagement. They identified
four distinct levels of engagement that robots must adapt to based on user performance, ensuring
a dynamic and responsive interaction.
      </p>
      <p>
        Various Machine Learning and Deep Learning models have been applied to different datasets
to improve accuracy and adaptability in engagement detection [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ]. For instance, [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] used
the UE-HRI dataset to develop a 3D Convolutional Neural Network (CNN) model that detects
engagement based on video frame sequences. Similarly, [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] built a CNN model to identify
passive subjects in a four-way HRI interaction using facial and speech data. Other studies have
explored alternative approaches. For example, [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] utilized a Long Short-Term Memory (LSTM)
model on visual data, body pose, and facial features to detect disengagement in children with
learning difficulties, while [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] applied a Recurrent Neural Network (RNN) to model engagement
using behavioral and speech data from the UE-HRI dataset. Expanding on these methods,
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] compared different ML algorithms for engagement detection based on facial, audio, and
game performance features. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] evaluated several conventional model types, including Naïve
Bayes, K-nearest neighbors, support vector machines, neural networks, logistic regression,
random decision tree forests, and gradient-boosted decision trees. Among these,
gradient-boosted decision trees emerged as the most successful, achieving the highest Area Under the
Receiver Operating Characteristic (AUROC) values. Further, [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] implemented a multimodal
active learning approach with Reinforcement Learning (RL) and LSTMs to detect child-robot
engagement, while [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] trained CNN and LSTM models to classify different engagement levels
in interactions with the TEGA robot. Other works focused on personalization and multimodal
engagement detection. [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] introduced CultureNet, a CNN-based model for personalized
engagement detection, while [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] combined CNN models for facial expression and body posture
analysis to classify engagement into positive, negative, and neutral categories. Additionally,
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] trained an RNN model on the EASE dataset, incorporating videos, audio, and physiological
signals, using facial action units as input features. Finally, [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] developed a Multi-Task Cascaded
Convolutional Neural Networks (MTCNN) model using facial landmarks and Histogram of
Oriented Gradients (HOG) features to detect engagement states in children.
      </p>
      <p>
        However, these models necessitate substantial amounts of training data, and there are only
a limited number of existing engagement datasets. Several datasets have been used in HRI
for engagement detection [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ]. One well-known dataset is the UE-HRI dataset [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ], which
was collected from human interactions with the Pepper robot in a public space. This dataset
includes video, voice, sonar, and laser data, with engagement annotations primarily focused
on cues related to disengagement. A notable limitation of the UE-HRI dataset is that it focuses
solely on the binary presence or absence of disengagement, without capturing the full spectrum
or intensity of user engagement and emotional states. This limitation could hinder the model’s
ability to perceive more nuanced engagement levels during interactions.
      </p>
      <p>
        Another significant dataset is the TOGURO dataset, gathered from human interactions with
the NAO robot in public settings [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. It contains video streams, as well as verbal and
nonverbal user behaviors, along with user position data. In addition to these engagement-specific
datasets, several emotion-based datasets are commonly used in research, including the Static
Facial Expressions in the Wild (SFEW), Facial Expression Recognition (FER2013), and AffectNet
[
        <xref ref-type="bibr" rid="ref39">39</xref>
        ]. However, these datasets rely heavily on facial expressions to detect affective engagement
and overlook other significant indicators of user engagement, such as pose and gaze.
      </p>
      <p>
        Due to the subjective and context-dependent nature of engagement, annotating engagement
is both time-consuming and challenging. While engagement annotation is typically performed
manually, alternative approaches have been explored. For instance, [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] combined self-reports
with expert annotations to establish ground truth, whereas [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] employed unsupervised methods
to categorize engagement into four patterns: approaching, interacting, leaving, and uninterested.
Following the establishment of this structured engagement dataset, a deep neural network
model was trained to detect user engagement.
      </p>
      <p>In this paper, we introduce an engagement dataset and engagement predictive model within
a triadic human-robot-task interaction. We constructed a dataset and implemented a rigorous
engagement annotation methodology to guarantee high-quality data labeling. The annotation
was informed by insights gained from data collection in [53], where observing patterns of
user behavior in interaction videos illuminated the correlations between various indicators
(such as facial expressions and gaze behavior) and engagement levels. The developed dataset
incorporates a variety of multimodal features, including facial landmarks, head pose, and gaze
direction.</p>
      <p>The following sections will detail the proposed general HRI setup, engagement annotation,
and modeling methodology.</p>
    </sec>
    <sec id="sec-3">
      <title>2. HRI Framework</title>
      <p>
        This study aims to assess human engagement in a triadic HRI setup by introducing an
engagement annotation framework and developing an engagement predictive model. This model
can potentially be used to enhance user experience by re-engaging users or increasing user
engagement via social and instructional feedback from the robot or by dynamically adjusting
task difficulty. Various gamification elements can be integrated into the setup, allowing users to
engage with the task through rewards and audiovisual feedback from both the robot and the task
itself [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. The feasibility of implementing such a function depends on real-time engagement
assessment.
      </p>
      <p>
        The interaction loop, as presented in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], consists of six components:
1. Challenge Modulation: Adjusting the task’s difficulty based on the user’s engagement
state. If disengagement arises due to excessive difficulty, reducing the challenge can help
re-engage the user. Conversely, if the task is too easy, increasing its difficulty may enhance
engagement by offering a more stimulating experience. According to Flow theory, an
individual can experience a state of deep satisfaction and immersion when there is an
optimal balance between the task’s challenge and their skill level [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ].
2. Task State: This component involves providing information on human performance
considering the current state of the task. It functions as a gamification element, tracking
and displaying user progress and achievements, which can motivate sustained engagement
and improvement [
        <xref ref-type="bibr" rid="ref16 ref30 ref31">30, 31, 16</xref>
        ].
3. Action Selection: This component involves the use of touchscreen-based, verbal, or
mouse inputs for selecting actions. Providing users with hints, tips, or instructions as a
gamification element can result in higher engagement and improve users’ performance
[
        <xref ref-type="bibr" rid="ref16 ref40 ref41">16, 41, 40</xref>
        ].
4. Reward Feedback: The task provides direct feedback on the outcome of a specific action
taken by the user. This module is considered as a within-task gamification element in an
HRI setup [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ].
5. Social Feedback: This component includes the robot’s verbal/nonverbal feedback to
encourage desired behaviors and acknowledge accomplishments [
        <xref ref-type="bibr" rid="ref33 ref34 ref35 ref36">33, 34, 35, 36</xref>
        ]. An
engagement assessment model can assist in determining appropriate robot responses.
Performance-based feedback may be particularly beneficial for tasks focused on
achievement, such as cognitive training or educational activities. Conversely, in scenarios where
fostering social connections is essential, such as companionship or social skills
development, emphasizing affective-based feedback can enhance user engagement with the
robot.
6. Engagement State: This component refers to possible inputs that can help determine
the user’s emotional state or level of engagement. These features may include facial
expressions, physiological signals (e.g., EEG, GSR, ECG), eye-tracking data, and body
movements captured by Kinect sensors.
      </p>
      <p>
        To effectively engage users, feedback and task difficulty adjustments must be adapted to
the user’s engagement state, estimated using an engagement estimation model. The literature
presents various adaptive strategies based on engagement, such as rule-based systems that
adjust according to user engagement levels [
        <xref ref-type="bibr" rid="ref23 ref37 ref38">38, 23, 37</xref>
        ], and Reinforcement Learning (RL)-based
policy learners, enabling more tailored adaptations for user engagement [
        <xref ref-type="bibr" rid="ref2 ref39 ref6">6, 2, 39</xref>
        ].
      </p>
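      <p>As a minimal illustration of the rule-based strategy mentioned above, the following Python sketch maps a normalized engagement estimate onto challenge modulation and robot feedback actions; the thresholds, action names, and the adapt_interaction helper are hypothetical and are not taken from the cited systems.</p>
      <preformat>
# Hypothetical rule-based adaptation sketch; thresholds and action names are
# illustrative only, not the policies used in the cited studies.
def adapt_interaction(engagement, task_difficulty):
    """Map a normalized engagement estimate (0..1) to task/robot actions."""
    if engagement >= 0.7:
        # High engagement: raise the challenge to keep the user in flow.
        return {"difficulty": task_difficulty + 1, "robot_feedback": "praise"}
    if engagement >= 0.4:
        # Moderate engagement: keep the difficulty and offer a hint.
        return {"difficulty": task_difficulty, "robot_feedback": "hint"}
    # Low engagement: ease the task and give encouraging social feedback.
    return {"difficulty": max(1, task_difficulty - 1),
            "robot_feedback": "encouraging_social"}

print(adapt_interaction(0.25, task_difficulty=3))
      </preformat>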
    </sec>
    <sec id="sec-4">
      <title>Research Questions</title>
      <p>1. How can we effectively quantify users’ task engagement in a triadic HRI scenario involving
task-oriented interactions?
2. What multimodal features (e.g., Facial Action Units (FAUs), head postures, and gaze
directions) contribute to the assessment of user engagement?</p>
    </sec>
    <sec id="sec-5">
      <title>3. Methodology</title>
      <p>Fig. 1 presents the HRI setup designed for data collection, as originally introduced in [53]. The
dataset consists of video recordings capturing user interactions with a social robot, Furhat, during
a memory training task. The robot is positioned at an angled distance from both the user and the
screen. Each interaction lasts approximately 10–15 minutes, during which the robot provides
feedback aligned with the task’s outcome. Additionally, the robot establishes eye contact with
the user after delivering feedback in response to the user’s actions. Fifty-eight engineering
students (35 males, 23 females) from Koç University’s Electrical and Electronics Engineering
and Computer Engineering departments, aged 18 to 24 (M = 20, SD = 1.87), participated in the
data collection.</p>
      <sec id="sec-5-1">
        <title>3.1. Engagement Annotation</title>
        <p>In this setup, user task engagement can be defined through indicators, such as head and gaze
orientations, as well as facial expressions. A user’s head or gaze orientation serves as an
indicator of attention and engagement. When a user shifts their head or gaze away from the
screen, it suggests a decline in engagement. However, even when users maintain their gaze on
the screen, their level of engagement may vary between positive, negative, or neutral states.
Through qualitative analysis of the video data, seven distinct levels of user engagement were
established, which are categorized as follows:
• Level 1: Completely disengaged (looking away from the screen).
• Level 2: Occasional glances at the screen, lacking sustained focus.
• Level 3: User maintains attention on the screen but exhibits signs of distraction.
• Level 4: User maintains a steady focus on the screen while showing negative expressions
(e.g., frustration or disinterest).
• Level 5: User maintains a steady focus on the screen and shows neutral expressions
(neither positive nor negative).
• Level 6: User maintains a steady focus on the screen and shows positive expressions
(e.g., happiness or interest).
• Level 7: User maintains a steady focus on the screen and is highly engaged (displaying
strong positive emotions).</p>
        <p>Fig. 2 illustrates a flowchart of the annotation process. The participants’ videos were
segmented into sub-video clips based on similar engagement patterns, which were assessed
manually by a researcher. The researcher evaluated similarity by observing behavioral cues such
as facial expressions, body language, gaze direction, and vocal tone, identifying segments that
reflected consistent levels or patterns of engagement. This segmentation facilitates a detailed
temporal analysis of user engagement by capturing variations in attention and affective states
throughout the interaction. While this method may introduce a certain degree of noise or
inaccuracies, given the impracticality of labeling every video frame, it enables the extraction of
meaningful insights from the temporal patterns.</p>
        <p>Two researchers conducted the labeling independently. Firstly, they conducted an initial
round of annotation in which they sorted out the segmented video clips of each participant
into one of the seven predefined engagement levels. This initial labeling process is essential
for establishing reliable baseline engagement data. After initial labeling, clips within each
engagement level were further sorted out into three subcategories to increase annotation
granularity. The subcategories established within each engagement level were designed to
represent different degrees of engagement, essentially dividing the levels into three further
gradations, ranging from low to high within that engagement category. This approach allows
for a more nuanced understanding of user engagement by capturing subtle variations in user
behavior and emotional responses. To ensure robustness, annotators reviewed and compared
engagement sub-categories across all levels and resorted clips if necessary.</p>
        <p>
          Each annotator labeled 233 video clips, featuring interactions from 58 different participants.
To assess inter-rater reliability, we calculated a weighted Cohen’s Kappa coefficient [
          <xref ref-type="bibr" rid="ref42 ref44">42, 44</xref>
          ], a
statistical measure that accounts for both agreement and the likelihood of chance agreement.
Given that the annotation labels are ordinal, implying a meaningful order among categories,
the standard Cohen’s Kappa is not ideal, as it treats all disagreements equally. Instead, we apply
quadratic weighting, which penalizes larger disagreements more heavily than minor ones. This
ensures that a disagreement between adjacent categories (e.g., 3 vs. 4) is considered less severe
than one between distant categories (e.g., 1 vs. 5). By using weighted Kappa, we obtain a more
accurate measure of inter-rater agreement that properly reflects the structure of our data. The
final labels were determined by averaging the annotators’ ratings.
        </p>
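        <p>As a sketch of this computation, the quadratically weighted Cohen’s Kappa can be obtained with scikit-learn; the two rating lists below are illustrative placeholders rather than the study’s annotations.</p>
        <preformat>
# Sketch: quadratically weighted Cohen's kappa for two annotators' ordinal
# labels (levels 1-7). The example ratings are placeholders, not the real data.
from sklearn.metrics import cohen_kappa_score

annotator_a = [5, 6, 3, 7, 4, 5, 2, 6]
annotator_b = [5, 5, 3, 7, 4, 6, 2, 6]

kappa = cohen_kappa_score(annotator_a, annotator_b, weights="quadratic")
print(f"Weighted Cohen's kappa: {kappa:.3f}")
        </preformat>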
      </sec>
      <sec id="sec-5-2">
        <title>3.2. Engagement Modeling</title>
        <p>
          Engagement labels for model training were generated by averaging the scores provided by the
two annotators. These averaged values were then normalized to a continuous scale ranging from
0 (indicating complete disengagement) to 1 (indicating high engagement). To extract relevant
behavioral features, we used the OpenFace toolkit [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], which provides a comprehensive set of
facial and gaze-related data. Specifically, features extracted include facial landmarks (2D and
3D), head pose, eye gaze, and Facial Action Units (FAUs).
        </p>
        <p>The resulting dataset comprised 704 features categorized into several groups: gaze data (8),
eye landmarks (168), 3D head pose (6), 2D facial landmarks (136), 3D facial landmarks (204),
head pose model parameters (6), Point Distribution Model (PDM) parameters (34), FAUs (35),
3D landmark Z-coordinates (indices 0 to 67), and 39 AU-related parameters. The complete dataset is
publicly available at https://osf.io/4nfwh. All feature values were standardized prior to training
to ensure uniformity across scales.</p>
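        <p>A minimal sketch of this preprocessing step is given below, assuming a per-frame CSV produced by the OpenFace toolkit; the file name is hypothetical and the bookkeeping column names follow OpenFace’s usual output, while the remaining columns are standardized as described above.</p>
        <preformat>
# Sketch: load per-frame OpenFace features and standardize them.
# "openface_output.csv" is a hypothetical path; OpenFace CSVs typically include
# frame/timestamp/confidence/success columns plus gaze, pose, landmark and AU features.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("openface_output.csv")

# Keep only successfully tracked frames when the tracking flag is present.
if "success" in df.columns:
    df = df[df["success"] == 1]

# Drop bookkeeping columns so only behavioural features remain.
meta_cols = [c for c in ("frame", "face_id", "timestamp", "confidence", "success")
             if c in df.columns]
features = df.drop(columns=meta_cols)

# Standardize every feature to zero mean and unit variance.
X = StandardScaler().fit_transform(features.values)
print(X.shape)  # (n_frames, n_features), roughly the 704-feature vector per frame
        </preformat>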
        <p>
          The predictive model was developed using a deep learning architecture implemented in
TensorFlow and Keras. It consists of fully connected layers utilizing ReLU activation functions
and dropout regularization to mitigate overfitting. The model architecture is summarized in Fig.
3. Since the target variable represents a probability of user engagement, a sigmoid activation
function is used in the output layer to ensure predictions remain within the [0, 1] range.
        </p>
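        <p>The sketch below illustrates such an architecture in Keras: stacked fully connected layers with ReLU activations and dropout, a sigmoid output unit, and compilation with the Adam optimizer and an MSE loss. The layer widths and dropout rates are assumptions for illustration; the exact values are not reported here.</p>
        <preformat>
# Sketch of the described architecture: dense layers with ReLU and dropout,
# sigmoid output for an engagement score in [0, 1]. Layer sizes are assumed.
import tensorflow as tf

def build_model(n_features):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # engagement in [0, 1]
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model

model = build_model(n_features=704)
model.summary()
        </preformat>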
        <p>To evaluate model performance, the dataset was partitioned into training (80%) and testing
(20%) subsets. To maintain a balanced representation of engagement levels and participant
data, training samples were randomly selected across participants and engagement levels. This
strategy helps to prevent bias due to over-representation of specific cases and promotes
generalizability. To further enhance model robustness and increase data variability, we employed
data augmentation by generating vertically mirrored versions of the video clips. These
augmented clips were assigned the same engagement labels as their original counterparts. Model
optimization was performed using the Adam optimizer with Mean Squared Error (MSE) as the
loss function.</p>
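        <p>A corresponding training sketch follows, using an 80/20 split and the Adam/MSE setup described above. The feature matrix and labels are synthetic placeholders so the snippet runs standalone, build_model refers to the sketch above, and the epoch count is an assumption; the mirrored-clip augmentation is applied to the videos before feature extraction and is therefore not shown.</p>
        <preformat>
# Sketch of the training setup: 80/20 split, Adam optimizer, MSE loss.
# X and y are synthetic placeholders for the standardized features and the
# normalized engagement labels; build_model is the helper defined earlier.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 704)).astype("float32")          # placeholder features
y = rng.uniform(0.0, 1.0, size=1000).astype("float32")      # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = build_model(n_features=X.shape[1])
history = model.fit(X_train, y_train,
                    validation_data=(X_test, y_test),
                    epochs=40, batch_size=40, verbose=0)     # batch size per Fig. 4
print(min(history.history["val_loss"]))
        </preformat>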
        <p>[Fig. 4: training loss (Mean Squared Error, ⋅10⁻²) per epoch for batch sizes 30, 40, 50, and 60.]</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>4. Results</title>
      <p>The inter-rater agreement analysis yielded a weighted Cohen’s Kappa of 0.91, indicating a high
level of agreement between the annotators. The predictive model’s performance was evaluated
using Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error
(RMSE). Table 1 summarizes the final evaluation outcomes for the model. To better understand
performance variations across engagement levels, we evaluated the model separately on test
subsets corresponding to each engagement level. This allowed us to assess how well the model
generalizes across different engagement levels, despite being trained holistically. Fig. 4 displays
the loss curves corresponding to four batch sizes (30, 40, 50, and 60), illustrating the model’s
fine-tuning process. A batch size of around 40 yields the most stable performance, suggesting
it is well-suited to the dataset’s characteristics and the labeling approach used.</p>
        <p>The model achieved an MSE of 0.0111, indicating low squared error on average across
predictions. The MAE was 0.0696, suggesting an average absolute deviation of approximately
6.96%. Notably, the R² score of 0.8195 indicates that the model accounts for 81.95% of the variance
in engagement values, demonstrating strong predictive performance. These results suggest that
the model effectively captures the underlying engagement patterns. However, performance
was comparatively lower for engagement levels 2, 3, and 7, likely due to the limited amount of
training data available for these categories.</p>
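        <p>For reference, the reported metrics can be computed from held-out predictions with scikit-learn as sketched below, assuming the model and test split from the training sketch in Section 3.2.</p>
        <preformat>
# Sketch: MSE, MAE, RMSE and R² from held-out predictions
# (model, X_test, y_test as in the earlier training sketch).
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_pred = model.predict(X_test).ravel()

mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = float(np.sqrt(mse))
r2 = r2_score(y_test, y_pred)
print(f"MSE={mse:.4f}  MAE={mae:.4f}  RMSE={rmse:.4f}  R2={r2:.4f}")
        </preformat>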
        <p>
          To better understand the relationship between the specific features of engagement and the
model prediction of engagement level, a feature importance analysis was conducted. Table 2
presents the most influential features ranked by their permutation importance values, derived
from the trained model based on MSE as a performance metric. Permutation importance
measures the decrease in a model’s performance when the values of a single feature are randomly
shuffled. A higher value indicates a greater contribution of the feature to the model’s predictions
[
          <xref ref-type="bibr" rid="ref47">47</xref>
          ]. These values were calculated using the trained model, with MSE as the evaluation metric.
Notably, a negative permutation importance suggests that scrambling the feature improves
model performance, potentially indicating overfitting, where the model relies on misleading
patterns not generalizable to new data. For each engagement level, the six most important
features are listed. This table provides a more detailed analysis of how the trained model
differentiates between engagement levels. Fig. 5 presents selected frames from a video in the
dataset, showing a participant interacting with both the task and the robot. Each frame includes
the predicted engagement value. Lower predicted values correspond to lower engagement
levels, as detailed in Table 2. For instance, frame 3 has a predicted engagement score of 0.61
and is associated with facial expressions such as smiling and raised cheeks.
        </p>
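        <p>A minimal sketch of this permutation procedure is given below: each feature column is shuffled in turn and the resulting increase in MSE over the baseline is taken as its importance, with negative values indicating that shuffling improved performance. The helper name and repeat count are illustrative.</p>
        <preformat>
# Sketch: permutation feature importance for the trained regressor.
# importance(j) = mean MSE after shuffling column j minus the baseline MSE;
# negative values mean the shuffled feature helped (a possible overfitting sign).
import numpy as np
from sklearn.metrics import mean_squared_error

def permutation_importance_mse(model, X, y, n_repeats=5, seed=0):
    rng = np.random.default_rng(seed)
    baseline = mean_squared_error(y, model.predict(X).ravel())
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])   # scramble feature j
            scores.append(mean_squared_error(y, model.predict(X_perm).ravel()))
        importances[j] = np.mean(scores) - baseline
    return importances

imp = permutation_importance_mse(model, X_test, y_test)
top6 = np.argsort(imp)[::-1][:6]          # six most influential features
print(top6, imp[top6])
        </preformat>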
      <p>[Table 2 here lists the six most important features for each engagement level (Levels 1–7) and in total.]</p>
        <p>
          • Gaze and pose features ranked highest in importance in engagement levels 1, 2, and
3, indicating that gaze direction, orientation, and pose are highly informative for the
prediction task in these levels. Gaze and body orientation are often key signals in assessing
engagement and attention [
          <xref ref-type="bibr" rid="ref5 ref50">50, 51, 5</xref>
          ].
• Facial Action Units show higher importance in levels 5, 6, and 7, suggesting that
expressions play a significant role in the model’s decision-making process in these levels. In the
literature, facial expressions like smiles and raised cheeks are associated with positive
emotions [
          <xref ref-type="bibr" rid="ref23 ref24">23, 24</xref>
          ].
• Facial landmark features, while less dominant individually, still contributed meaningfully.
        </p>
        <p>
          While facial landmarks are not the most influential on their own, they could still play a
key role when considered in conjunction with other features, as discussed in studies on
the integration of multiple engagement cues [
          <xref ref-type="bibr" rid="ref39">39, 52</xref>
          ].
        </p>
    </sec>
    <sec id="sec-7">
      <title>5. Conclusion</title>
      <p>This study presents a predictive model for assessing user engagement in a triadic human-robot
interaction setup, consisting of a human, a robot, and a task. The model is built using a novel
dataset that we developed, incorporating multimodal features such as facial landmarks, facial
action units, head pose, and gaze direction. Additionally, we introduce a structured framework
for annotating engagement, addressing a significant gap in existing research on systematic
engagement annotation in the HRI scenario.</p>
      <p>Engagement annotations were carried out using a structured approach, resulting in a weighted
Cohen’s Kappa score of 0.91, reflecting a high level of agreement among the annotators. The
predictive model showed excellent performance, with a Mean Squared Error (MSE) of 0.0111
and an R² score of 0.8195. These results demonstrate the model’s ability to accurately capture
user engagement patterns, suggesting its potential for adapting real-time interactions based on
engagement states.</p>
      <p>Engagement is a complex phenomenon that cannot be fully understood through a single
modality. The model’s differential weighting of various features supports the idea that
engagement detection benefits from a multimodal approach. For instance, gaze orientation, a major
feature of attention, is closely tied to social presence in robotic companions. In this study, this
correlates with the lower engagement level where the users spend time looking away from the
screen or making eye contact with the robot. Head pose, another key indicator, reflects body
language and attentiveness. This suggests that in adaptive HRI, robots capable of interpreting
users’ head pose could tailor their responses or adjust task difficulty in real-time to re-engage
users, highlighting the potential for robotic systems to leverage machine learning models to
assess and respond to user attentiveness, beyond just task performance. Furthermore, the
model’s reliance on facial action units to detect higher levels of engagement points to a crucial
intersection between facial action units and engagement. For example, robots that adjust their
behavior based on positive emotional cues, like smiles, could enhance user satisfaction and
prolong engagement.</p>
      <p>Although task parameters were not explicitly included as variables in the analysis, all
behavioral data — including facial expressions, gaze, and head pose — were collected during a
structured visuospatial memory task. As such, these cues are inherently tied to participants’
engagement with the task. Given the cognitive demands of the activity, the model is likely to
generalize well to other high-tempo cognitive scenarios, such as video games, cognitive training
programs, or driving simulations.</p>
      <p>
        Despite these contributions, the study acknowledges the importance of context in evaluating
engagement, noting that engagement is context-sensitive and can vary across tasks. For example,
placing the robot beyond the screen could affect engagement outcomes [
        <xref ref-type="bibr" rid="ref49">49</xref>
        ], resulting in less
attention directed at the robot if it is placed in the peripheral vision. In different tasks, whether
non-social or other types of social interactions, users may express their engagement in distinct
ways [
        <xref ref-type="bibr" rid="ref48">48</xref>
        ]. This distinction underscores the task-dependent nature of affective states and
engagement. We acknowledge that the use of facial expression, head pose, and gaze orientation
captures some aspects of user engagement — primarily affect, attention, and interest — but
misses many of the cognitive, behavioral, and experiential dimensions outlined in the broader
definition. Furthermore, there is a need for further research to broaden the generalizability
of these findings by incorporating diverse user populations. Future studies should aim to
incorporate user experiences and physiological data to gain deeper insights into affective states
and enhance the model’s reliability.
      </p>
      <p>
        Moreover, the permutation feature importance analysis is sensitive to feature collinearity
and dependent on a single trained model. While more advanced methods account for feature
interactions and offer uncertainty bounds on feature importance, they were beyond the
specific objectives of this study, which aimed to provide an initial understanding of the relative
importance of engagement features. For example, Fisher et al. [
        <xref ref-type="bibr" rid="ref46">46</xref>
        ] introduced a framework
that evaluates feature importance across the entire class of well-performing models, known
as Model Class Reliance. This provides bounds on a feature’s importance and accounts for
feature interactions and redundancy. Similarly, SHAP (SHapley Additive exPlanations) offers
an explanation method grounded in cooperative game theory, attributing contributions to
individual features while accounting for interactions [
        <xref ref-type="bibr" rid="ref45">45</xref>
        ].
      </p>
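      <p>For completeness, a hedged sketch of the SHAP alternative is given below, wrapping the trained Keras model’s predict function in a model-agnostic KernelExplainer; the background and sample sizes are illustrative, and this analysis was not part of the present study.</p>
      <preformat>
# Sketch: SHAP values for the engagement model via the model-agnostic
# KernelExplainer (illustrative only; the study used permutation importance).
import numpy as np
import shap

background = shap.sample(X_train, 100)                 # small background sample
explainer = shap.KernelExplainer(lambda x: model.predict(x).ravel(), background)

# Explain a handful of test frames; one SHAP value per feature per frame.
shap_values = explainer.shap_values(X_test[:10], nsamples=200)
print(np.array(shap_values).shape)
      </preformat>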
      <p>In conclusion, this research establishes a foundational framework for understanding and
quantifying user engagement in HRI, presenting significant advancements and practical implications
while also identifying critical areas for further investigation.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this manuscript, the author(s) utilized ChatGPT-4 solely for the
purpose of grammar and spelling verification. No text was generated by the generative AI, and
the author(s) take(s) full responsibility for the content of this publication. The author(s) used
https://photo-to-sketch.ai for converting Figure 1 into sketch-style illustrations, and no further
alterations were made.</p>
      <p>[51] Rossi, A., Raiano, M. &amp; Rossi, S. Affective, cognitive and behavioural engagement
detection for human-robot interaction in a bartending scenario. 2021 30th IEEE International
Conference On Robot &amp; Human Interactive Communication (RO-MAN). pp. 208-213 (2021).</p>
      <p>[52] Bartlett, M., Stewart, T. &amp; Thill, S. Estimating levels of engagement for social human-robot
interaction using Legendre memory units. Companion Of The 2021 ACM/IEEE International
Conference On Human-Robot Interaction. pp. 362-366 (2021).</p>
      <p>[53] Ravandi, B. S., Khan, I., Markelius, A., Bergström, M., Gander, P., Erzin, E., &amp; Lowe, R.
Exploring Task and Social Engagement in Companion Social Robots: A Comparative
Analysis of Feedback Types. Manuscript in review.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Doherty</surname>
          </string-name>
          and G. Doherty, “Engagement in
          <string-name>
            <surname>HCI</surname>
          </string-name>
          : Conception, Theory and Measurement,
          <source>” ACM Computing Surveys</source>
          , vol.
          <volume>51</volume>
          , no.
          <issue>5</issue>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>39</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thiagarajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Clabaugh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Matarić</surname>
          </string-name>
          , “
          <article-title>Modeling engagement in long-term, in-home socially assistive robot interventions for children with autism spectrum disorders</article-title>
          ,”
          <source>Science Robotics</source>
          , vol.
          <volume>5</volume>
          , no.
          <issue>39</issue>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Andriella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Torras</surname>
          </string-name>
          , and G. Alenyà, “
          <article-title>Cognitive System Framework for brain-training exercise based on human-robot interaction,” Cognitive Computation</article-title>
          , vol.
          <volume>12</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>793</fpage>
          -
          <lpage>810</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ben-Youssef</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Clavel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Essid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bilac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chamoux</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Lim</surname>
          </string-name>
          , “
          <article-title>UE-HRI: A new dataset for the study of user engagement in spontaneous human-robot interactions</article-title>
          ,
          <source>” Proceedings of the 19th ACM International Conference on Multimodal Interaction</source>
          , pp.
          <fpage>464</fpage>
          -
          <lpage>472</lpage>
          ,
          <year>January 2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ben-Youssef</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Essid</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Clavel</surname>
          </string-name>
          , '
          <article-title>On-the-Fly Detection of User Engagement Decrease in Spontaneous Human-Robot Interaction Using Recurrent and Deep Neural Networks'</article-title>
          ,
          <source>International Journal of Social Robotics</source>
          , vol.
          <volume>11</volume>
          , no.
          <issue>5</issue>
          , pp.
          <fpage>815</fpage>
          -
          <lpage>828</lpage>
          , Dec.
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>del Duchetto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Baxter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Hanheide</surname>
          </string-name>
          , “
          <article-title>Automatic assessment and learning of robot social abilities</article-title>
          ,
          <source>” ACM/IEEE International Conference on Human-Robot Interaction</source>
          , pp.
          <fpage>561</fpage>
          -
          <lpage>563</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Del Duchetto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Baxter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Hanheide</surname>
          </string-name>
          , “
          <article-title>Are you still with me? continuous engagement assessment from a robot's point of view,” Frontiers in Robotics and AI</article-title>
          , vol.
          <volume>7</volume>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Poltorak</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Drimus</surname>
          </string-name>
          , “
          <article-title>Human-robot interaction assessment using dynamic engagement profiles</article-title>
          ,
          <source>” IEEE-RAS International Conference on Humanoid Robots</source>
          , pp.
          <fpage>649</fpage>
          -
          <lpage>654</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dhamija</surname>
          </string-name>
          and
          <string-name>
            <given-names>T. E.</given-names>
            <surname>Boult</surname>
          </string-name>
          , “
          <source>Automated Action Units Vs. Expert Raters: Face off,” Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision</source>
          , WACV
          <year>2018</year>
          , vol.
          <source>2018-Janua</source>
          , pp.
          <fpage>259</fpage>
          -
          <lpage>268</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Pattar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Coronado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. R.</given-names>
            <surname>Ardila</surname>
          </string-name>
          , and G. Venture, “
          <article-title>Intention and engagement recognition for personalized human-robot interaction, an integrated and deep learning approach</article-title>
          ,
          <source>” 2019 4th IEEE International Conference on Advanced Robotics and Mechatronics</source>
          ,
          <string-name>
            <surname>ICARM</surname>
          </string-name>
          <year>2019</year>
          , pp.
          <fpage>93</fpage>
          -
          <lpage>98</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>O</given-names>
            <surname>'Brien</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            &amp;
            <surname>Toms</surname>
          </string-name>
          ,
          <string-name>
            <surname>E.</surname>
          </string-name>
          <article-title>What is user engagement? A conceptual framework for defining user engagement with technology</article-title>
          .
          <source>Journal Of The American Society For Information Science And Technology</source>
          .
          <volume>59</volume>
          ,
          <fpage>938</fpage>
          -
          <lpage>955</lpage>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Baltrušaitis</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robinson</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Morency</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <article-title>OpenFace: An open source facial behavior analysis toolkit</article-title>
          .
          <source>2016 IEEE Winter Conference On Applications Of Computer Vision (WACV)</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>I.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Erhan</surname>
          </string-name>
          , PL. Carrier,
          <string-name>
            <given-names>A.</given-names>
            <surname>Courville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mirza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hamner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Cukierski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tang</surname>
          </string-name>
          , DH. Lee,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ramaiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Athanasakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shawe-Taylor</surname>
          </string-name>
          , M. Milakov,
          <string-name>
            <given-names>J.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Popescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Grozea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bergstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Romaszko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chuang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          <article-title>”Challenges in Representation Learning: A report on three machine learning contests</article-title>
          ,
          <source>” International Conference on Machine Learning</source>
          (ICML)
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>O.</given-names>
            <surname>Rudovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schuller</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. W.</given-names>
            <surname>Picard</surname>
          </string-name>
          , “
          <article-title>Personalized machine learning for robot perception of affect and engagement in autism therapy</article-title>
          ,
          <source>” Sci Robot</source>
          , vol.
          <volume>3</volume>
          , no.
          <issue>19</issue>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>O.</given-names>
            <surname>Rudovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schuller</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. W.</given-names>
            <surname>Picard</surname>
          </string-name>
          ,
          <article-title>“Multi-modal active learning from human data: A deep reinforcement learning approach</article-title>
          ,
          <source>” ICMI 2019 - Proceedings of the 2019 International Conference on Multimodal Interaction</source>
          , pp.
          <fpage>6</fpage>
          -
          <lpage>15</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Ravandi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <article-title>Gamification for Personalized Human-Robot Interaction in Companion Social Robots</article-title>
          .
          <source>2024 12th International Conference On Affective Computing And Intelligent Interaction Workshops And Demos (ACIIW)</source>
          . pp.
          <fpage>106</fpage>
          -
          <lpage>110</lpage>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>K.</given-names>
            <surname>Saleh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Chen</surname>
          </string-name>
          , “
          <article-title>Improving users engagement detection using end-to-end spatio-temporal convolutional neural networks</article-title>
          ,
          <source>” Companion of the 2021 ACM/IEEE International Conference on Human-Robot Interaction</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ayllon</surname>
          </string-name>
          , T.-S. Chou,
          <string-name>
            <given-names>A.</given-names>
            <surname>King</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          , “
          <article-title>Identification and engagement of passive subjects in multiparty conversations by a humanoid robot,” Companion of the 2021 ACM/</article-title>
          IEEE International Conference on Human-Robot
          <string-name>
            <surname>Interaction</surname>
          </string-name>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Sidiropoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Papakostas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lytridis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bazinas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. G.</given-names>
            <surname>Kaburlasos</surname>
          </string-name>
          , E. Kourampa, and E. Karageorgiou, “
          <article-title>Measuring engagement level in child-robot interaction using machine learning based data analysis,” 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI</article-title>
          ),
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Atamna</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Clavel</surname>
          </string-name>
          , “
          <article-title>HRI-RNN: A user-robot dynamics-oriented RNN for engagement decrease detection</article-title>
          ,
          <source>” Interspeech 2020</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>O.</given-names>
            <surname>Rudovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. W.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Busche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schuller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Breazeal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. W.</given-names>
            <surname>Picard</surname>
          </string-name>
          , “
          <article-title>Personalized estimation of engagement from videos using active learning with deep reinforcement learning</article-title>
          ,
          <source>” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>O.</given-names>
            <surname>Rudovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Utsumi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. C.</given-names>
            <surname>Ferrer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schuller</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. W.</given-names>
            <surname>Picard</surname>
          </string-name>
          , “
          <article-title>Culturenet: A deep learning approach for engagement intensity estimation from face images of children with autism</article-title>
          ,
          <source>” 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mollahosseini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Abdollahi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Mahoor</surname>
          </string-name>
          , “
          <article-title>Studying Effects of Incorporating Automated Affect Perception with Spoken Dialog in Social Robots,”</article-title>
          <source>RO-MAN 2018 - 27th IEEE International Symposium on Robot and Human Interactive Communication</source>
          , pp.
          <fpage>783</fpage>
          -
          <lpage>789</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rajavenkatanarayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Babu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tsiakas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Makedon</surname>
          </string-name>
          , “
          <article-title>Monitoring task engagement using facial expressions and body postures</article-title>
          ,
          <source>” Proceedings of the 3rd International Workshop on Interactive and Spatial Computing - IWISC '18</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>A.</given-names>
            <surname>Di Nuovo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Conti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Trubia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Buono</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Di Nuovo</surname>
          </string-name>
          , “
          <article-title>Deep Learning Systems for estimating visual attention in robot-assisted therapy of children with autism and intellectual disability</article-title>
          ,
          <source>” Robotics</source>
          , vol.
          <volume>7</volume>
          , no.
          <issue>2</issue>
          , p.
          <fpage>25</fpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>D.</given-names>
            <surname>Anagnostopoulou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Efthymiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Papailiou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Maragos</surname>
          </string-name>
          , “
          <article-title>Engagement Estimation During Child Robot Interaction Using Deep Convolutional Networks Focusing on ASD Children</article-title>
          ,” Jun., pp.
          <fpage>3641</fpage>
          -
          <lpage>3647</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>R.</given-names>
            <surname>Garris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ahlers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Driskell</surname>
          </string-name>
          , “
          <article-title>Games, motivation, and learning: A research and practice model</article-title>
          ,
          <source>” Simulation &amp; Gaming</source>
          , vol.
          <volume>33</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>441</fpage>
          -
          <lpage>467</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>T.</given-names>
            <surname>Alves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gama</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F. S.</given-names>
            <surname>Melo</surname>
          </string-name>
          , “
          <article-title>Flow adaptation in serious games for health</article-title>
          ,”
          <source>2018 IEEE 6th International Conference on Serious Games and Applications for Health (SeGAH 2018)</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          , “
          <article-title>Accelerating very deep convolutional networks for classification and detection</article-title>
          ,
          <source>” IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          , vol.
          <volume>38</volume>
          , no.
          <issue>10</issue>
          , pp.
          <fpage>1943</fpage>
          -
          <lpage>1955</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <surname>Ahmad</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mubin</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Orlando</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Children views' on social robot's adaptations in education</article-title>
          .
          <source>Proceedings Of The 28th Australian Conference On Computer-Human Interaction</source>
          . pp.
          <fpage>145</fpage>
          -
          <lpage>149</lpage>
          (
          <year>2016</year>
          ), https://doi.org/10.1145/3010915.3010977.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <surname>Liles</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <article-title>Ms. An (Meeting Students' Academic Needs): Engaging Students in Math Education</article-title>
          .
          <source>Adaptive Instructional Systems</source>
          . pp.
          <fpage>645</fpage>
          -
          <lpage>661</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <surname>Csikszentmihalyi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Finding flow: The psychology of engagement with everyday life</article-title>
          .
          <source>Basic Books</source>
          (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kerwin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Howard</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <article-title>Applying Behavioral Strategies for Student Engagement Using a Robotic Educational Agent</article-title>
          .
          <source>2013 IEEE International Conference On Systems, Man, And Cybernetics</source>
          . pp.
          <fpage>4360</fpage>
          -
          <lpage>4365</lpage>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <surname>Boccanfuso</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leite</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torres</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salomons</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Foster</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barney</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ahn</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scassellati</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Shic</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <article-title>A thermal emotion classifier for improved human-robot interaction</article-title>
          .
          <source>2016 25th IEEE International Symposium On Robot And Human Interactive Communication (RO-MAN)</source>
          . pp.
          <fpage>718</fpage>
          -
          <lpage>723</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <surname>Javed</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>Interactions With an Empathetic Agent: Regulating Emotions and Improving Engagement in Autism</article-title>
          .
          <source>IEEE Robot Autom Mag</source>
          .
          <volume>26</volume>
          ,
          <fpage>40</fpage>
          -
          <lpage>48</lpage>
          (
          <year>2019</year>
          ,4).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <surname>Amanatiadis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaburlasos</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dardani</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Chatzichristofis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>Interactive social robots in special education</article-title>
          .
          <source>2017 IEEE 7th International Conference On Consumer Electronics - Berlin (ICCE-Berlin)</source>
          . pp.
          <fpage>126</fpage>
          -
          <lpage>129</lpage>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <surname>Abdelrahman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strazdas</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khalifa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hintz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hempel</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Al-Hamadi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <article-title>Multimodal Engagement Prediction in Multiperson Human-Robot Interaction</article-title>
          .
          <source>IEEE Access</source>
          .
          <volume>10</volume>
          pp.
          <fpage>61980</fpage>
          -
          <lpage>61991</lpage>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <surname>Duque-Domingo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gómez-García-Bermejo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Zalama</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <article-title>Gaze control of a robotic head for realistic interaction with humans</article-title>
          .
          <source>Frontiers In Neurorobotics</source>
          .
          <volume>14</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <surname>Ravandi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khan</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gander</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>Deep Learning Approaches for User Engagement Detection in Human-Robot Interaction: A Scoping Review</article-title>
          .
          <source>International Journal Of Human-Computer Interaction</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          (
          <year>2025</year>
          ), https://doi.org/10.1080/10447318.2025.2470277
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <surname>Arshad</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hashim</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mohd Arifin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mohd Aszemi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Low</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Norman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <article-title>Robots as Assistive Technology Tools to Enhance Cognitive Abilities and Foster Valuable Learning Experiences among Young Children With Autism Spectrum Disorder</article-title>
          .
          <source>IEEE Access</source>
          .
          <volume>8</volume>
          pp.
          <fpage>116279</fpage>
          -
          <lpage>116291</lpage>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <surname>Manh Do</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sheng</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harrington</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Bishop</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <article-title>Clinical Screening Interview Using a Social Robot for Geriatric Care</article-title>
          .
          <source>IEEE Transactions On Automation Science And Engineering</source>
          .
          <volume>18</volume>
          ,
          <fpage>1229</fpage>
          -
          <lpage>1242</lpage>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <surname>McHugh</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Interrater reliability: the kappa statistic</article-title>
          .
          <source>Biochem Med</source>
          (Zagreb).
          <volume>22</volume>
          ,
          <fpage>276</fpage>
          -
          <lpage>282</lpage>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <surname>Oertel</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Castellano</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chetouani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nasir</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obaid</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pelachaud</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>Engagement in human-agent interaction: An overview</article-title>
          .
          <source>Frontiers In Robotics And AI</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit</article-title>
          .
          <source>Psychological Bulletin</source>
          .
          <volume>70</volume>
          ,
          <fpage>213</fpage>
          (
          <year>1968</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <surname>Lundberg</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>A unified approach to interpreting model predictions</article-title>
          .
          <source>Advances In Neural Information Processing Systems</source>
          .
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <surname>Fisher</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rudin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Dominici</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <article-title>All models are wrong, but many are useful: Learning a variable's importance by studying an entire class of prediction models simultaneously</article-title>
          .
          <source>Journal Of Machine Learning Research</source>
          .
          <volume>20</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>81</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <surname>Breiman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <article-title>Random forests</article-title>
          .
          <source>Machine Learning</source>
          .
          <volume>45</volume>
          pp.
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [48]
          <string-name>
            <surname>Borges</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lindblom</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clarke</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gander</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>Classifying confusion: autodetection of communicative misunderstandings using facial action units</article-title>
          .
          <source>2019 8th International Conference On Affective Computing And Intelligent Interaction Workshops And Demos (ACIIW)</source>
          . pp.
          <fpage>401</fpage>
          -
          <lpage>406</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [49]
          <string-name>
            <surname>Markelius</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sjöberg</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bergström</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ravandi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vivas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khan</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>Differential Outcomes Training of Visuospatial Memory: A Gamified Approach Using a Socially Assistive Robot</article-title>
          .
          <source>International Journal Of Social Robotics</source>
          .
          <volume>16</volume>
          ,
          <fpage>363</fpage>
          -
          <lpage>384</lpage>
          (
          <year>2024</year>
          ,2), https://doi.org/10.1007/s12369-023-01083-0.
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [50]
          <string-name>
            <surname>Hadfield</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chalvatzaki</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koutras</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khamassi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tzafestas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Maragos</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <article-title>A Deep Learning Approach for Multi-View Engagement Estimation of Children in a Child-Robot Joint Attention Task</article-title>
          .
          <source>2019 IEEE/RSJ International Conference On Intelligent Robots And Systems (IROS)</source>
          . pp.
          <fpage>1251</fpage>
          -
          <lpage>1256</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>