<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SMARTERCARE Workshop, November</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>HeAL9000: an Intelligent Rehabilitation Robot</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lorenzo Cristofori</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudiu D. Hromei</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Scotto di Luzio</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Tamantini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesca Cordella</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Danilo Croce</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Loredana Zollo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Basili</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Enterprise Engineering, University of Rome “Tor Vergata”</institution>
          ,
          <addr-line>Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Research Unit of Advanced Robotics and Human-Centred Technologies, Università Campus Bio-Medico di Roma</institution>
          ,
          <addr-line>Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>29</volume>
      <issue>2021</issue>
      <fpage>29</fpage>
      <lpage>41</lpage>
      <abstract>
        <p>AI applications to health-related processes include the adoption of robotic platforms for rehabilitation, aimed at delivering highly intensive, repeatable and accurate motion therapies while constantly monitoring the patient and providing suitable levels of assistance. However, a comprehensive approach to robot-aided rehabilitation also requires a social level of interaction with the patient, which implies cognitive modeling and linguistic communication. Notably, robotic platforms providing both physical and cognitive support to patients have not been proposed so far. The HeAL9000 project proposes an intelligent robot with cognitive and linguistic abilities for the rehabilitation of patients affected by musculoskeletal disorders. It relies strongly on machine learning technologies, aimed at supporting cost-effective engineering of the platform as well as capabilities that evolve over time. In this paper, early experimental evidence is gathered through a quantitative evaluation.</p>
      </abstract>
      <kwd-group>
        <kwd>AI for Rehabilitation</kwd>
        <kwd>Rehabilitation Robotics</kwd>
        <kwd>adaptive Human Robot Interaction</kwd>
        <kwd>Natural Language Understanding</kwd>
        <kwd>Dialogue</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The adoption of robotic devices in rehabilitation has increased considerably in recent years, since robots can deliver highly intensive, repeatable and accurate motion therapy for neurological and musculoskeletal disorders [1]. The goal of rehabilitation is the functional recovery of the body area of interest, the restoration of the functional range of motion and the recovery of muscle strength [2]. Robot-aided motor therapy has further advantages, such as the possibility of objectively measuring patient performance, appropriately modulating the level of assistance and providing feedback during treatment [3]. Several studies have been conducted to evaluate the effectiveness of robot-aided rehabilitation. The first clinical study was carried out in 1997 using the commercial robot MIT-MANUS on post-stroke patients [4]. In more recent studies, robotic assistance has been dynamically adapted to the subject’s needs, evaluated by bio-mechanical and psycho-physiological monitoring systems [5]. These platforms establish a purely physical interaction with the patient, but they generally do not include a cognitive interaction, although it can be a fundamental tool for user motivation and engagement [6]. However, our analysis of the state of the art did not identify robotic platforms capable of providing both physical and cognitive support to patients with neurological or musculoskeletal disorders. A platform capable of providing an adequate level of assistance, tailored to the individual patient’s needs and favoring her participation, may deliver a more effective training session.</p>
      <p>The HeAL9000 project (“Healthcare Agents and Learning robots”) is funded by Regione Lazio and aims at designing, developing and validating in the operative scenario a smart robotic platform to deliver rehabilitation to patients affected by musculoskeletal disorders. The platform should be able to promote and facilitate the patient’s motor recovery through a human-robot interaction that exploits the communication channels typically used in therapist-patient interaction, i.e., verbal, physical and cognitive. HeAL9000 is a Service Oriented Architecture that integrates a Robotic Platform and cognitive services whose aim is to control the verbal and non-verbal interaction between the robot and the patient. The cognitive components implement a Dialogue system that interprets the utterances of the patient and the non-verbal stimuli (e.g., the physiological input from dedicated devices, such as the patient’s heart rate or temperature), and plans the interaction. We modeled the interaction over the whole rehab session by i) demonstrating the exercises to be performed, ii) observing and evaluating the patient during the exercise and, if needed, correcting him/her with verbal signals, and iii) actively supporting him/her with the robotic arm, as a therapist would do. The entire interaction is also expected to account for emotional information implicitly shown by the patient, captured through Computer Vision (here devoted to Face Emotion Recognition [7]) and through language processing, i.e., the automatic analysis of the spoken utterance to extract emotional stimuli from both the patient’s tone and the sentence content. This enables a more natural interaction with the robot and improves engagement. This paper thus presents the overall architecture under development, with particular emphasis on the specific modules devoted to Dialogue Management and Natural Language Understanding. A dedicated experimental evaluation of these modules shows that they can be adopted for a more robust and effective human-robot interaction (HRI).</p>
      <p>In the rest of the paper, Section 2 provides an overview of the HeAL9000 architecture, while Section 3 reports the experimental evaluation of some of the AI components and Section 4 draws the conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Integrating robot-aided rehabilitation and language learning</title>
      <p>The role of the therapist is paramount in motor rehabilitation. The healthcare operator has to establish a physical and cognitive relationship with the patient, and his/her intervention cannot ignore the patient’s clinical and emotional state. The HeAL9000 project aims at replicating the interaction between therapist and patient established during a conventional rehabilitation session, in order to improve the effectiveness of the rehabilitation treatment. Such a platform could represent a disruptive source of technological innovation in modern robot-aided rehabilitation and a major step forward in terms of human-robot interaction, robot autonomy, patient safety, robot reliability and effectiveness of the rehabilitation treatment. To this end, the HeAL9000 robotic platform will (i) consist of a service robot capable of replicating the behavior of the human therapist through Machine Learning and Learning by Demonstration techniques; (ii) exhibit a highly adaptive behavior with respect to the characteristics of the patient and the context, thanks to the multi-modal monitoring of the patient; (iii) establish a physical and cognitive interaction with the patient, similar to the one observed between therapist and patient. This has the twofold purpose of motivating the patient to actively participate in the treatment and of strongly personalizing the rehabilitation session according to the physical and cognitive state of the patient.</p>
      <p>[Figure 1: HeAL9000 architecture, showing the Patient, the Robotic component (Service Robot, RGB-D camera with skeleton tracking, wearable physiological and biomechanical sensors, audio), the HeAL9000 Controller (shared storage, admin console), the Patients-Exercises repository, the Speech Audio to Text module and the Cognitive component (NLU, Dialogue Manager, NLG, Blackboard, KB).]</p>
      <p>Figure 1 summarizes the overall architecture. The patient physically and verbally interacts with the Robotic component, which is also devoted to measuring her physiological and biomechanical information, while visually tracking the patient’s movements and emotional status. This body of information is provided to the so-called Cognitive component, which processes such input, plans the interaction (through dedicated sub-modules presented hereafter) and provides instructions to the robot, both in terms of actions to be performed and utterances to be pronounced. Both components and their communication are orchestrated by the HeAL9000 Controller, which also stores the shared information. Moreover, the controller provides the Monitoring and Administration console (also for security purposes) and interfaces with external repositories (with clinical information about patients and exercises) or external modules such as Speech to Text modules. The Robotic component is implemented on the adopted TIAGo robot<sup>1</sup> based on the ROS (Robot Operating System) middleware, while the other components are implemented as services in a Service Oriented Architecture hosted in a dedicated cloud.</p>
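      <p>The orchestration described above can be illustrated with a minimal publish/subscribe blackboard, a common way to let loosely coupled components share information. This is a hypothetical sketch: the <monospace>Blackboard</monospace> class, the topic names and the keyword rule standing in for the cognitive component are assumptions, not the interface of the HeAL9000 Controller, which in the actual system connects ROS-based robotic components with cloud-hosted services.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[object], None]


class Blackboard:
    """Hypothetical shared storage: components post facts and subscribers are notified."""

    def __init__(self) -> None:
        self.facts: Dict[str, object] = {}
        self.subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.subscribers[topic].append(handler)

    def post(self, topic: str, value: object) -> None:
        # Keep the latest value (shared information), then push it to every subscriber.
        self.facts[topic] = value
        for handler in list(self.subscribers[topic]):
            handler(value)


board = Blackboard()
robot_speech: List[object] = []


def cognitive_component(utterance: object) -> None:
    # Stand-in for the NLU/DM/NLG chain: react to a pain report with an instruction.
    if "hurts" in str(utterance).lower():
        board.post("robot/say", "Please take a break.")


board.subscribe("patient/utterance", cognitive_component)  # Speech to Text output
board.subscribe("robot/say", robot_speech.append)          # robot text-to-speech queue

board.post("patient/utterance", "My arm hurts")
print(robot_speech)  # ['Please take a break.']
```
      </preformat>
      <p>Decoupling components through a shared store means that the robotic side never calls the language modules directly, which is one way to let each side run on different infrastructure.</p>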
      <sec id="sec-2-1">
        <title>2.1. Machine Learning for Patient-therapist interaction</title>
        <p>The natural interaction between the patient and a therapeutic robot crucially depends on the ability to integrate learning at the physical level (as adaptive control mechanisms and data-driven machine vision are involved) and at the cognitive level, related to the ability to recognize people, profile them and support linguistic communication with them.</p>
        <p>Cognitive aspects. Starting from the analysis of conventional rehabilitation sessions, it is possible to distinguish the roles played by the patient and the therapist. They assume dynamic behaviors based on the circumstances and stimuli that the two exchange reciprocally. At the beginning of the rehabilitation session, the clinician carries out the demonstration: the therapist explains to the patient the task to be performed, not only verbally but also with the help of his/her own body. In this context, the patient is a listener: he/she does not perform any movement, although he/she may ask for clarifications about the activity to be performed. At the end of the demonstration, the therapist starts the second phase of the treatment: the observation. At this time, the patient plays the role of the main actor, as he/she is asked to perform the proposed exercise independently. In turn, the therapist monitors the subject and encourages and/or warns him/her to carry out the assigned motor task in the best way. Whenever the therapist decides to intervene to correct a patient error and/or the patient complains of pain or fatigue, the role of the therapist turns into that of a helper. In this context, the actual physical interaction between the two actors takes place. As soon as the exercise is completed, the cycle can start over and iteratively continue until the rehabilitation session is completed [8]. In order to develop an effective robot-aided system for rehabilitation, it is necessary to implement such roles on a robotic platform able to handle both physical and cognitive interactions, as shown in Figure 2.</p>
        <p>Physical interaction. The robotic platform will be able to play one role among demonstrator, observer and helper, as reported in Figure 2. In the first scenario, the robot will demonstrate the motor task to be performed by the patient. When the patient tries to execute the task, the robot will constantly evaluate the patient’s motor performance by exploiting RGB-D cameras and a skeleton tracking algorithm. In this way, the robot will be aware of patient errors, pain and/or risk conditions, and will assist him/her when needed. In the helper role, the robot will guide the limb of the patient to correctly execute the assigned task. To do this, it is essential to model the physical interactions of the traditional rehabilitation treatment in order to tune the optimal behavior of the robotic platform. Learning by demonstration approaches based on Dynamic Movement Primitives [9] will allow encoding the therapist-patient physical interaction in order to re-target the recorded motions onto the robotic platform.</p>
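        <p>As an illustration of Learning by Demonstration with Dynamic Movement Primitives, the sketch below fits a one-dimensional discrete DMP to a single demonstrated trajectory and replays it toward a new goal. It follows the standard formulation (a critically damped spring-damper system, an exponentially decaying canonical phase and a forcing term built from Gaussian basis functions whose weights are learned by locally weighted regression). The gains, the number and width of the basis functions and the synthetic cosine-shaped “therapist” demonstration are assumptions; the HeAL9000 platform would work with multi-dimensional motions recorded during real sessions.</p>
        <preformat preformat-type="code">
```python
import math


def _basis(x, centers, widths):
    """Gaussian basis activations at canonical phase x."""
    return [math.exp(-h * (x - c) ** 2) for c, h in zip(centers, widths)]


def fit_dmp(demo, dt, n_basis=20, alpha_z=25.0, alpha_x=4.0):
    """Fit the forcing-term weights of a 1-D discrete DMP to one demonstration."""
    beta_z = alpha_z / 4.0                     # critical damping
    n = len(demo)
    tau = dt * (n - 1)                         # movement duration
    y0, g = demo[0], demo[-1]
    yd = [(demo[i + 1] - demo[i]) / dt for i in range(n - 1)] + [0.0]
    ydd = [(yd[i + 1] - yd[i]) / dt for i in range(n - 1)] + [0.0]
    centers = [math.exp(-alpha_x * i / (n_basis - 1)) for i in range(n_basis)]
    widths = [n_basis ** 1.5 / c / alpha_x for c in centers]
    num, den = [0.0] * n_basis, [0.0] * n_basis
    for i in range(n):
        x = math.exp(-alpha_x * i * dt / tau)  # canonical system: tau * dx = -alpha_x * x
        # Forcing term that would reproduce the demonstration exactly.
        f_target = tau ** 2 * ydd[i] - alpha_z * (beta_z * (g - demo[i]) - tau * yd[i])
        s = x * (g - y0)
        for k, psi in enumerate(_basis(x, centers, widths)):
            # Locally weighted regression: one weight per basis function.
            num[k] += s * psi * f_target
            den[k] += s * s * psi
    weights = [a / (b + 1e-10) for a, b in zip(num, den)]
    return {"w": weights, "c": centers, "h": widths, "tau": tau, "y0": y0,
            "alpha_z": alpha_z, "beta_z": beta_z, "alpha_x": alpha_x}


def rollout(dmp, goal, dt, steps):
    """Integrate the DMP (semi-implicit Euler) toward a possibly new goal."""
    y, z, x = dmp["y0"], 0.0, 1.0
    tau, az, bz, ax = dmp["tau"], dmp["alpha_z"], dmp["beta_z"], dmp["alpha_x"]
    path = [y]
    for _ in range(steps):
        psi = _basis(x, dmp["c"], dmp["h"])
        f = sum(p * w for p, w in zip(psi, dmp["w"])) / (sum(psi) + 1e-10) * x * (goal - dmp["y0"])
        z += dt * (az * (bz * (goal - y) - z) + f) / tau
        y += dt * z / tau
        x += dt * (-ax * x) / tau
        path.append(y)
    return path


# Synthetic demonstration: smooth elbow flexion from 0 to 1 rad in 1 s.
dt = 0.01
demo = [0.5 * (1 - math.cos(math.pi * i / 100)) for i in range(101)]
dmp = fit_dmp(demo, dt)
reproduced = rollout(dmp, goal=1.2, dt=dt, steps=150)  # re-targeted to 1.2 rad
print(round(reproduced[-1], 2))
```
        </preformat>
        <p>Because the forcing term is scaled by the distance between start and goal, the same learned weights can re-target the motion (here from 1.0 rad in the demonstration to 1.2 rad), while the spring-damper term drives the system to the new goal once the phase has decayed.</p>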
      </sec>
      <sec id="sec-2-2">
        <title>2.2. The role of Dialogue</title>
        <p>The cognitive dimension of the therapist-patient interaction is supported by a natural dialogue able to integrate control aspects (e.g., the visual detection of critical phenomena for the patient, such as wrong positions or expressions of pain). The dialogue is managed by a dedicated set of modules devoted to the acquisition of input from the environment (from verbal input to physical stimuli acquired through dedicated sensors), to the tracking of the whole dialogue and to the planning of individual reactions. The workflow is depicted in Figure 3.</p>
        <p>
          When a verbal input is provided by the patient, the neural architecture discussed in [10] is applied for Speech to Text transcription. Let us consider a patient who, during the exercise, says “My arm hurts” to express some difficulties due to the requested movements. The content is processed by the Natural Language Understanding module, which implements the inductive method described in [<xref ref-type="bibr" rid="ref1">11</xref>]: here the semantic interpretation in terms of Frame Semantics [12], which models input sentences as meaning representation graphs, is also coupled with the recognition of the user intent. The semantic graph is then provided to the Dialogue Management module, which is used to recognize user states (Dialogue State Tracking), to plan the robot reactions to the input, to update the current states accordingly and, finally, to compile the requested linguistic output. In the workflow, the resulting semantic frame is Body_movement, which asks the patient to move his/her arm (the Body_part argument of the input frame) for a given Duration, i.e., a while. The output frame is compiled by the Natural Language Generation module into a sentence like “Lift it up for a while”, which is used to feed the robot’s text-to-speech module. The cognitive architecture in HeAL9000 integrates inductive modules, such as the language understanding one, with knowledge-based components that strongly depend on domain-specific pragmatic resources (e.g., for dialogue state tracking) as well as on medical knowledge bases.
        </p>
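        <p>The example workflow can be traced end to end with the toy pipeline below. It is a hypothetical sketch: keyword rules stand in for the trained NLU model of [11], a single hand-written rule stands in for the Dialogue Manager, and the <monospace>Frame</monospace> class and the generation template are assumptions chosen only to reproduce the “My arm hurts” example.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Frame:
    name: str
    elements: Dict[str, str] = field(default_factory=dict)


def nlu(utterance: str) -> Frame:
    # Stand-in for the trained NLU: detect the pain frame and its body part.
    text = utterance.lower()
    if "hurts" in text:
        body_part = text.split("hurts")[0].strip()  # e.g. "my arm"
        return Frame("Experience_bodily_harm", {"Experiencer": "[Speaker]", "Body_part": body_part})
    return Frame("Unknown")


def dialogue_manager(frame: Frame) -> Frame:
    # Stand-in policy reproducing the example: answer with a Body_movement request.
    if frame.name == "Experience_bodily_harm":
        return Frame("Body_movement", {"Body_part": frame.elements["Body_part"], "Duration": "a while"})
    return Frame("Clarification")


def nlg(frame: Frame) -> str:
    # Hypothetical template-based generation from the output frame.
    if frame.name == "Body_movement":
        return f"Lift it up for {frame.elements['Duration']}"
    return "Could you repeat, please?"


print(nlg(dialogue_manager(nlu("My arm hurts"))))  # Lift it up for a while
```
        </preformat>
        <p>Passing frames, rather than raw strings, between the stages is what lets the argument of the input frame (Body_part) reappear in the output frame without the generator having to re-read the sentence.</p>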
        <p>In recent years, the tasks of the Dialogue Manager (DM) and of Dialogue State Tracking (DST) have often been addressed with end-to-end methods, for example using transformers to encode the input (e.g., user sentences) and to generate the output (the system response). This also emerges from the various dialogue state tracking challenges [13], such as the most recent one, DST9<sup>2</sup>, in which most systems use the technologies mentioned above. In HeAL9000, the Dialogue Manager aims to control the interaction between a robot and a patient in a critical scenario. Based on this consideration, we decided to adopt a system that is i) as controllable as possible in terms of the dialogues produced and the actions performed by the robot, and ii) flexible and adaptable to new scenarios. The DM was modeled as a set of state machines, each of which performs a specific activity within the flow of the physical therapy session, e.g., a state to welcome the patient in the initial phase, or a state to stop the patient when he/she makes mistakes during the execution of the exercise. Each state represents a specific action (verbal or not) to be performed by the robot, and each edge triggers a state change if and only if its conditions are satisfied. Conditions can be, for example, a combination of particular user utterances, a set of facts in the knowledge base, or other events such as sensor data or facial expressions of the user. Thus, the response of a patient who is performing a physical therapy exercise to a feedback request from the system can be processed by considering a wide set of signals, not only the vocal one, as in the following example:</p>
        <sec id="sec-2-2-1">
          <title>HeAL9000: are you okay?</title>
        </sec>
        <sec id="sec-2-2-2">
          <title>Patient: yes, everything is fine!</title>
          <p>sensor: Patient HeartRate High
sensor: Patient BloodPressure High
sensor: Patient Sad</p>
        </sec>
        <sec id="sec-2-2-3">
          <title>HeAL9000: please take a break and breathe deeply.</title>
          <p>Information from the sensors is added to the linguistic features for the Natural Language Understanding module, similarly to [11]. The model is thus trained on data containing a mixture of linguistic and sensory features, so that it can distinguish events such as the one in the example above.</p>
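          <p>The exchange above can be read as a guarded transition whose condition combines the patient’s reply with sensor and facial-expression events. The sketch below is hypothetical: the thresholds, feature names, the intent label <monospace>inform_ok</monospace>, the state names and the rule “two or more distress signals trigger a break” are illustrative assumptions, whereas in HeAL9000 linguistic and sensory features are combined by the trained NLU model and the Dialogue Manager.</p>
          <preformat preformat-type="code">
```python
from typing import Dict, List


def sensor_features(readings: Dict[str, float]) -> List[str]:
    """Map raw readings to symbolic features (thresholds are illustrative assumptions)."""
    features = []
    if readings.get("heart_rate", 0.0) > 110:
        features.append("HeartRate_High")
    if readings.get("systolic_bp", 0.0) > 140:
        features.append("BloodPressure_High")
    return features


def next_state(state: str, intent: str, features: List[str], emotion: str) -> str:
    """Select the outgoing edge of a hypothetical 'Ask_status' state."""
    if state != "Ask_status":
        return state
    # Count non-verbal distress signals: sensor alarms plus a sad facial expression.
    distress = len(features) + (1 if emotion == "Sad" else 0)
    if distress >= 2:
        return "Suggest_break"      # non-verbal evidence overrides "everything is fine"
    if intent == "inform_ok":
        return "Continue_exercise"
    return "Clarification"


feats = sensor_features({"heart_rate": 125, "systolic_bp": 150})
print(feats)                                                 # ['HeartRate_High', 'BloodPressure_High']
print(next_state("Ask_status", "inform_ok", feats, "Sad"))   # Suggest_break
print(next_state("Ask_status", "inform_ok", [], "Neutral"))  # Continue_exercise
```
          </preformat>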
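          <p>The Dialogue Management module is modeled as a set of non-deterministic finite state machines. Each dialogue can be represented as a quintuple <inline-formula><tex-math>(\Sigma, Q, q_0, \delta, F)</tex-math></inline-formula>, where:
• <inline-formula><tex-math>\Sigma</tex-math></inline-formula> is the input: user utterances, sensors or other signals, and knowledge base information.
• <inline-formula><tex-math>Q</tex-math></inline-formula> is the set of dialogue states.
• <inline-formula><tex-math>q_0 \in Q</tex-math></inline-formula> is the initial state of each dialogue phase.
• <inline-formula><tex-math>\delta</tex-math></inline-formula> is the transition function that maps each state and input to a subset of possible states: <inline-formula><tex-math>\delta : Q \times \Sigma \rightarrow 2^{Q}</tex-math></inline-formula>, i.e., <inline-formula><tex-math>\delta(q, \sigma) \subseteq Q</tex-math></inline-formula>.
• <inline-formula><tex-math>F \subseteq Q</tex-math></inline-formula> is the set of final states.</p>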
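          <p>The quintuple above can be encoded directly, written here as (Σ, Q, q0, δ, F), with δ returning a set of successor states and the machine tracking every state reachable after each input. The states, input symbols and the <monospace>NFA</monospace> class are hypothetical and serve only to show how non-determinism lets a single reply such as “everything is fine” keep several continuations open until further signals (e.g., a sensor alarm) arrive.</p>
          <preformat preformat-type="code">
```python
from typing import Dict, Iterable, Set, Tuple

State = str
Symbol = str
Delta = Dict[Tuple[State, Symbol], Set[State]]


class NFA:
    """Non-deterministic finite state machine (Sigma, Q, q0, delta, F)."""

    def __init__(self, delta: Delta, q0: State, finals: Set[State]) -> None:
        self.delta = delta    # delta(q, a) is a subset of Q; missing keys mean the empty set
        self.q0 = q0
        self.finals = finals

    def step(self, current: Set[State], symbol: Symbol) -> Set[State]:
        # Union of the successor sets of every currently active state.
        nxt: Set[State] = set()
        for q in current:
            nxt.update(self.delta.get((q, symbol), set()))
        return nxt

    def run(self, inputs: Iterable[Symbol]) -> Set[State]:
        current = {self.q0}
        for symbol in inputs:
            current = self.step(current, symbol)
        return current

    def accepts(self, inputs: Iterable[Symbol]) -> bool:
        # Accept if at least one reachable state is final.
        return bool(self.run(inputs).intersection(self.finals))


delta: Delta = {
    ("Ask_status", "inform_ok"): {"Continue_exercise", "Suggest_break"},  # ambiguous on words alone
    ("Continue_exercise", "sensor_ok"): {"End_success"},
    ("Suggest_break", "sensor_alarm"): {"End_early_termination"},
}
nfa = NFA(delta, "Ask_status", {"End_success", "End_early_termination"})
print(sorted(nfa.run(["inform_ok"])))             # ['Continue_exercise', 'Suggest_break']
print(nfa.accepts(["inform_ok", "sensor_alarm"]))  # True
```
          </preformat>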
          <p>Figure 4 shows a simplified version of the demonstration phase, in which the robot shows the video of the exercise (Start_video_exercise), confirms that the user has understood the exercise to be performed (Confirm_exercise) and asks the patient to start the activity (Req_activity_start). If the patient does not start the exercise, a second explanation of the exercise is provided (Explain_exercise). If the user still fails to start the activity, the system reaches a special end state and calls a human operator (End_call_operator). In order to avoid unplanned behaviors, there is always an error state (End_failure). When the system is not able to understand the intentions of the user, it enters a special state (Clarification) in which the intentions of the user are clarified.</p>
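          <p>The simplified demonstration phase can be sketched as a transition table plus a retry counter. The state names come from the text; the event labels (video_done, affirm, deny, unclear, explained, activity_started, no_start), the <monospace>Go_to_observation</monospace> exit state and the single-retry policy are assumptions made for illustration.</p>
          <preformat preformat-type="code">
```python
from typing import Dict, List, Tuple

# Hypothetical transition table for the simplified demonstration phase.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("Start_video_exercise", "video_done"): "Confirm_exercise",
    ("Confirm_exercise", "affirm"): "Req_activity_start",
    ("Confirm_exercise", "deny"): "Explain_exercise",
    ("Confirm_exercise", "unclear"): "Clarification",
    ("Clarification", "affirm"): "Req_activity_start",
    ("Clarification", "deny"): "Explain_exercise",
    ("Explain_exercise", "explained"): "Req_activity_start",
    ("Req_activity_start", "activity_started"): "Go_to_observation",
}
END_STATES = {"Go_to_observation", "End_call_operator", "End_failure"}


def run_demonstration(events: List[str], max_explanations: int = 1) -> List[str]:
    """Walk the phase for a sequence of observed events and return the visited states."""
    state, explanations = "Start_video_exercise", 0
    trace = [state]
    for event in events:
        if state == "Req_activity_start" and event == "no_start":
            if explanations >= max_explanations:
                state = "End_call_operator"   # still not started: hand over to a human
            else:
                explanations += 1
                state = "Explain_exercise"    # second explanation of the exercise
        else:
            # Any event without a planned edge leads to the error end state.
            state = TRANSITIONS.get((state, event), "End_failure")
        trace.append(state)
        if state in END_STATES:
            break
    return trace


print(run_demonstration(["video_done", "affirm", "no_start", "explained", "no_start"])[-1])  # End_call_operator
print(run_demonstration(["video_done", "unclear", "affirm", "activity_started"])[-1])       # Go_to_observation
```
          </preformat>
          <p>Routing every unplanned event to <monospace>End_failure</monospace> is what keeps such a machine controllable: the robot can never drift into a state that was not designed in advance.</p>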
          <p>During the entire dialogue, the system acquires information and exploits the data in its possession to choose and generate the answers to be provided to the user. All data is stored in a knowledge base structured as an RDFS/OWL ontology; ontologies are often used in the context of dialogue systems, as in [14]. Within the knowledge base, we defined several concepts so that the system can use patient information, such as first name, last name and birth date, as well as information about the exercises, such as the correct movements to perform, the number of repetitions, the series, the body parts involved and other information useful to help the patient perform the exercise through dialogue. This technological choice has two main advantages: i) ontologies allow a formal and explicit description of the concepts of the domain of interest, which allows the system to query structured data; and ii) the Open World Assumption of OWL allows the system to progressively enrich its knowledge with information from the web.</p>
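          <p>The kind of structured query that the ontology enables can be illustrated with a tiny in-memory triple store. This is a deliberately simplified stand-in: HeAL9000 stores its data in an RDFS/OWL ontology, which would normally be handled by an RDF library and queried with SPARQL, whereas the <monospace>TripleStore</monospace> class, the <monospace>hl:</monospace> names and the patient data below are hypothetical.</p>
          <preformat preformat-type="code">
```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Minimal subject-predicate-object store with wildcard (None) pattern matching."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        # None acts as a variable in the pattern, as in a basic graph pattern.
        return sorted(t for t in self.triples
                      if (s is None or t[0] == s) and (p is None or t[1] == p)
                      and (o is None or t[2] == o))


kb = TripleStore()
kb.add("hl:patient_01", "rdf:type", "hl:Patient")
kb.add("hl:patient_01", "hl:firstName", "Maria")
kb.add("hl:patient_01", "hl:assignedExercise", "hl:shoulder_flexion")
kb.add("hl:shoulder_flexion", "rdf:type", "hl:Exercise")
kb.add("hl:shoulder_flexion", "hl:involvesBodyPart", "hl:arm")
kb.add("hl:shoulder_flexion", "hl:repetitions", "10")

# Which body part does the patient's assigned exercise involve?
exercise = kb.match("hl:patient_01", "hl:assignedExercise")[0][2]
print(kb.match(exercise, "hl:involvesBodyPart")[0][2])  # hl:arm
print(kb.match("hl:patient_01", "hl:birthDate"))        # []
```
          </preformat>
          <p>Consistently with the Open World Assumption, a missing triple (such as an unrecorded birth date) yields an empty result, to be read as “unknown” rather than “false”.</p>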
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Evaluating learning robots in therapeutic scenarios</title>
      <p>
        Natural Language Understanding as an emerging ability. Machine learning for natural language has traditionally been applied to induce cognitively plausible interpretation models, usually based on theories about the nature and semantics of human communication. Frame semantics [<xref ref-type="bibr" rid="ref5">15</xref>] has often played the role of reference theory for semantic interpretation and representation. Frames [15] are cognitive devices able to represent and encode properties of eventualities, i.e., situations, world states and the subject’s personal state, and to support interpretation and reasoning about a context and a domain. They play at the same time the role of a formalism for representing the operational context of a robot and of a knowledge repository expressing world knowledge.
      </p>
      <p>
        Frames are thus useful during dialogue, as they can express (and constrain) the intentions and content related to a patient utterance, acting both as a guide for the interpretation process and as a storage device for maintaining the dialogue state. Frames have often been used for automatic Information Extraction through Machine Learning [16], where interpretation is seen as a structured sentence classification process. In line with this perspective, we propose an interpretation framework consisting of a cascade of classification steps aimed at recognizing purposes and content semantics in support of meaningful dialogue. First, a sentence classification step (namely Intent Classification) is applied to the input utterances (i.e., the patient’s), and then sequence labeling is applied to automate Semantic Role Labeling (as in [<xref ref-type="bibr" rid="ref17">17</xref>]). In Intent Classification, each sentence is associated with an intent, consisting of a label (such as request or inform): the objective is to detect the intent of the patient’s sentence during the dialogue in order to understand his/her goal or state, i.e., whether he/she is asking for something or providing information to the system.
      </p>
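      <p>The cascade can be sketched as a chain of models in which each step receives the outputs of the previous steps as additional features. The rule-based stubs below are placeholders for the trained classifiers; the function names, the question-mark rule for <monospace>request</monospace> versus <monospace>inform</monospace> and the keyword lexicon are assumptions used only to show the data flow.</p>
      <preformat preformat-type="code">
```python
from typing import Callable, Dict, List

Features = Dict[str, object]


def intent_classifier(f: Features) -> str:
    # Placeholder for the multi-class intent model.
    return "request" if str(f["text"]).rstrip().endswith("?") else "inform"


def frame_predictor(f: Features) -> List[str]:
    # Placeholder for the multi-label frame model: zero, one or more frames per sentence.
    lexicon = {"hurts": "Experience_bodily_harm", "start": "Activity_start"}
    return [frame for word, frame in lexicon.items() if word in f["tokens"]]


def run_cascade(text: str, steps: List[Callable[[Features], object]], names: List[str]) -> Features:
    features: Features = {"text": text, "tokens": text.lower().rstrip("?.!").split()}
    for name, step in zip(names, steps):
        # Each model adds its prediction to the input of the following models.
        features[name] = step(features)
    return features


out = run_cascade("My arm hurts", [intent_classifier, frame_predictor], ["intent", "frames"])
print(out["intent"], out["frames"])  # inform ['Experience_bodily_harm']
```
      </preformat>
      <p>Boundary Detection and Argument Classification would be appended to the list of steps in the same way, each reading the intent and frames already stored in the feature dictionary.</p>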
      <p>During Semantic Role Labeling, words are associated with labels expressing the role they play in the semantic frame conveyed by the sentence. Roles distinguish between predicates, i.e., the so-called Lexical Units evoking the frame, and arguments, i.e., the roles, called Frame Elements, of the frames activated by the sentence. Multiple frames in a sentence are the norm and provide a way of detecting and storing contextual knowledge during the dialogue. Given its complexity, Semantic Role Labeling is further divided into three subtasks: Frame Prediction predicts which frame is expressed by an input sentence and labels the words (lexical units) responsible for evoking it; Boundary Detection detects the starting and ending positions of the individual arguments of each recognized frame; Argument Classification assigns the semantic roles, i.e., the Frame Elements associated with the predicted frame, to the arguments detected in the previous step. In the example of Figure 3, the sentence “my arm hurts” evokes the frame Experience_bodily_harm through the lexical unit hurt. Then the notion of Body_part is expressed by the fragment “my arm” as the injured part of the Experiencer, which here implicitly refers to the speaker, i.e., the patient.</p>
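      <p>The output of the three SRL subtasks on the running example can be held in a small data structure: the frame and its lexical unit from Frame Prediction, argument spans from Boundary Detection, and roles from Argument Classification. The class names and the use of <monospace>None</monospace> as the span of an implicit argument (rendered as [Speaker]) are assumptions for illustration.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # token offsets (start, end), end exclusive


@dataclass
class Argument:
    role: str              # Frame Element assigned by Argument Classification
    span: Optional[Span]   # found by Boundary Detection; None for implicit arguments


@dataclass
class FrameInstance:
    frame: str             # predicted by Frame Prediction
    lexical_unit: Span     # tokens evoking the frame
    arguments: List[Argument]


tokens = ["my", "arm", "hurts"]
parse = FrameInstance(
    frame="Experience_bodily_harm",
    lexical_unit=(2, 3),
    arguments=[Argument("Experiencer", None), Argument("Body_part", (0, 2))],
)


def render(p: FrameInstance, toks: List[str]) -> str:
    """Print arguments as '[text]Role', showing implicit arguments as [Speaker]."""
    parts = []
    for arg in p.arguments:
        text = "Speaker" if arg.span is None else " ".join(toks[arg.span[0]:arg.span[1]])
        parts.append(f"[{text}]{arg.role}")
    start, end = p.lexical_unit
    parts.append(f"[{' '.join(toks[start:end])}] : {p.frame}")
    return " ".join(parts)


print(render(parse, tokens))  # [Speaker]Experiencer [my arm]Body_part [hurts] : Experience_bodily_harm
```
      </preformat>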
      <p>
        Semantic Role Labeling acts on the entire sentence and is modeled as a Markovian formulation of a structured SVM (SVM<sup>hmm</sup>, as in [<xref ref-type="bibr" rid="ref11 ref18">18, 11</xref>]). The learning algorithm combines a local discriminative model, which estimates the individual observation probabilities of a sequence, with a global generative approach that retrieves the most likely tag sequence, i.e., the one that best explains the semantics of the whole sequence. The labeling obtained by the SVM<sup>hmm</sup> on the example sentence “my arm hurts” is as follows:
[Speaker]<sub>Experiencer</sub> [my arm]<sub>Body_part</sub> [hurts] ∶ Experience_bodily_harm
where the pseudo-token [Speaker] denotes the implicit argument Experiencer, which is not related to any text portion.
      </p>
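      <p>The global step of such a Markovian labeler is a Viterbi search that combines per-token (emission) scores with label-transition scores. The sketch below shows only this decoding step with hand-set, hypothetical scores; it does not implement SVM<sup>hmm</sup> training, and the labels follow the chunk notation of the Boundary Detection example used in the evaluation (_ for non-informative tokens, B, I and O for the beginning, middle and last token of a chunk).</p>
      <preformat preformat-type="code">
```python
from typing import Dict, List, Tuple


def viterbi(tokens: List[str], labels: List[str],
            emit: Dict[Tuple[str, str], float],
            trans: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the label sequence maximizing the sum of emission and transition scores."""
    # best[i][y] = (score, backpointer) of the best path ending in label y at position i
    best = [{y: (emit.get((tokens[0], y), 0.0) + trans.get(("START", y), 0.0), None)
             for y in labels}]
    for i in range(1, len(tokens)):
        column = {}
        for y in labels:
            prev, score = max(((yp, best[i - 1][yp][0] + trans.get((yp, y), 0.0)) for yp in labels),
                              key=lambda pair: pair[1])
            column[y] = (score + emit.get((tokens[i], y), 0.0), prev)
        best.append(column)
    # Backtrack from the best final label.
    y = max(labels, key=lambda lab: best[-1][lab][0])
    path = [y]
    for i in range(len(tokens) - 1, 0, -1):
        y = best[i][y][1]
        path.append(y)
    return path[::-1]


tokens = ["my", "left", "arm", "hurts"]
labels = ["_", "B", "I", "O"]
emit = {("my", "_"): 1.0, ("left", "B"): 1.0, ("arm", "I"): 1.0, ("arm", "O"): 0.8, ("hurts", "_"): 1.0}
trans = {("B", "I"): 0.5, ("B", "O"): 0.5, ("I", "I"): 0.2, ("I", "O"): 0.5, ("O", "_"): 0.5,
         ("B", "_"): -2.0, ("I", "_"): -2.0, ("_", "I"): -2.0, ("_", "O"): -2.0}
print(viterbi(tokens, labels, emit, trans))  # ['_', 'B', 'O', '_']
```
      </preformat>
      <p>Locally, “arm” scores higher as I than as O, but the transition scores require a chunk to end before a non-informative token, so the best global sequence labels it O: the sequence-level search corrects an individual local decision.</p>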
      <p>Besides the linguistic features, in line with [11], the input of the models also includes other features, such as the intent of the patient’s sentence and the requested information (i.e., the frames and arguments that HeAL9000 expects). In addition, information from the sensors worn by the patient, the tone and intensity of the voice and the emotion recognized by the Face Emotion Recognition model are added. This composes a more complete picture of what the patient communicates to the robot, both verbally and non-verbally. Finally, each model adds the output of the previous models as a feature to its input. During the dialogue, the system needs to store and use some information about the interlocutor. When HeAL9000 acts as a Demonstrator (i.e., in the Demonstrator stage, Figure 3), it can be effective to use the patient’s real name and to always be aware of the body part involved during the rehabilitation session. During the Observer part of the session, the robotic platform should consider the age of the patient to better evaluate the movements (for example, an older patient may not be able to fully execute some exercises). Finally, in the Helper stage, HeAL9000 needs to consider the body part involved, whether the patient is in pain and, if so, its intensity, to better help him/her complete the session.</p>
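      <p>The stage-dependent use of patient information can be summarized as a lookup from dialogue stage to the attributes added to the model input, merged with the non-verbal signals of the current turn. The attribute names, the <monospace>STAGE_ATTRIBUTES</monospace> table and the <monospace>context_for_stage</monospace> helper are hypothetical and only mirror the examples given above.</p>
      <preformat preformat-type="code">
```python
from typing import Dict, List

# Hypothetical mapping mirroring the examples in the text.
STAGE_ATTRIBUTES: Dict[str, List[str]] = {
    "Demonstrator": ["first_name", "body_part"],
    "Observer": ["age"],
    "Helper": ["body_part", "in_pain", "pain_intensity"],
}


def context_for_stage(stage: str, patient: Dict[str, object],
                      signals: Dict[str, object]) -> Dict[str, object]:
    """Merge stage-relevant patient attributes with the non-verbal signals of the turn."""
    context = {key: patient[key] for key in STAGE_ATTRIBUTES[stage] if key in patient}
    context.update(signals)  # e.g. sensor levels, voice intensity, facial emotion
    context["stage"] = stage
    return context


patient = {"first_name": "Maria", "age": 71, "body_part": "arm", "in_pain": True, "pain_intensity": 6}
signals = {"heart_rate": "High", "voice_intensity": "Low", "emotion": "Sad"}
print(context_for_stage("Observer", patient, signals))
```
      </preformat>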
      <p>
        Frames are thus adopted to model both the knowledge about the stages and the knowledge conveyed by incoming sentences. We studied the involved frames over the current repository created by the FrameNet<sup>3</sup> project [<xref ref-type="bibr" rid="ref2">12</xref>]. In particular, we focused on medical and informational aspects according to the existing frames, an excerpt of which is provided below:
• Being_named: Concerns Entities (Patient) conventionally being referred to by particular names (Name and Surname).
• Medical_conditions: Medical conditions or diseases that a Patient suffers from. Contains the part or area of the body (Body_part) affected by the condition, the Cause of the condition, a prominent Symptom and others.
• Activity_start: An Agent (renamed Patient for our case) initiates the beginning of an ongoing Activity in which he/she will be continuously involved. Used to model the exercise activity during the rehab session.
• Activity_resume: An Agent (Patient) resumes participation in an Activity.
• Activity_finish: An Agent (Patient) finishes an Activity, which can no longer logically continue.
• Experience_bodily_harm: An Experiencer is involved in a bodily injury to a Body_part, even though in some cases no Body_part need be indicated.
• Medical_interaction_scenario: A Patient interacts with one or more Medics; usually, the Patient has an Affliction.
• Level_of_force_exertion: An Exerter, Action or Force is capable of exerting or does exert a physical force at a level specified by the target. The frame could be used in the Helper phase to describe the force used by the robotic arm.
• Inhibit_movement: An Agent (the physical robot in our case) restricts the movement of a Theme (possibly the patient’s arm) despite the Theme’s desire, plan, or tendency towards motion; the Agent may also use an Instrument (the robotic arm).
      </p>
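      <p>The excerpt above can also serve as a constraint during Argument Classification, keeping only role hypotheses that are Frame Elements of the predicted frame. This filtering step is an assumption for illustration and is not described in the text; the role names are those that appear in the excerpt, with Agent renamed Patient where the text does so.</p>
      <preformat preformat-type="code">
```python
from typing import Dict, List, Set, Tuple

# Frame Elements named in the excerpt above (hypothetical configuration).
ALLOWED_ROLES: Dict[str, Set[str]] = {
    "Being_named": {"Patient", "Name", "Surname"},
    "Medical_conditions": {"Patient", "Body_part", "Cause", "Symptom"},
    "Activity_start": {"Patient", "Activity"},
    "Activity_resume": {"Patient", "Activity"},
    "Activity_finish": {"Patient", "Activity"},
    "Experience_bodily_harm": {"Experiencer", "Body_part"},
    "Medical_interaction_scenario": {"Patient", "Medic", "Affliction"},
    "Level_of_force_exertion": {"Exerter", "Action", "Force"},
    "Inhibit_movement": {"Agent", "Theme", "Instrument"},
}


def filter_arguments(frame: str, scored_roles: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Keep only role hypotheses that are Frame Elements of the predicted frame."""
    allowed = ALLOWED_ROLES.get(frame, set())
    return [(role, score) for role, score in scored_roles if role in allowed]


print(filter_arguments("Experience_bodily_harm", [("Body_part", 0.7), ("Instrument", 0.2)]))
# [('Body_part', 0.7)]
```
      </preformat>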
      <p>Some simulated interactions between a patient and a robotic therapist were recorded with a Wizard-of-Oz (WoZ) method and manually labeled to train the machine learning algorithms, whose evaluation is reported hereafter.</p>
      <p>Evaluating Semantic Role Labeling. The dataset used to train the SVM models consists of about 2,000 sentences representing interactions between a patient and a therapist, equally distributed among the various stages of the dialogue and split into training and test sets with an 80/20 ratio. All steps are modeled as classification tasks. Intent Classification corresponds to a multi-class classification task where each sentence has to be assigned to one class (out of 8 classes) reflecting the user’s intent. Frame Prediction corresponds to a multi-label classification task, where each sentence has to be assigned to zero, one or more classes reflecting the evoked linguistic predicates (here 10 possible frames are considered). Boundary Detection is modeled as a sequence labeling task where arguments are annotated according to the BIO notation<sup>4</sup>. Finally, Argument Classification is a multi-class classification task where each informative chunk has to be associated with one of the 28 possible classes. The system was evaluated according to different metrics, as the two tasks (i.e., Intent Classification and SRL) have different objectives and needs. Accuracy simply computes the ratio between correct predictions and total predictions. Sentence Level Accuracy is similar to Accuracy, but a sentence is considered correct only if all its word labels are correctly predicted (a perfect match). This is the case for the subtasks of Semantic Role Labeling. We also report Precision and Recall to evaluate the performance of the models at the level of entire entities (Frames, Boundaries or Arguments) to be predicted. As far as the Frame Prediction model is concerned, all the words belonging to a Frame must be correctly labeled for it to be considered correct.</p>
      <p><sup>3</sup> https://framenet.icsi.berkeley.edu/fndrupal/frameIndex</p>
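      <p>The metrics above can be made concrete with the following sketch. The helper names and toy data are hypothetical, and scoring entities as exact (sentence, start, end, label) matches is one reading of the requirement that all the words of an entity be labeled correctly.</p>
      <preformat preformat-type="code">
```python
from typing import List, Sequence, Set, Tuple

Span = Tuple[int, int, int, str]  # (sentence id, start token, end token, label)


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Ratio between correct predictions and total predictions."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def sentence_accuracy(gold: List[List[str]], pred: List[List[str]]) -> float:
    """A sentence is correct only if all of its word labels are predicted correctly."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def precision_recall(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float]:
    """Entity-level scores: a predicted span counts only if boundaries and label match exactly."""
    correct = len(gold.intersection(pred))
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


gold_tags = [["_", "B", "O", "_"], ["B", "O", "_"]]
pred_tags = [["_", "B", "O", "_"], ["B", "I", "_"]]
flat_gold = [t for sent in gold_tags for t in sent]
flat_pred = [t for sent in pred_tags for t in sent]
print(round(accuracy(flat_gold, flat_pred), 3))  # 0.857
print(sentence_accuracy(gold_tags, pred_tags))    # 0.5

gold_spans = {(0, 1, 3, "Body_part"), (1, 0, 2, "Body_part")}
pred_spans = {(0, 1, 3, "Body_part"), (1, 0, 1, "Body_part")}
print(precision_recall(gold_spans, pred_spans))   # (0.5, 0.5)
```
      </preformat>
      <p>The toy data show why the two views differ: six of seven word labels are right, yet only one of the two sentences, and one of the two spans, is fully correct.</p>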
      <p>In terms of Accuracy, the system correctly predicts the intent of a user sentence 96% of the time. Table 1 then shows the results of the Semantic Role Labeling (SRL) tasks, which are more straightforward. Indeed, in the SRL pipeline, each model assumes that the predictions of the previous step are correct. As a consequence, Argument Classification is almost perfect, as it takes advantage of such a gold-standard input, where informative chunks (i.e., arguments) are already perfectly matched.</p>
      <p>Evaluating Dialogue. We show by simulation that the Dialogue Manager is robust to
adversarial interactions and that it tries to bring the conversation to a successful end state in
as few turns as possible. The experimental setup consists of a dialogue made up of 4 phases: in
three of them the robot plays the 3 roles shown in Figure 2, while the remaining phase (called
Information Gathering) is used at the start of the interaction to welcome the patient and collect
her personal information. In the Demonstration phase, HeAL9000 shows the exercise to be performed.
In the Observation phase, the patient is observed while performing the exercise, and HeAL9000
responds to stimuli coming from sensors and from the user's verbal input. In the Helping phase,
HeAL9000 physically helps the patient to execute the exercise.<fn><label>4</label><p>As an example:
[my]_ [left]B [arm]O [hurts]_, where _ denotes a non-informative chunk, B the beginning of a
chunk, I the elements in the middle and O the last element of the informative chunk.</p></fn>
For each output of the Dialogue Manager, in all phases, we created 3 categories of possible user
responses. The first category contains consistent answers, simulating a user who collaborates with
the system. The second category contains answers that only partially help the system to continue
the dialogue, either because they are not completely consistent or because they request further
explanations (thus increasing the length of the dialogue). The last category contains inconsistent
responses, simulating an adversarial user.</p>
      <p>Simulated data are made of conversations where, in each turn, we select with probability
<italic>p</italic> an answer uniformly at random among the consistent answers and with probability
1 − <italic>p</italic> an answer among the other categories. We simulated 100 dialogues for each
<italic>p</italic> ∈ {0.0, 0.1, 0.2, …, 1.0}, for a total of 1100 dialogues. Dialogue ending states
fall into three categories: dialogues that correctly terminate in the final state of the system
(Success in Figure 5), with an average length of 51 dialogue turns; dialogues that correctly
terminate in a final state but end the therapy session early, e.g., because the patient feels pain
(Early Termination in Figure 5), with an average length of 44 dialogue turns; finally, dialogues
that terminate in an error state not handled by the system, or conversations that are too long
(more than 100 dialogue turns) and are therefore terminated early (Reset in Figure 5). Figure 5
shows the percentage of the three categories of dialogue termination (Success, Early Termination,
Reset) as the probability <italic>p</italic> increases. Tests showed the system to be tolerant of
misinterpretations of user sentences: even with low values of <italic>p</italic>, all dialogues
ending with a Reset termination were stopped because they reached the maximum allowed length. The
system never needed to use the special termination states reserved for unexpected errors, such as
resetting the state of the dialogue.</p>
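      <p>The sampling protocol above can be sketched in Python as follows. The real Dialogue Manager is replaced by a hypothetical toy model: each of the four phases needs a fixed number of consistent answers (TURNS_PER_PHASE), a consistent answer may report pain with a small invented probability (PAIN_PROB), and partial and adversarial answers simply cost a turn without advancing the dialogue, whereas the actual system reacts to them differently. The uniform choice between the two non-consistent categories is also an assumption. Only the sampling scheme follows the text (probability p of a consistent answer, 100 dialogues for each of the 11 values of p, Reset when the 100-turn limit is reached); the resulting percentages are not those of Figure 5.</p>

```python
import random
from collections import Counter

MAX_TURNS = 100          # dialogues not finished within this limit count as Reset
DIALOGUES_PER_P = 100    # 100 simulated dialogues for each value of p
P_VALUES = [k / 10 for k in range(11)]  # p in {0.0, 0.1, ..., 1.0}

# Hypothetical toy stand-in for the Dialogue Manager (not the real system).
PHASES = ["Information Gathering", "Demonstration", "Observation", "Helping"]
TURNS_PER_PHASE = 10     # assumption: consistent answers needed per phase
PAIN_PROB = 0.01         # assumption: chance a consistent answer reports pain


def pick_category(p, rng):
    """Consistent answer with probability p; otherwise one of the two
    other categories, chosen uniformly (assumption)."""
    if p > rng.random():
        return "consistent"
    return rng.choice(["partial", "adversarial"])


def simulate_dialogue(p, rng):
    """Run one toy dialogue and return its termination category."""
    phase, progress = 0, 0
    for _turn in range(MAX_TURNS):
        if pick_category(p, rng) != "consistent":
            continue  # toy model: the turn is spent, the dialogue does not advance
        if PAIN_PROB > rng.random():
            return "Early Termination"  # e.g. the patient reports pain
        progress += 1
        if progress == TURNS_PER_PHASE:
            phase, progress = phase + 1, 0
            if phase == len(PHASES):
                return "Success"
    return "Reset"  # maximum allowed length reached


def run_experiment(seed=0):
    """Map each p to a Counter of termination categories (1100 dialogues)."""
    rng = random.Random(seed)
    return {p: Counter(simulate_dialogue(p, rng) for _ in range(DIALOGUES_PER_P))
            for p in P_VALUES}


if __name__ == "__main__":
    for p, counts in run_experiment().items():
        print(f"p={p:.1f}", dict(counts))
```

      <p>With p = 0.0 no consistent answer is ever selected, so every toy dialogue ends in Reset; with p = 1.0 every dialogue ends in Success or Early Termination within the turn limit.</p>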
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>In this paper, a cognitively inspired robotic architecture for orthopedic rehabilitation is
described. The system integrates motion control capabilities with dialogue and natural language
understanding in order to harmonize and personalize the relationship with the patients. The
approach discussed in the paper is strongly focused on the adaptive abilities supported by Machine
Learning algorithms. The results obtained in the acquisition of language processing abilities and
in dialogue control are encouraging, and pave the way to a robotic system able to support the
operational adoption of this technology, data acquisition and incremental improvement over time.
This is a core property for enabling a rapid and beneficial penetration of this technology into
daily practice.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The research for this paper was partly funded by Regione Lazio within the project “Healthcare
Agents and Learning robots – HeAL9000” (prot. A0320-2019-28108).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Babaiasl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. H.</given-names>
            <surname>Mahdioun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Jaryani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yazdani</surname>
          </string-name>
          ,
          <article-title>A review of technological and clinical aspects of robot-aided rehabilitation of upper-extremity after stroke</article-title>
          ,
          <source>Disability and Rehabilitation: Assistive Technology</source>
          <volume>11</volume>
          (
          <year>2016</year>
          )
          <fpage>263</fpage>
          -
          <lpage>280</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Maciejasz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Eschweiler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gerlach-Hahn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jansen-Troy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Leonhardt</surname>
          </string-name>
          ,
          <article-title>A survey on robotic devices for upper limb rehabilitation</article-title>
          ,
          <source>Journal of NeuroEngineering and Rehabilitation</source>
          <volume>11</volume>
          (
          <year>2014</year>
          )
          <fpage>1</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Scotto di Luzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lauretti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cordella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Draicchio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zollo</surname>
          </string-name>
          ,
          <article-title>Visual vs vibrotactile feedback for posture assessment during upper-limb robot-aided rehabilitation</article-title>
          ,
          <source>Applied Ergonomics</source>
          <volume>82</volume>
          (
          <year>2020</year>
          )
          <fpage>102950</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Aisen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. I.</given-names>
            <surname>Krebs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>McDowell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. T.</given-names>
            <surname>Volpe</surname>
          </string-name>
          ,
          <article-title>The effect of robot-assisted therapy and rehabilitative training on motor recovery following stroke</article-title>
          ,
          <source>Archives of Neurology</source>
          <volume>54</volume>
          (
          <year>1997</year>
          )
          <fpage>443</fpage>
          -
          <lpage>446</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Rodriguez-Guerrero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Knaepen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Fraile-Marinero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Perez-Turiel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Gonzalez de Garibay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lefeber</surname>
          </string-name>
          ,
          <article-title>Improving challenge/skill ratio in a multimodal interface by simultaneously adapting game difficulty and haptic assistance through psychophysiological and performance feedback</article-title>
          ,
          <source>Frontiers in Neuroscience</source>
          <volume>11</volume>
          (
          <year>2017</year>
          )
          <fpage>242</fpage>
          . doi:
          <pub-id pub-id-type="doi">10.3389/fnins.2017.00242</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Aly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tapus</surname>
          </string-name>
          ,
          <article-title>Towards an intelligent system for generating an adapted verbal and nonverbal combined behavior in human-robot interaction</article-title>
          ,
          <source>Autonomous Robots</source>
          <volume>40</volume>
          (
          <year>2016</year>
          )
          <fpage>193</fpage>
          -
          <lpage>209</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <article-title>Deep facial expression recognition: A survey</article-title>
          ,
          <source>IEEE Transactions on Affective Computing</source>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>1</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1109/TAFFC.2020.2981446</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mendonca</surname>
          </string-name>
          ,
          <article-title>A stimulus-response model of therapist-patient interactions in task-oriented stroke therapy can guide robot-patient interactions</article-title>
          ,
          <source>in: Proceedings of the Annual Rehabilitation Engineering and Assistive Technology Society of North America (RESNA) Conference</source>
          , New Orleans, USA,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Schaal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nakanishi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ijspeert</surname>
          </string-name>
          ,
          <article-title>Control, planning, learning, and imitation with dynamic movement primitives</article-title>
          ,
          <source>Workshop on Bilateral Paradigms on Humans and Humanoids, IEEE Int. Conf. on Intelligent Robots and Systems</source>
          , Las Vegas, NV
          (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Baevski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Auli</surname>
          </string-name>
          ,
          <article-title>wav2vec 2.0: A framework for self-supervised learning of speech representations</article-title>
          , CoRR abs/2006.11477 (
          <year>2020</year>
          ). URL: https://arxiv.org/abs/2006.11477. arXiv:2006.11477.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vanzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Croce</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bastianelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Basili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nardi</surname>
          </string-name>
          ,
          <article-title>Grounded language interpretation of robotic commands through structured learning</article-title>
          ,
          <source>Artif. Intell.</source>
          <volume>278</volume>
          (
          <year>2020</year>
          ). doi:
          <pub-id pub-id-type="doi">10.1016/j.artint.2019.103181</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C. F.</given-names>
            <surname>Baker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Fillmore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Lowe</surname>
          </string-name>
          ,
          <article-title>The Berkeley FrameNet project</article-title>
          ,
          <source>in: Proceedings of COLING-ACL 1998</source>
          ,
          <year>1998</year>
          . doi:
          <pub-id pub-id-type="doi">10.3115/980845.980860</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Raux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Henderson</surname>
          </string-name>
          ,
          <article-title>The dialog state tracking challenge series: A review</article-title>
          ,
          <source>Dialogue &amp; Discourse</source>
          <volume>7</volume>
          (
          <year>2016</year>
          )
          <fpage>4</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Milward</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Beveridge</surname>
          </string-name>
          ,
          <article-title>Ontology-based dialogue systems</article-title>
          ,
          <source>in: Proc. 3rd Workshop on Knowledge and reasoning in practical dialogue systems (IJCAI03)</source>
          ,
          <year>2003</year>
          , pp.
          <fpage>9</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Fillmore</surname>
          </string-name>
          ,
          <article-title>Frames and the semantics of understanding</article-title>
          ,
          <source>Quaderni di Semantica</source>
          <volume>6</volume>
          (
          <year>1985</year>
          )
          <fpage>222</fpage>
          -
          <lpage>254</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>D.</given-names>
            <surname>Gildea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          ,
          <article-title>Automatic Labeling of Semantic Roles</article-title>
          ,
          <source>Computational Linguistics</source>
          <volume>28</volume>
          (
          <year>2002</year>
          )
          <fpage>245</fpage>
          -
          <lpage>288</lpage>
          . URL: https://direct.mit.edu/coli/article-pdf/28/3/245/1797857/089120102760275983.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>E.</given-names>
            <surname>Bastianelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Castellucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Croce</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Basili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nardi</surname>
          </string-name>
          ,
          <article-title>Effective and robust natural language understanding for human-robot interaction</article-title>
          ,
          <source>in: ECAI 2014 - 21st European Conference on Artificial Intelligence</source>
          , 18-22 August 2014, Prague, Czech Republic, volume 263 of
          <source>Frontiers in Artificial Intelligence and Applications</source>
          , IOS Press,
          <year>2014</year>
          , pp.
          <fpage>57</fpage>
          -
          <lpage>62</lpage>
          . URL: https://doi.org/10.3233/978-1-61499-419-0-57. doi:
          <pub-id pub-id-type="doi">10.3233/978-1-61499-419-0-57</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S.</given-names>
            <surname>Filice</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Castellucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Da San Martino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moschitti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Croce</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Basili</surname>
          </string-name>
          ,
          <article-title>KeLP: a kernel-based learning platform</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>18</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . URL: http://jmlr.org/papers/v18/16-087.html.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>