<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Integrated Task Learning and Kinesthetic Teaching for Human-Robot Cooperation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Riccardo Caccavale</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alberto Finzi</string-name>
          <email>alberto.finzi@unina.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dongheui Lee</string-name>
          <email>dhlee@tum.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matteo Saveriano</string-name>
          <email>saveriano@lsr.ei.tum.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DIETI, Università degli Studi di Napoli Federico II</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LSR, Technische Universität München</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present an integrated framework that permits implicit task learning and kinesthetic teaching during the execution of robotic tasks in cooperation with humans. The proposed system combines physical human-robot interaction, attentional supervision, and multimodal interaction to support robot teaching and incremental task learning. We describe the overall system architecture and discuss a task learning scenario.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Effective cooperation between a robotic system and a human during the
execution of complex tasks requires natural interaction and continuous,
incremental adaptation. In this work, we present an integrated framework that
permits implicit task learning and kinesthetic teaching during human-robot
interaction. Our aim is to allow a human operator to interact naturally with a
robot in order to incrementally teach complex and refined tasks.</p>
      <p>
        Our approach integrates multimodal interaction [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], attentional supervision
[
        <xref ref-type="bibr" rid="ref4 ref6 ref8">8, 6, 4</xref>
        ], and kinesthetic teaching [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ]. In this setting, the human operator
can naturally interact with the robot using gestures, voice, and physical
guidance, while a supervisory attentional system [
        <xref ref-type="bibr" rid="ref4 ref6 ref8">8, 6, 4</xref>
        ] continuously supervises and
tracks the human-robot interactive activities during both training and execution
sessions. Attentional mechanisms suitable for human-robot task teaching have
been explored in the literature, mainly in the context of visual attention [
        <xref ref-type="bibr" rid="ref1 ref3 ref7">3, 7, 1</xref>
        ];
in contrast, in this work we focus on attentional supervision and physical
interaction. Namely, in the proposed framework the human can continuously switch
from execution to teaching and vice versa. In the course of a kinesthetic teaching
session, the human can physically interact with the robot in order to
demonstrate the execution of an action, while the supervisory system is exploited to
interpret the human guidance in the context of a structured task. In this setting,
the supervisory attentional system supports implicit non-verbal communication
and makes it possible to track the human demonstration at different levels of abstraction
(tasks, sub-tasks, actions, and motion primitives).
      </p>
      <p>In the rest of the paper we detail the system architecture and describe its
functioning in a simple task learning scenario.</p>
    </sec>
    <sec id="sec-2">
      <title>System Architecture</title>
      <p>Figure 1 illustrates the overall architecture. The human can interact with
the system in a multimodal manner, using gestures, speech, and physical
guidance during kinesthetic teaching sessions. In this context, the Robot Behavior
Manager manages low-level task execution, supervision, and learning, while an
Attentional System is responsible for hierarchical task supervision and behavior
orchestration. These components are described in more detail below.</p>
      <p>
        Robot Behavior Manager. The Robot Behavior Manager (RBM) handles the
low-level aspects of the human-robot interaction and is responsible for correct
task execution. In particular, the RBM is responsible for: i) smooth transitions
between teaching and execution modes; ii) segmentation of demonstrated tasks into
basic motion primitives; iii) scene monitoring (object classification and
tracking); and iv) robot state monitoring (robot-object distances, motion primitives
learned or executed). Task teaching is performed by means of kinesthetic
teaching [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In this work, we use gravity compensation control to make the
robot ideally massless, which allows easy and safe physical guidance. High-level
tasks are represented as a set of point-to-point motion primitives (reaching
and manipulating objects) learned from human demonstrations. The RBM adopts
stable dynamical systems to compactly represent motion primitives and to
generate motor commands in the execution phase. Dynamical systems are well suited
to point-to-point motion generation, since they are guaranteed to converge
towards a given target and can rapidly adapt to external perturbations, such as
changes in the initial/target location and unforeseen obstacles [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
Attentional System. The attentional system provides the cognitive control
mechanisms needed to flexibly orchestrate the execution of complex tasks and to
monitor the human activities. Following a supervisory attentional system and
contention scheduling approach [
        <xref ref-type="bibr" rid="ref6 ref8">8, 6</xref>
        ], we propose a framework where
interactive action execution and learning are supported by attentional regulations. The
attentional system exploits hierarchical task representations to supervise and
regulate the robot actions, while interacting with the human. More specifically,
we rely on the system proposed by [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which is endowed with a Long-Term
Memory (LTM) and a Working Memory (WM) (see Attentional Executive
System in Figure 1). The LTM contains the behavioral repertoire available to the
system, including structured tasks and primitive actions. These tasks/behaviors
must be allocated and instantiated in the WM for their
actual execution. In particular, the cognitive control cycle is managed by a process
that continuously updates the WM by allocating and deallocating hierarchical
tasks/behaviors according to their denotations in the LTM. The WM represents
the executive state of the system and is associated with concrete sensorimotor
processes (see Attentional Behavior-based System in Figure 1), whose
activations are regulated top-down (task-based) and bottom-up (stimulus-driven) by
attentional influences. In this context, multiple tasks can be executed at the
same time, and several behaviors can compete in the WM, generating conflicts
and impasses [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Contentions among alternative behaviors are resolved by exploiting
the attentional activations: following a winner-takes-all approach, the behaviors
with the highest activations are granted exclusive access to
mutually exclusive resources. Additional details about this framework can be
found in [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. This attentional supervisory framework can be deployed not only
during cooperative action execution, but also when the operator interacts with
the robotic system in order to teach a new task.
      </p>
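      <p>The winner-takes-all contention described above can be illustrated with a minimal sketch. It is not the implementation of the cited framework: the Behavior class, the resource names, the say(ready) behavior, the numeric activations, and the choice of summing top-down and bottom-up contributions are all assumptions made only for illustration.</p>
      <preformat>
```python
from dataclasses import dataclass


@dataclass
class Behavior:
    """A behavior allocated in the Working Memory (hypothetical structure)."""
    name: str
    resource: str      # mutually exclusive resource the behavior needs
    top_down: float    # task-based contribution to the activation
    bottom_up: float   # stimulus-driven contribution to the activation

    @property
    def activation(self) -> float:
        # Assumption: the two attentional influences are simply summed.
        return self.top_down + self.bottom_up


def resolve_contentions(behaviors):
    """Winner-takes-all: for each resource, keep the most activated behavior."""
    winners = {}
    for b in behaviors:
        current = winners.get(b.resource)
        # On a tie, the behavior encountered first keeps the resource.
        if current is None or b.activation > current.activation:
            winners[b.resource] = b
    return winners


# Two behaviors compete for the arm; a third one uses a different resource.
wm = [
    Behavior("pick(water)", "arm", top_down=0.6, bottom_up=0.3),
    Behavior("place(water)", "arm", top_down=0.6, bottom_up=0.1),
    Behavior("say(ready)", "speech", top_down=0.2, bottom_up=0.0),
]
selected = resolve_contentions(wm)
print({res: b.name for res, b in selected.items()})
```
      </preformat>
      <p>In this toy example, pick(water) and place(water) receive the same top-down activation from the task, so the bottom-up term (for instance, a closer object) decides which of the two gets the arm, while the speech behavior runs concurrently because it does not compete for the same resource.</p>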
    </sec>
    <sec id="sec-3">
      <title>Action Teaching and Segmentation</title>
      <p>In our framework, the user can switch between teaching and execution at any time
during the robot activity. If the current task structure is not linked to concrete
sensorimotor behaviors, the system waits for the user's guidance in order to learn
how to execute the missing subtasks and motion primitives. During the teaching
phase, the human can physically guide the robot in order to demonstrate the
correct task execution; this kinesthetic teaching session is supervised by the
attentional system, which associates these training motions with the correct tasks
and sub-tasks. The human can also explicitly communicate with the robot
(using gestures or speech) in order to facilitate the learning process with additional
verbal/non-verbal cues or to inspect a trained activity invoking the repetition
of learned tasks and sub-tasks. In this setting, the attentional system tracks
and monitors both the human and the robot task execution. In this way, during a
learning session, the low-level robotic actions trained by the user through
kinesthetic teaching (such as trajectories, object handling, etc.) can be labeled with
the higher-level tasks/sub-tasks interpreted by the attentional system. Figure 2
illustrates the hierarchical structure associated with a pick-and-place task.
During the teaching mode, the attentional system monitors the subtasks to be
fulfilled (pick(water) and place(water) in Figure 2). Here, the distance between
the end-effector and the related objects directly affects the bottom-up attentional
mechanisms: a close object emphasizes the related affordances and the associated
behaviors in the WM. When a new segment is recognized by the system,
a new node in the tree is generated and linked to the most emphasized
subtask. We deploy a simple action segmentation mechanism based
on object proximity and explicit commands. Each object in the environment is
associated with a proximity area. When the end-effector of the robot (or the
human hand) enters or leaves the proximity area of an object, a new segment
is generated. Analogously, when an open/close gripper command is executed, a
new low-level action is created. We distinguish between two classes of actions:
Near-Object-Action (NOA) and Far-Object-Action (FOA). In the case of NOA,
the action is segmented inside the proximity area of an object and we exploit
Dynamic Movement Primitives to compute a robust approximation of the
observed trajectory in order to reproduce the motion more accurately. In
the case of FOA, instead, the action is segmented outside the proximity area of any object
and only the end-point of the observed trajectory is considered. Indeed, in this
case the action can be reproduced less accurately, allowing the robot
to reach the end-point regardless of the starting point. The proposed
segmentation mechanism allows the system to recognize complex actions involving two or
more objects. For example, the pouring action (NOA) illustrated in Figure 3 has
been trained with high accuracy and associated with the pour(water) primitive
behavior within the abstract task of pouring.</p>
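      <p>The proximity-based segmentation described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function name segment_trajectory, the spherical proximity area with a single radius shared by all objects, and the object positions are hypothetical, and segmentation triggered by gripper commands is not modeled.</p>
      <preformat>
```python
import math


def segment_trajectory(points, objects, radius):
    """Split a trajectory whenever the end-effector enters or leaves
    the proximity area of an object.

    points  : list of (x, y, z) end-effector positions
    objects : dict mapping object name to its (x, y, z) position
    radius  : proximity radius, assumed equal for all objects
    Returns a list of (label, near_object, start_index, end_index),
    where label is "NOA" inside a proximity area and "FOA" outside.
    """
    def nearest_inside(p):
        # Closest object whose proximity area contains p, or None.
        inside = [(math.dist(p, c), name) for name, c in objects.items()
                  if radius >= math.dist(p, c)]
        return min(inside)[1] if inside else None

    segments = []
    start = 0
    current = nearest_inside(points[0])
    for i in range(1, len(points)):
        obj = nearest_inside(points[i])
        if obj != current:
            # Entering or leaving a proximity area closes the current segment.
            segments.append(("NOA" if current else "FOA", current, start, i - 1))
            start, current = i, obj
    segments.append(("NOA" if current else "FOA", current, start, len(points) - 1))
    return segments


# Straight-line motion along x that passes close to two objects.
objects = {"water": (4.0, 0.0, 0.0), "glass": (11.0, 0.0, 0.0)}
points = [(float(k), 0.0, 0.0) for k in range(15)]
segments = segment_trajectory(points, objects, radius=1.5)
for seg in segments:
    print(seg)
```
      </preformat>
      <p>In this sketch, a segment labeled NOA would keep the whole slice of the demonstrated trajectory, so that it can be encoded with a motion primitive, whereas a segment labeled FOA would only need the last point of its slice as the target to reach.</p>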
    </sec>
    <sec id="sec-4">
      <title>Acknowledgment</title>
      <p>The research leading to these results has been supported by the RoDyMan and
SAPHARI projects, which have received funding from the European Research
Council under Advanced Grant agreement numbers 320992 and 287513,
respectively, and by the International Graduate School of Science and Engineering (IGSSE).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Borji</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ahmadabadi</surname>
            ,
            <given-names>M.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Araabi</surname>
            ,
            <given-names>B.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hamidi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Online learning of task-driven object-based visual attention control</article-title>
          .
          <source>Image Vision Comput</source>
          .
          <volume>28</volume>
          (
          <issue>7</issue>
          ),
          <fpage>1130</fpage>
          –
          <lpage>1145</lpage>
          (Jul
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Botvinick</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Braver</surname>
            ,
            <given-names>T.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barch</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carter</surname>
            ,
            <given-names>C.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          :
          <article-title>Conflict monitoring and cognitive control</article-title>
          .
          <source>Psychological Review</source>
          <volume>108</volume>
          (
          <issue>3</issue>
          ),
          <fpage>624</fpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Breazeal</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berlin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Spatial scaffolding for sociable robot learning</article-title>
          .
          <source>In: Proc. of AAAI-2008</source>
          . pp.
          <fpage>1268</fpage>
          –
          <lpage>1273</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Caccavale</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finzi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Plan execution and attentional regulations for flexible human-robot interaction</article-title>
          .
          <source>In: Proc. of SMC 2015</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Caccavale</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finzi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Flexible task execution and attentional regulations in human-robot interaction</article-title>
          .
          <source>IEEE Trans. Cognitive and Developmental Systems</source>
          <volume>9</volume>
          (
          <issue>1</issue>
          ),
          <fpage>68</fpage>
          –
          <lpage>79</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Cooper</surname>
            ,
            <given-names>R.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shallice</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Hierarchical schemas and goals in the control of sequential behavior</article-title>
          .
          <source>Psychological Review</source>
          <volume>113</volume>
          (
          <issue>4</issue>
          ),
          <fpage>887</fpage>
          –
          <lpage>916</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Nagai</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>From bottom-up visual attention to robot action learning</article-title>
          .
          <source>In: Proc. of International Conference on Development and Learning</source>
          . pp.
          <fpage>1</fpage>
          –
          <lpage>6</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Norman</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shallice</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Attention to action: Willed and automatic control of behavior</article-title>
          . In:
          <source>Consciousness and self-regulation: Advances in research and theory</source>
          , vol.
          <volume>4</volume>
          , pp.
          <fpage>1</fpage>
          –
          <lpage>18</lpage>
          (
          <year>1986</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Rossi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leone</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fiore</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finzi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cutugno</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>An extensible architecture for robust multimodal human-robot communication</article-title>
          .
          <source>In: Proc. of IROS-2013</source>
          . pp.
          <fpage>2208</fpage>
          –
          <lpage>2213</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Saveriano</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>An</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Incremental kinesthetic teaching of end-effector and null-space motion primitives</article-title>
          .
          <source>In: Proc. of ICRA-2015</source>
          . pp.
          <fpage>3570</fpage>
          –
          <lpage>3575</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Saveriano</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Distance based dynamical system modulation for reactive avoidance of moving obstacles</article-title>
          .
          <source>In: Proc. of ICRA-2014</source>
          . pp.
          <fpage>5618</fpage>
          –
          <lpage>5623</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>