Integrated Task Learning and Kinesthetic Teaching for Human-Robot Cooperation

Riccardo Caccavale

Alberto Finzi

alberto.finzi@unina.it 0

Dongheui Lee

dhlee@tum.de 1

Matteo Saveriano

saveriano@lsr.ei.tum.de 1

0 DIETI, Università degli Studi di Napoli Federico II

1 LSR, Technische Universität München

We present an integrated framework that permits implicit task learning and kinesthetic teaching during the execution of robotic tasks in cooperation with humans. The proposed system combines physical human-robot interaction, attentional supervision, and multimodal interaction to support robot teaching and incremental task learning. We describe the overall system architecture and discuss a task learning scenario.

Introduction

Effective cooperation between a robotic system and a human during the execution of complex tasks requires natural interaction and continuous, incremental adaptation. In this work, we present an integrated framework that permits implicit task learning and kinesthetic teaching during human-robot interaction. Our aim is to allow a human operator to interact naturally with a robot in order to incrementally teach complex and refined tasks.

Our approach integrates multimodal interaction [9], attentional supervision [8, 6, 4], and kinesthetic teaching [10, 11]. In this setting, the human operator can naturally interact with the robot using gestures, voice, and physical guidance, while a supervisory attentional system [8, 6, 4] continuously supervises and tracks the human-robot interactive activities during both training and execution sessions. Attentional mechanisms suitable for human-robot task teaching have been explored in the literature, mainly in the context of visual attention [3, 7, 1]; in contrast, in this work we focus on attentional supervision and physical interaction. Namely, in the proposed framework the human can continuously switch from execution to teaching and vice versa. In the course of a kinesthetic teaching session, the human can physically interact with the robot in order to demonstrate the execution of an action, while the supervisory system is exploited to interpret the human guidance in the context of a structured task. The supervisory attentional system thus supports implicit non-verbal communication and makes it possible to track the human demonstration at different levels of abstraction (tasks, sub-tasks, actions, and motion primitives).

In the rest of the paper we detail the system architecture and describe its functioning in a simple task learning scenario.

System Architecture

Figure 1 illustrates the overall architecture. The human can interact with the system in a multimodal manner, using gestures, speech, and physical guidance during a kinesthetic teaching session. In this context, the Robot Behavior Manager manages low-level task execution, supervision, and learning, while an Attentional System is responsible for hierarchical task supervision and behavior orchestration. These components are described in more detail below.

Robot Behavior Manager. The Robot Behavior Manager (RBM) handles the low-level aspects of the human-robot interaction and is responsible for correct task execution. In particular, the RBM is responsible for: i) smooth transition between teaching and execution modes; ii) segmentation of the demonstrated task into basic motion primitives; iii) scene monitoring (object classification and tracking); and iv) robot state monitoring (robot-object distance, motion primitives learned or executed). Task teaching is performed by means of kinesthetic teaching [10]. In this work, we use gravity compensation control to make the robot ideally massless, guaranteeing easy and safe physical guidance. High-level tasks are represented as a set of point-to-point motion primitives (reaching and manipulating objects), learned from human demonstrations. The RBM adopts stable dynamical systems to compactly represent motion primitives and to generate motor commands in the execution phase. Dynamical systems are well suited for point-to-point motion generation since they are guaranteed to converge towards a given target, and they can rapidly adapt to external perturbations, such as changes in the initial/target location and unforeseen obstacles [11].

Attentional System. The attentional system provides the cognitive control mechanisms needed to flexibly orchestrate the execution of complex tasks and to monitor the human activities. Following a supervisory attentional system and contention scheduling approach [8, 6], we propose a framework where interactive action execution and learning are supported by attentional regulations. The attentional system exploits hierarchical task representations to supervise and regulate the robot actions while interacting with the human. More specifically, we rely on the system proposed in [4], which is endowed with a Long Term Memory (LTM) and a Working Memory (WM) (see Attentional Executive System in Figure 1).
The LTM contains the behavioral repertoire available to the system, including structured tasks and primitive actions. These tasks/behaviors are allocated and instantiated in the Working Memory (WM) for their actual execution. In particular, the cognitive control cycle is managed by a process that continuously updates the WM by allocating and deallocating hierarchical tasks/behaviors according to their denotations in the LTM. The WM represents the executive state of the system and is associated with concrete sensorimotor processes (see Attentional Behavior-based System in Figure 1), whose activations are regulated by top-down (task-based) and bottom-up (stimulus-driven) attentional influences. In this context, multiple tasks can be executed at the same time and several behaviors can compete in the WM, generating conflicts and impasses [2]. Contentions among alternative behaviors are resolved by exploiting the attentional activations: following a winner-takes-all approach, the behaviors with the highest activations are selected and granted exclusive access to mutually exclusive resources. Additional details about this framework can be found in [4, 5]. This attentional supervisory framework can be deployed not only during cooperative action execution, but also when the operator interacts with the robotic system in order to teach a new task.
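The winner-takes-all contention described above can be illustrated with a minimal sketch. It assumes a simplified representation in which each behavior in the WM carries a single scalar activation and a set of required resources; the names `Behavior` and `resolve_contention` are hypothetical and are not taken from [4, 5], whose actual mechanism may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    """Hypothetical WM entry: a behavior with an attentional activation."""
    name: str
    activation: float      # assumed to combine top-down and bottom-up influences
    resources: frozenset   # mutually exclusive resources the behavior needs


def resolve_contention(behaviors):
    """Select behaviors greedily, highest activation first.

    A behavior is selected only if none of its resources has already
    been granted to a behavior with a higher activation.
    """
    selected = []
    taken = set()
    for b in sorted(behaviors, key=lambda b: b.activation, reverse=True):
        if taken.isdisjoint(b.resources):
            selected.append(b)
            taken |= b.resources
    return selected


if __name__ == "__main__":
    wm = [
        Behavior("pick(water)", 0.9, frozenset({"arm"})),
        Behavior("place(water)", 0.6, frozenset({"arm", "gripper"})),
        Behavior("speak", 0.4, frozenset({"voice"})),
    ]
    for b in resolve_contention(wm):
        print(b.name)
```

In this sketch, pick(water) and place(water) compete for the arm, so only the more activated of the two is selected, while speak runs concurrently because it uses a different resource.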

Action Teaching and Segmentation

In our framework, the user can switch between teaching and execution at any time during the robot activity. If the current task structure is not linked to concrete sensorimotor behaviors, the system waits for user guidance in order to learn how to execute the missing subtasks and motion primitives. During the teaching phase, the human can physically guide the robot in order to demonstrate the correct task execution; this kinesthetic teaching session is supervised by the attentional system, which associates the training motions with the correct tasks and sub-tasks. The human can also explicitly communicate with the robot (using gestures or speech) in order to facilitate the learning process with additional verbal/non-verbal cues, or to inspect a trained activity by invoking the repetition of learned tasks and sub-tasks. In this setting, the attentional system tracks and monitors both the human and the robot task execution. This way, during a learning session, the low-level robotic actions trained by the user through kinesthetic teaching (such as trajectories and object handling) can be labeled with the higher-level tasks/sub-tasks interpreted by the attentional system.

Figure 2 illustrates the hierarchical structure associated with a pick-and-place task. In teaching mode, the attentional system monitors the subtasks to be fulfilled (pick(water) and place(water) in Figure 2). Here, the distance between the end-effector and the related objects directly affects the bottom-up attentional mechanisms: a close object emphasizes the related affordances and the associated behaviors in the WM. When a new segment is recognized by the system, a new node in the tree is generated and linked to the most emphasized subtask. We deploy a simple action segmentation mechanism based on object proximity and explicit commands. Each object in the environment is associated with a proximity area.
When the end-effector of the robot (or the human hand) enters or leaves the proximity area of an object, a new segment is generated. Analogously, when an open/close gripper command is executed, a new low-level action is created. We distinguish between two classes of actions: Near-Object-Action (NOA) and Far-Object-Action (FOA). In the case of NOA, the action is segmented inside the proximity area of an object, and we exploit Dynamic Movement Primitives to compute a robust approximation of the observed trajectory so that the motion can be reproduced more accurately. In the case of FOA, instead, the action is segmented outside the proximity area of any object and only the end-point of the observed trajectory is considered. Indeed, in this case the action can be reproduced less accurately, allowing the robot to reach the end-point regardless of the starting point. The proposed segmentation mechanism allows the system to recognize complex actions involving two or more objects. For example, the pouring action (NOA) illustrated in Figure 3 has been trained with high accuracy and associated with the pour(water) primitive behavior within the abstract task of pouring.
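The proximity-based segmentation described above can be sketched as follows. This is only an illustration under stated assumptions: proximity areas are modeled as spheres, the trajectory is a list of sampled end-effector positions, gripper commands are given as sample indices and treated as segment boundaries, and the names `SceneObject`, `object_in_range`, and `segment` are hypothetical. Encoding NOA segments with Dynamic Movement Primitives and extracting the FOA end-point are left out.

```python
import math
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    position: tuple   # (x, y, z) in metres
    radius: float     # radius of the proximity area (spherical shape is assumed)


def object_in_range(point, objects):
    """Name of the closest object whose proximity area contains point, else None."""
    best, best_dist = None, math.inf
    for obj in objects:
        d = math.dist(point, obj.position)
        if d <= obj.radius and d < best_dist:
            best, best_dist = obj.name, d
    return best


def segment(trajectory, objects, gripper_events=()):
    """Split a demonstrated trajectory into labelled segments.

    A boundary is placed whenever the end-effector enters or leaves a
    proximity area, or when a gripper command occurs at that sample index.
    Returns (label, object_name, start, end) tuples with end exclusive,
    where label is "NOA" inside a proximity area and "FOA" outside.
    """
    if not trajectory:
        return []
    segments = []
    start = 0
    current = object_in_range(trajectory[0], objects)
    for i in range(1, len(trajectory)):
        here = object_in_range(trajectory[i], objects)
        if here != current or i in gripper_events:
            segments.append(("NOA" if current else "FOA", current, start, i))
            start, current = i, here
    segments.append(("NOA" if current else "FOA", current, start, len(trajectory)))
    return segments


if __name__ == "__main__":
    water = SceneObject("water", (1.0, 0.0, 0.0), 0.2)
    demo = [(0.0, 0, 0), (0.5, 0, 0), (0.9, 0, 0), (1.0, 0, 0), (1.5, 0, 0)]
    for seg in segment(demo, [water], gripper_events={3}):
        print(seg)
```

Each resulting segment could then be attached as a new node under the most emphasized subtask, as described in the text.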

ACKNOWLEDGMENT

The research leading to these results has been supported by the RoDyMan and SAPHARI projects, which have received funding from the European Research Council under Advanced Grant agreement number 320992 and 287513, respectively, and by the International Graduate School of Science and Engineering (IGSSE).

References

1. Borji, A., Ahmadabadi, M.N., Araabi, B.N., Hamidi, M.: Online learning of task-driven object-based visual attention control. Image Vision Comput. 28(7), 1130–1145 (Jul 2010)

2. Botvinick, M.M., Braver, T.S., Barch, D.M., Carter, C.S., Cohen, J.D.: Conflict monitoring and cognitive control. Psychological Review 108(3), 624 (2001)

3. Breazeal, C., Berlin, M.: Spatial scaffolding for sociable robot learning. In: Proc. of AAAI-2008. pp. 1268–1273 (2008)

4. Caccavale, R., Finzi, A.: Plan execution and attentional regulations for flexible human-robot interaction. In: Proc. of SMC 2015 (2015)

5. Caccavale, R., Finzi, A.: Flexible task execution and attentional regulations in human-robot interaction. IEEE Trans. Cognitive and Developmental Systems 9(1), 68–79 (2017)

6. Cooper, R.P., Shallice, T.: Hierarchical schemas and goals in the control of sequential behavior. Psychological Review 113(4), 887–916 (2006)

7. Nagai, Y.: From bottom-up visual attention to robot action learning. In: Proc. of International Conference on Development and Learning. pp. 1–6 (2009)

8. Norman, D.A., Shallice, T.: Attention to action: Willed and automatic control of behavior. In: Consciousness and self-regulation: Advances in research and theory, vol. 4, pp. 1–18 (1986)

9. Rossi, S., Leone, E., Fiore, M., Finzi, A., Cutugno, F.: An extensible architecture for robust multimodal human-robot communication. In: Proc. of IROS-2013. pp. 2208–2213 (2013)

10. Saveriano, M., An, S., Lee, D.: Incremental kinesthetic teaching of end-effector and null-space motion primitives. In: ICRA 2015. pp. 3570–3575 (2015)

11. Saveriano, M., Lee, D.: Distance based dynamical system modulation for reactive avoidance of moving obstacles. In: Proc. of ICRA-2014. pp. 5618–5623 (2014)