<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Robust and Incremental Robot Learning by Imitation</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Sapienza University of Rome</institution>
        </aff>
      </contrib-group>
      <abstract>
<p>In the last years, Learning by Imitation (LbI) has been increasingly explored in order to easily instruct robots to execute complex motion tasks. However, most of the approaches do not consider the case in which multiple and sometimes conflicting demonstrations are given by different teachers. Nevertheless, it seems advisable that the robot does not start as a tabula rasa, but re-using previous knowledge in imitation learning is still a difficult research problem. In order to be used in real applications, LbI techniques should be robust and incremental. For this reason, the challenge of our research is to find alternative methods for incremental, multi-teacher LbI.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
Over the last decade, robot Learning by Imitation (LbI) has been increasingly
explored in order to easily and intuitively instruct robots to execute complex
tasks. By providing a human-friendly interface for programming by
demonstration, such methods can support the deployment of robotics in domestic and
industrial environments. Technical intervention by expert users, in fact, would
not be strictly required and, therefore, the costs of (re)programming a robot
would be drastically reduced. Despite the advantages in terms of flexibility and cost
reduction, LbI also brings its own set of problems. For example, understanding
the focus of the demonstration ("what to imitate"), adapting the demonstration
to the different embodiment of the robot and obtaining good performance in
task execution ("how to imitate") are typical challenges of LbI. These problems
have been described and addressed in several ways [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ][
        <xref ref-type="bibr" rid="ref2">2</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
] and a large literature
exists on the topic. For example, different representations have been proposed
for encoding learned trajectories or goals, and interactive learning techniques
have been developed, as in [
        <xref ref-type="bibr" rid="ref4">4</xref>
], for improving the acquired skills. The common
assumption behind a large part of the literature, however, is that
demonstrations are provided by a single teacher, in particular a human. This is not always
the case: not only could a robot learn from other robots [
        <xref ref-type="bibr" rid="ref5 ref9">5</xref>
        ][
        <xref ref-type="bibr" rid="ref10 ref6">6</xref>
] or animals
(e.g., bio-inspired robots), but also multiple teachers could provide the robot with
conflicting demonstrations or feedback/advice. Moreover, while only some work
has focused on the incremental learning problem, incremental learning is crucial for achieving robot
autonomy. It seems advisable, in fact, that a robot does not start to learn a
    </sec>
    <sec id="sec-2">
      <title>Fig. 1: The key questions of Learning by Imitation</title>
      <p>Who to imitate? (reliability evaluation, teacher selection, feedback selection) - When to imitate? (timing, scheduling and coactivation, priority) - What to imitate? (goal or trajectory, affordances and effects, context evaluation) - How to imitate? (strategy selection, embodiment problem, skill encoding, hierarchies and skill re-use, scalability)</p>
    </sec>
    <sec id="sec-4">
      <title>-</title>
      <p>
single task every time from scratch, since its knowledge can be augmented for
executing more complex tasks or for obtaining increasingly better imitations.
The challenge of our research is to propose a set of solutions for improving LbI
techniques, by considering both multiple teachers and incremental learning. In
contrast to previous work, we will focus our research on learning from multiple
categories of teachers (e.g., humans, robots, animals). Moreover, we will consider
classical solutions like reliability measurements and teacher selection as well as
techniques for strategy co-activation, strategy changing and online refinement
via contrasting feedback/advice. Sub-skill co-activation will also be adopted for
improving incremental learning, with the underlying idea of extending current
non-symbolic approaches to reach higher levels of learning autonomy for
hierarchical and complex tasks.</p>
      <sec id="sec-4-1">
        <title>Related Work</title>
        <p>
As already stated in the previous Section, Learning by Imitation provides a
high-level method for programming a robot which can be easily used by non-expert
users. However, while the effort required to provide prior knowledge to the
robot is drastically reduced, new and different issues emerge. A frequently used
description of LbI challenges consists of a set of independent problems presented
in the form of questions (see Fig. 1): Who to imitate? When to imitate? What
to imitate? How to imitate? A huge effort has been made, in previous work, to
understand what is relevant for the robot and how it should learn a skill,
while who and when to imitate are still open challenges. Indeed, only a small
amount of work has been done in this direction. A detailed overview of the
adopted approaches for solving those problems can be found in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <p>
One of the first problems when dealing with imitation is to understand how
to encode a learned skill. While spline representations cannot be easily used for
encoding a skill, because of their explicit time dependency, many alternatives
exist. In particular, Hidden Markov Models (HMMs) have often been successfully
applied in this context [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Billard et al. [
          <xref ref-type="bibr" rid="ref11">9</xref>
          ], for example, use two HMMs, one
to eliminate signals with high variability and the other one, fully connected, to
obtain a probabilistic encoding of the task. In the work by Asfour et al. [
          <xref ref-type="bibr" rid="ref12">10</xref>
          ], a
humanoid robot is instructed by using continuous HMMs, trained with a set of
key points common to almost all the demonstrations. By also detecting temporal
dependencies between the two arms, dual-arm tasks are successfully executed.
Calinon et al. [
          <xref ref-type="bibr" rid="ref13">11</xref>
          ], instead, use HMMs for representing a joint distribution of
position and velocity, while generalizing the motion during the reproduction
through the use of Gaussian Mixture Regression. The approach is validated
on several robotics platforms. Additional improvements in the generalization of
movements have been achieved thanks to the use of Gaussian mixture models
(GMMs). For example, in [
          <xref ref-type="bibr" rid="ref14">12</xref>
], the authors propose an LbI framework, based on a
mixture of Gaussian/Bernoulli distributions, for extracting relevant features of a
task and generalizing the acquired knowledge to multiple contexts. Chernova and
Veloso [
          <xref ref-type="bibr" rid="ref15">13</xref>
], instead, use a GMM-based representation of the policy in order
to address the uncertainty of human demonstrations. In particular, they propose
an approach which enables the agent to request demonstrations for specific parts
of the state space, achieving increasing autonomy in the execution based on the
analysis of the learned Gaussian mixture set.
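As a toy illustration of this family of encodings (our own sketch, not any of the cited authors' implementations), the following Python fragment fits a Gaussian mixture to (time, position) samples from several noisy demonstrations and reproduces a trajectory via Gaussian Mixture Regression:

```python
import numpy as np

def mvn_pdf(x, mean, cov):
    """Density of points x with shape (N, d) under a multivariate Gaussian."""
    d = x.shape[1]
    diff = x - mean
    inv = np.linalg.inv(cov)
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return norm * np.exp(-0.5 * np.einsum("ni,ij,nj->n", diff, inv, diff))

def fit_gmm(data, k, iters=60):
    """Plain EM for a k-component full-covariance GMM on (N, 2) data."""
    n, d = data.shape
    order = np.argsort(data[:, 0])
    # Initialize means spread along time, shrunk global covariance per component.
    mu = data[order[np.linspace(0, n - 1, k).astype(int)]].astype(float).copy()
    cov = np.stack([np.cov(data.T) / k + 1e-4 * np.eye(d) for _ in range(k)])
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities of each component for each sample.
        resp = np.stack([w[j] * mvn_pdf(data, mu[j], cov[j]) for j in range(k)], axis=1)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means, covariances.
        nk = resp.sum(axis=0)
        w, mu = nk / n, (resp.T @ data) / nk[:, None]
        for j in range(k):
            diff = data - mu[j]
            cov[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
    return w, mu, cov

def gmr(t_query, w, mu, cov):
    """E[x given t] under the joint mixture: the reproduced trajectory."""
    preds = []
    for t in t_query:
        # Responsibility of each component for this time instant.
        lik = w * np.exp(-0.5 * (t - mu[:, 0]) ** 2 / cov[:, 0, 0]) \
              / np.sqrt(2 * np.pi * cov[:, 0, 0])
        h = lik / lik.sum()
        # Blend the per-component conditional means.
        cond = mu[:, 1] + cov[:, 1, 0] / cov[:, 0, 0] * (t - mu[:, 0])
        preds.append(h @ cond)
    return np.array(preds)

# Three noisy demonstrations of the same motion x(t) = sin(2*pi*t).
rng = np.random.default_rng(0)
t = np.tile(np.linspace(0, 1, 100), 3)
x = np.sin(2 * np.pi * t) + 0.02 * rng.standard_normal(t.size)
w, mu, cov = fit_gmm(np.column_stack([t, x]), k=8)
repro = gmr(np.linspace(0, 1, 100), w, mu, cov)
```

Conditioning the joint model on time yields a weighted sum of per-component linear regressions, which is what smooths out the noise across demonstrations.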
        </p>
        <p>
In order to reduce the typically high-dimensional state-action space of those
problems, a different category of work focuses on the representation of tasks as
a composition of motion primitives. Dynamic Movement Primitives (DMPs), in
particular, have been proposed by Ijspeert et al. [
          <xref ref-type="bibr" rid="ref16">14</xref>
          ][
          <xref ref-type="bibr" rid="ref17">15</xref>
          ][
          <xref ref-type="bibr" rid="ref18">16</xref>
] in order to encode
the properties of the motion by means of differential equations. These
primitives, which can take into account perturbations and feedback terms, have been
successfully applied by Schaal et al. [
          <xref ref-type="bibr" rid="ref19">17</xref>
          ], in the context of learning by
demonstration, on several examples. Ude et al. [
          <xref ref-type="bibr" rid="ref20">18</xref>
          ] present a method for generalizing
periodic DMPs and synthesizing new actions in situations that a robot has never
encountered before. As an additional example, Stulp and Schaal [
          <xref ref-type="bibr" rid="ref21">19</xref>
          ] use DMPs
for hierarchical learning via Reinforcement Learning (RL) and apply their
approach on an 11-DOF arm plus hand for a pick-and-place task. More recently, an
alternative representation such as Probabilistic Movement Primitives has been
proposed by Paraschos et al. [
          <xref ref-type="bibr" rid="ref22">20</xref>
          ], which can be used in several applications and
allows for blending between motions, adapting to altered task variables, and
co-activating multiple motion primitives in parallel.
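To make the idea concrete, here is a deliberately minimal one-DOF discrete DMP (our simplified sketch: Euler integration, unit time constant, ad hoc basis-function parameters, not the cited authors' reference implementations):

```python
import numpy as np

class DMP:
    def __init__(self, n_basis=20, alpha=25.0, alpha_x=3.0):
        self.n, self.alpha, self.beta, self.alpha_x = n_basis, alpha, alpha / 4.0, alpha_x
        self.c = np.exp(-alpha_x * np.linspace(0, 1, n_basis))  # RBF centers in phase space
        self.h = n_basis ** 1.5 / self.c                        # RBF widths (ad hoc choice)
        self.w = np.zeros(n_basis)

    def imitate(self, y_demo, dt):
        """Fit the forcing term so the DMP reproduces one demonstration."""
        T = len(y_demo)
        self.y0, self.g = y_demo[0], y_demo[-1]
        yd = np.gradient(y_demo, dt)
        ydd = np.gradient(yd, dt)
        x = np.exp(-self.alpha_x * dt * np.arange(T))           # canonical phase variable
        # Forcing term that would make the spring-damper track the demo exactly.
        f_target = ydd - self.alpha * (self.beta * (self.g - y_demo) - yd)
        s = x * (self.g - self.y0)                              # forcing-term scale
        for i in range(self.n):                                 # per-basis weighted regression
            psi = np.exp(-self.h[i] * (x - self.c[i]) ** 2)
            self.w[i] = (s * psi) @ f_target / ((s * psi) @ s + 1e-10)

    def rollout(self, dt, steps, goal=None):
        """Integrate the system; changing `goal` generalizes the learned motion."""
        g = self.g if goal is None else goal
        y, yd, x, traj = self.y0, 0.0, 1.0, []
        for _ in range(steps):
            psi = np.exp(-self.h * (x - self.c) ** 2)
            f = (psi @ self.w) / (psi.sum() + 1e-10) * x * (g - self.y0)
            ydd = self.alpha * (self.beta * (g - y) - yd) + f
            yd, x = yd + ydd * dt, x - self.alpha_x * x * dt
            y += yd * dt
            traj.append(y)
        return np.array(traj)

# Learn a smooth reach from 0 to 1, then re-use it as-is.
t = np.linspace(0, 1, 101)
demo = 10 * t**3 - 15 * t**4 + 6 * t**5   # minimum-jerk position profile
dmp = DMP()
dmp.imitate(demo, dt=0.01)
repro = dmp.rollout(dt=0.01, steps=101)
```

Because the forcing term is scaled by (g - y0), calling rollout with a new goal generalizes the learned motion without re-learning the weights.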
        </p>
        <p>
In several works, traditional imitation learning techniques have been
combined with methods for refining the learned policy, as in the case of Nicolescu and
Mataric [
          <xref ref-type="bibr" rid="ref23">21</xref>
]. More specific approaches based on Reinforcement Learning enable
the reduction of the time needed for finding good control policies, while
improving the performance of the robot (when possible) beyond that of the teacher.
Guenter and Billard [
          <xref ref-type="bibr" rid="ref24">22</xref>
], for example, use RL in order to relearn goal-oriented
tasks even under unexpected perturbations. More in detail, a GMM is used as
a first attempt to reproduce the task and, then, RL is used to adapt the encoded
speed to perturbations. A limitation of the approach is that the system needs
to completely relearn the trajectory every time a new perturbation is added.
Kober and Peters [
          <xref ref-type="bibr" rid="ref25">23</xref>
          ] use episodic RL in order to improve motor primitives
learned by imitation for a Ball-in-a-Cup task. Kormushev et al. [
          <xref ref-type="bibr" rid="ref26">24</xref>
], instead,
encode movements with an extension of DMPs initialized from imitation. RL
is then used for learning the optimal parameters of the policy, thus improving
the learned capability. A different approach in the same direction has
been proposed by Argall et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Rather than using traditional Reinforcement
Learning, in fact, the authors consider the advice of the teacher in order to
improve the learned policy, by directly applying a correction on the executed
state-action mapping.
        </p>
        <p>
Such solutions are well suited whenever a robot needs to learn a task from a
single teacher. However, issues emerge if conflicting demonstrations, or rewards
in the case of RL, are provided by different teachers ("who to imitate") by means
of different sensors and modalities. For non-linear systems, in fact, simply
averaging the learned trajectories usually results in a new trajectory that is not feasible,
since it does not obey the constraints of the dynamic model. Preliminary work in
the direction of addressing this problem has been done by Nicolescu and Mataric
[
          <xref ref-type="bibr" rid="ref23">21</xref>
          ], who propose a topology based method for generalization among multiple
demonstrations represented as behavior networks. Argall et al. [
          <xref ref-type="bibr" rid="ref27">25</xref>
] consider the
incorporation of demonstrations from multiple teachers by selecting among them
on the basis of their observed reliability. More specifically, reliability is measured
and represented through a weighting scheme. Babes et al. [26] apply Inverse
Reinforcement Learning (IRL) [27] to learning from demonstration, by adopting
a clustering procedure on the observed trajectories for inferring the expert's
intention. This is particularly useful to discriminate among different
demonstrations whose underlying goal (and reward function) is not previously or clearly
specified. Tanwani and Billard [28], instead, propose a method based on IRL for
learning to mimic a variety of experts with different strategies. While
providing high adaptability, such an approach enables bootstrapping optimal policy
learning by transferring knowledge from the set of learned policies. Most of these
approaches, however, neither enable smooth switching among different policies,
when needed, nor consider the opportunity to prioritize among different
strategies which are not incompatible. Moreover, teachers are usually considered to
be human beings, while in real applications demonstrations could be provided
by arbitrary expert agents, such as other robots [
          <xref ref-type="bibr" rid="ref5 ref9">5</xref>
          ][
          <xref ref-type="bibr" rid="ref10 ref6">6</xref>
] or even animals.
Additional work should be focused on the online version of this problem, in which
contrasting feedback is given to the robot by multiple teachers and refinements
over different learned policies are required.
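As a toy sketch of how such a reliability-weighting scheme might look (our illustration, in the spirit of but not identical to the cited works), consider a pool that updates a per-teacher weight from observed execution outcomes and then prefers the most reliable teacher:

```python
class TeacherPool:
    """Keep a reliability weight per teacher, updated from execution outcomes."""

    def __init__(self, teachers):
        self.weights = {t: 0.5 for t in teachers}   # start undecided

    def update(self, teacher, success, lr=0.2):
        # Pull the weight toward 1 on success and toward 0 on failure.
        target = 1.0 if success else 0.0
        self.weights[teacher] += lr * (target - self.weights[teacher])

    def select(self):
        # Prefer demonstrations from the currently most reliable teacher.
        return max(self.weights, key=self.weights.get)

pool = TeacherPool(["human", "robot", "simulated"])
# Observed outcomes of executing each teacher's demonstrations.
for outcome in [True, True, True, False, True]:
    pool.update("human", outcome)
for outcome in [False, True, False, False]:
    pool.update("robot", outcome)
best = pool.select()
```

A probabilistic selection rule (sampling teachers in proportion to their weights) would be a natural variant when exploration among teachers is desired.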
        </p>
        <p>
Another limitation of the work in the literature is the assumption that
robots need to learn a single task from scratch, without previous knowledge.
Real-world applications, instead, are highly demanding of robots which can
incrementally acquire new task execution capabilities based on already learned
skills. A huge effort in dealing with this problem has been made in the direction
of using symbolic representations of tasks, as in [
          <xref ref-type="bibr" rid="ref23">21</xref>
]. Pardowitz et al. [29][30],
who follow the general approach described in [31], use a hierarchical
representation of complex tasks, generated as a sequence of elementary operators (i.e.,
basic actions, primitives). The method is applied on a robot servant which has
to learn an everyday household task by combining reasoning and learning. A
similar approach is used in the work by Ekvall and Kragic [32], who decompose
each task into sub-tasks which are then used, together with a set of constraints
and the identified goal, for obtaining generalization. Symbolic representations
offer, of course, many advantages when dealing with complex tasks, but they
require a big effort to provide prior knowledge to the robot, resulting in a loss of
flexibility. Conversely, other work is oriented to the achievement of incremental
learning from scratch, without the intervention of experts in providing
knowledge. Friesen and Rao [33] propose a solution for achieving hierarchical task
control by means of an extended Bellman equation. Starting from the equation
used in [34] for "implicit imitation", the authors consider both temporally
extended actions (called options) and primitives. Such options can execute other
options. An interesting evolution towards incremental learning can be noticed
in the work by the research group of Jan Peters [35][36][37][38][39][40][41]. In
particular, in [39] a general overview of the adopted modular approach is given.
The authors describe a method for generalizing and learning several motor
primitives (building blocks), as well as learning to select and sequence the building
blocks for executing complex tasks. Even though this technique represents a major
advancement towards incremental learning, the gap between the pure symbolic
approach and the "numerical" one is still significant.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Methodology and Proposed Solution</title>
        <p>The challenge of this research consists in addressing both the problem of
multi-teaching (robustness) and that of incremental learning, starting from the work
previously presented. With this purpose, state-of-the-art sensing techniques and
off-the-shelf perception modules will be considered to acquire task
demonstrations, since they are not directly related to the considered challenges.</p>
        <p>The general idea of the proposed approach is based on a mixture of techniques
from Artificial Intelligence and Control Theory. In fact, on the one hand
Reinforcement Learning has often been explored in combination with traditional LbI
for efficient and accurate task reproduction; on the other hand, it has been shown
that RL is also effective for obtaining bio-inspired and adaptive controllers able
to find optimal policies, in terms of control cost, on-line [42]. Assume that, for
each task, n different or contrasting demonstrations are provided to the robot
by k different teachers. Each teacher may have his own strategy or may change
his behavior on the basis of the context. Starting from these, a basic step would
be the generation of a smaller number of clusters, in order to reduce the
dimensionality of the problem.</p>
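        <p>A minimal sketch of this clustering step (our own illustration, with invented data) could resample each demonstration to a fixed length and group the resulting vectors with a small k-means:

```python
import numpy as np

def resample(traj, m=30):
    """Linearly resample a 1-D trajectory to m points."""
    return np.interp(np.linspace(0, 1, m), np.linspace(0, 1, len(traj)), traj)

def kmeans(X, k, iters=50):
    # Deterministic farthest-point initialization, then standard Lloyd steps.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Two conflicting teaching strategies for the same task: a curved motion and a
# straight one, each demonstrated four times with noise.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 50)
demos = [np.sin(np.pi * t) + 0.05 * rng.standard_normal(50) for _ in range(4)] + \
        [t + 0.05 * rng.standard_normal(50) for _ in range(4)]
X = np.stack([resample(d) for d in demos])
labels = kmeans(X, k=2)
```

Each resulting cluster would then be handled as a separate candidate policy in the subsequent stages of the pipeline.</p>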
        <sec id="sec-4-2-1">
          <title>Fig. 2: k teachers provide n demonstrations, which are grouped by clustering into a reduced set of clusters</title>
        </sec>
        <sec id="sec-4-2-6">
          <title>-</title>
          <p>After dividing the obtained clusters into m sub-parts,
through a segmentation process, each demonstrated sub-policy (n · m in total) should be
learned by applying Inverse Reinforcement Learning techniques. Contextually,
in order to produce a more goal-oriented solution, m general DMPs (one for
each sub-part) will be continuously refined on the basis of the set of all the n
demonstrations. A graphical description of the approach is available in Fig. 2.
At task execution time (on-line), for each sub-part, the robot should be able to
choose among the different policies and the refined DMPs, on the basis of the
context or constraints. The choice will strictly depend on the state of the robot
and on the priority (if available) of the tasks to be executed. Interaction with
users characterized by different policies will enable a further refinement of the
adopted policies, as well as a weighting process among the produced solutions,
based on their given reward. Eventually, in the case of non-contrasting
demonstrations, a priority-based execution of co-activated policies will be implemented.</p>
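          <p>The segmentation step could, for instance, cut each trajectory at near-zero local minima of speed; the following fragment is a hypothetical sketch of such a heuristic, not a published algorithm:

```python
import numpy as np

def segment(traj, dt, min_speed=0.05):
    """Split a 1-D trajectory at interior local minima of speed that are near zero."""
    speed = np.abs(np.gradient(traj, dt))
    cuts = [i for i in range(1, len(traj) - 1)
            if speed[i - 1] >= speed[i] and speed[i + 1] >= speed[i]
            and min_speed > speed[i]]
    bounds = [0] + cuts + [len(traj) - 1]
    return [traj[a:b + 1] for a, b in zip(bounds[:-1], bounds[1:])]

# A demonstration made of two smooth strokes (0 to 1, then 1 to 2): the motion
# briefly comes to rest between them, which is where the cut should fall.
t = np.linspace(0, 1, 101)
stroke = 10 * t**3 - 15 * t**4 + 6 * t**5   # minimum-jerk stroke
demo = np.concatenate([stroke, 1 + stroke[1:]])
parts = segment(demo, dt=0.01)
```

In practice such cut points would feed the IRL stage, with each sub-part learned as its own sub-policy.</p>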
          <p>Intuitively, such a \motion library" will be useful to address two typical issues
of the incremental learning problem: recognizing in the demonstrations the set
of already available sub-skills, and reducing the redundancy of task information.
Based on this, the approach adopted in [39] for combining the building blocks
in the execution of complex tasks will be extended to consider co-activated,
non-interfering sub-skills on a priority basis. Moreover, a simple approach, based
on the extraction of the most relevant features of each sub-task, will be used to
partially reduce the gap between the numerical and symbolic
representations used in LbI. Contextually, higher-level planning will eventually be
executed by means of Hierarchical Task Networks.</p>
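          <p>As a small sketch of how Hierarchical Task Network planning composes learned sub-skills (all task and skill names here are invented for the example), compound tasks can be recursively decomposed by methods into ordered sequences of primitives:

```python
# Primitive sub-skills the robot has already learned (hypothetical names).
PRIMITIVES = {"move_to", "grasp", "pour", "release"}

# Each method maps a compound task to an ordered list of subtasks.
METHODS = {
    "serve_drink": ["fetch_bottle", "deliver_drink"],
    "fetch_bottle": ["move_to", "grasp"],
    "deliver_drink": ["move_to", "pour", "release"],
}

def decompose(task):
    """Expand a task into the ordered sequence of primitive actions."""
    if task in PRIMITIVES:
        return [task]
    plan = []
    for subtask in METHODS[task]:
        plan.extend(decompose(subtask))
    return plan

plan = decompose("serve_drink")
```

A full HTN planner would additionally check preconditions and choose among alternative methods per task; this sketch keeps only the hierarchical decomposition itself.</p>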
          <p>
The proposed solutions will be extensively validated on simulated and real
robots, in both domestic and industrial domains. In particular, the
whole system will be developed on the Robot Operating System (ROS, http://www.ros.org/)
framework, given its wide adoption in the robotics community. This will enable not only an easy integration
with realistic simulators like V-REP (http://www.coppeliarobotics.com/) and Webots (http://www.cyberbotics.com/), but also an easily
transferable implementation for a real robot, like the KUKA Youbot (Fig. 3). Such a
robotic platform consists of an omnidirectional mobile base and a 5-DOF arm,
plus the gripper, and it can be considered a good solution for preliminary
experiments in this research. Using the Youbot, in fact, allows experimenting with LbI
in industrial-like scenarios, as in the case of the RoCKIn@Work (http://rockinrobotchallenge.eu/) competitions.
Due to the robot structure, LbI implementations on this platform will have to
take into account the correspondence problem [
            <xref ref-type="bibr" rid="ref1">1</xref>
]. Note, however, that this is a
classical issue in the LbI implementation pipeline, since the embodiment of the
demonstrator and that of the robot are usually different, with the exception
of humanoid robots. Additional tests will be executed on specific simple tasks
(e.g., door opening and ball throwing), as well as in the context of benchmarking
activities (e.g., RoCKIn).
          </p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>Conclusions and Potential Impact</title>
        <p>Producing a robot which can be easily instructed to perform difficult tasks will
open many business opportunities. In the next years, in fact, industrial and
general-purpose domestic robots will become available to wider communities of
non-expert users. The use of incremental, human-inspired learning approaches could
enable next-generation robots to learn from others as well as from their own
experience. For this reason, we strongly believe that an intuitive multi-teaching
"interface" for robots could improve not only the overall quality of the user
experience and the robot usability, but also the acceptance of robots in our
society. We also think that exploring robust and incremental LbI methods could
have a long-term positive impact from an economic point of view. Consider, for
example, the money spent by big companies on programming
robots: industries could save a lot, having the possibility to easily
reprogram, or improve with the advice of different teachers, a single part of
the task that a robot has to execute. For this reason, the developed algorithms
could be included in ROS Industrial, whose goal is to transfer the advances in
robotics research to concrete applications with economic potential. From an
academic point of view, the interest towards human movement understanding is
increasing, and improvements in LbI could have a strong impact in this area,
since it is strictly related to natural movement and specific motion dynamics.
In conclusion, we believe that research in this area can be further extended
towards practical applications and real-world scenarios, but we are aware that
this document represents only the starting point for a detailed analysis and
investigation of the possible techniques for approaching robust and incremental
LbI.
26. Babes, M., Marivate, V., Subramanian, K., Littman, M.L.: Apprenticeship learning
about multiple intentions. In: Proceedings of the 28th International Conference on
Machine Learning (ICML-11). (2011) 897–904
27. Ng, A.Y., Russell, S.J., et al.: Algorithms for inverse reinforcement learning. In:</p>
        <p>ICML. (2000) 663–670
28. Tanwani, A.K., Billard, A.: Transfer in inverse reinforcement learning for multiple
strategies. In: Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ
International Conference on, IEEE (2013) 3244–3250
29. Pardowitz, M., Zollner, R., Dillmann, R.: Incremental learning of task sequences
with information-theoretic metrics. In: European Robotics Symposium 2006,
Springer (2006) 51–63
30. Pardowitz, M., Knoop, S., Dillmann, R., Zollner, R.: Incremental learning of tasks
from user demonstrations, past experiences, and vocal comments. Systems, Man,
and Cybernetics, Part B: Cybernetics, IEEE Transactions on 37(2) (2007) 322–332
31. Muench, S., Kreuziger, J., Kaiser, M., Dillman, R.: Robot programming by
demonstration (RPD) - using machine learning and user interaction methods for the
development of easy and comfortable robot programming systems. In: Proceedings of the
International Symposium on Industrial Robots. Volume 25., International Federation
of Robotics &amp; Robotic Industries (1994) 685–685
32. Ekvall, S., Kragic, D.: Learning task models from multiple human demonstrations.</p>
        <p>In: Robot and Human Interactive Communication, 2006. ROMAN 2006. The 15th
IEEE International Symposium on, IEEE (2006) 358–363
33. Friesen, A.L., Rao, R.P.: Imitation learning with hierarchical actions. In:
Development and Learning (ICDL), 2010 IEEE 9th International Conference on, IEEE
(2010) 263–268
34. Price, B., Boutilier, C.: Implicit imitation in multiagent reinforcement learning,</p>
        <p>Citeseer (1999)
35. Kupcsik, A.G., Deisenroth, M.P., Peters, J., Neumann, G.: Data-efficient
generalization of robot skills with contextual policy search. In: AAAI. (2013)
36. Muelling, K., Kober, J., Kroemer, O., Peters, J.: Learning to select and generalize
striking movements in robot table tennis. The International Journal of Robotics
Research 32(3) (2013) 263–279
37. Peters, J., Kober, J., Mulling, K., Kramer, O., Neumann, G.: Towards robot skill
learning: From simple skills to table tennis. In: Machine Learning and Knowledge
Discovery in Databases. Springer Berlin Heidelberg (2013) 627–631
38. Kober, J., Peters, J.: Learning prioritized control of motor primitives. In: Learning</p>
        <p>Motor Skills. Springer International Publishing (2014) 149–160
39. Neumann, G., Daniel, C., Paraschos, A., Kupcsik, A., Peters, J.: Learning modular
policies for robotics. Frontiers in Computational Neuroscience 8 (2014) 62
40. Bocsi, B., Csato, L., Peters, J.: Indirect robot model learning for tracking control.</p>
        <p>Advanced Robotics 28(9) (2014) 589–599
41. Muelling, K., Boularias, A., Mohler, B., Scholkopf, B., Peters, J.: Learning
strategies in table tennis using inverse reinforcement learning. Biological Cybernetics
(2014)
42. Khan, S.G., Herrmann, G., Lewis, F.L., Pipe, T., Melhuish, C.: Reinforcement
learning and optimal adaptive control: An overview and implementation examples.
Annual Reviews in Control 36(1) (2012) 42–59</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Nehaniv</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dautenhahn</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          : Like me?
          <article-title>- measures of correspondence and imitation</article-title>
          .
          <source>Cybernetics and Systems</source>
          <volume>32</volume>
          (
          <issue>1-2</issue>
          ) (
          <year>2001</year>
          )
          <volume>11</volume>
          –
          <fpage>51</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Schaal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ijspeert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Billard</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Computational approaches to motor learning by imitation</article-title>
          .
          <source>Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences</source>
          <volume>358</volume>
          (
          <issue>1431</issue>
          ) (
          <year>2003</year>
          )
          <volume>537</volume>
          –
          <fpage>547</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Argall</surname>
            ,
            <given-names>B.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chernova</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Veloso</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Browning</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A survey of robot learning from demonstration</article-title>
          .
          <source>Robot. Auton. Syst</source>
          .
          <volume>57</volume>
          (
          <issue>5</issue>
          ) (May
          <year>2009</year>
          )
          <volume>469</volume>
          –
          <fpage>483</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Argall</surname>
            ,
            <given-names>B.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Browning</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Veloso</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Learning robot motion control with demonstration and advice-operators</article-title>
          .
          <source>In: Intelligent Robots and Systems</source>
          ,
          <year>2008</year>
          .
          <article-title>IROS 2008</article-title>
          . IEEE/RSJ International Conference on,
          <source>IEEE</source>
          (
          <year>2008</year>
          )
          <volume>399</volume>
          –
          <fpage>404</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hayes</surname>
            ,
            <given-names>G.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demiris</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>A robot controller using learning by imitation</article-title>
          . University of Edinburgh, Department of Artificial Intelligence (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Gaussier</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moga</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quoy</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banquet</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          :
          <article-title>From perception-action loops to imitation processes: A bottom-up approach of learning by imitation</article-title>
          .
          <source>Applied Artificial Intelligence</source>
          <volume>12</volume>
          (
          <issue>7-8</issue>
          ) (
          <year>1998</year>
          )
          <fpage>701</fpage>–<lpage>727</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Billard</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calinon</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dillmann</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schaal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Robot programming by demonstration</article-title>
          .
          <source>In: Springer handbook of robotics</source>
          . Springer (
          <year>2008</year>
          )
          <fpage>1371</fpage>–<lpage>1394</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hovland</surname>
            ,
            <given-names>G.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sikka</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCarragher</surname>
            ,
            <given-names>B.J.</given-names>
          </string-name>
          :
          <article-title>Skill acquisition from human demonstration using a hidden Markov model</article-title>
          .
          <source>In: Robotics and Automation, 1996. Proceedings., 1996 IEEE International Conference on, Volume 3</source>
          , IEEE (
          <year>1996</year>
          )
          <fpage>2706</fpage>–<lpage>2711</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>5 http://rosindustrial.org/</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>6 See, for example, IEEE RAS Technical Committee on Human Movement Understanding, http://www.ieee-ras.org/human-movement-understanding</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          9.
          <string-name>
            <surname>Billard</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calinon</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guenter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Discriminative and adaptive imitation in uni-manual and bi-manual tasks</article-title>
          .
          <source>Robotics and Autonomous Systems</source>
          <volume>54</volume>
          (
          <issue>5</issue>
          ) (
          <year>2006</year>
          )
          <volume>370</volume>
          {
          <fpage>384</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          10.
          <string-name>
            <surname>Asfour</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azad</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gyarfas</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dillmann</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Imitation learning of dual-arm manipulation tasks in humanoid robots</article-title>
          .
          <source>International Journal of Humanoid Robotics</source>
          <volume>5</volume>
          (
          <issue>02</issue>
          ) (
          <year>2008</year>
          )
          <fpage>183</fpage>–<lpage>202</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          11.
          <string-name>
            <surname>Calinon</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>D'halluin</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sauser</surname>
            ,
            <given-names>E.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caldwell</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Billard</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          :
          <article-title>Learning and reproduction of gestures by imitation: An approach based on hidden Markov model and Gaussian mixture regression</article-title>
          .
          <source>IEEE Robotics and Automation Magazine</source>
          <volume>17</volume>
          (
          <issue>2</issue>
          ) (
          <year>2010</year>
          )
          <fpage>44</fpage>–<lpage>54</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          12.
          <string-name>
            <surname>Calinon</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guenter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Billard</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>On learning, representing, and generalizing a task in a humanoid robot</article-title>
          .
          <source>Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on</source>
          <volume>37</volume>
          (
          <issue>2</issue>
          ) (
          <year>2007</year>
          )
          <fpage>286</fpage>–<lpage>298</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          13.
          <string-name>
            <surname>Chernova</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Veloso</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Confidence-based policy learning from demonstration using Gaussian mixture models</article-title>
          .
          <source>In: Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems</source>
          ,
          <source>ACM</source>
          (
          <year>2007</year>
          )
          <fpage>233</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          14.
          <string-name>
            <surname>Ijspeert</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakanishi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schaal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Trajectory formation for imitation with nonlinear dynamical systems</article-title>
          .
          <source>In: Intelligent Robots and Systems, 2001. Proceedings. 2001 IEEE/RSJ International Conference on, Volume 2</source>
          , IEEE (
          <year>2001</year>
          )
          <fpage>752</fpage>–<lpage>757</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ijspeert</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakanishi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schaal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Learning attractor landscapes for learning motor primitives</article-title>
          .
          <source>Technical report</source>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          16.
          <string-name>
            <surname>Ijspeert</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakanishi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schaal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Movement imitation with nonlinear dynamical systems in humanoid robots</article-title>
          .
          <source>In: Robotics and Automation, 2002. Proceedings. ICRA'02. IEEE International Conference on, Volume 2</source>
          , IEEE (
          <year>2002</year>
          )
          <fpage>1398</fpage>–<lpage>1403</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          17.
          <string-name>
            <surname>Schaal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakanishi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ijspeert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Learning movement primitives</article-title>
          .
          <source>In: Robotics Research</source>
          . Springer (
          <year>2005</year>
          )
          <fpage>561</fpage>–<lpage>572</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          18.
          <string-name>
            <surname>Ude</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gams</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Asfour</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morimoto</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Task-specific generalization of discrete and periodic dynamic movement primitives</article-title>
          .
          <source>Robotics, IEEE Transactions on</source>
          <volume>26</volume>
          (
          <issue>5</issue>
          )
          (
          <year>2010</year>
          )
          <fpage>800</fpage>–<lpage>815</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          19.
          <string-name>
            <surname>Stulp</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schaal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Hierarchical reinforcement learning with movement primitives</article-title>
          .
          <source>In: Humanoid Robots (Humanoids), 2011 11th IEEE-RAS International Conference on</source>
          , IEEE (
          <year>2011</year>
          )
          <fpage>231</fpage>–<lpage>238</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          20.
          <string-name>
            <surname>Paraschos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daniel</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Probabilistic movement primitives</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          . (
          <year>2013</year>
          )
          <fpage>2616</fpage>–<lpage>2624</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          21.
          <string-name>
            <surname>Nicolescu</surname>
            ,
            <given-names>M.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mataric</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          :
          <article-title>Natural methods for robot task learning: Instructive demonstrations, generalization and practice</article-title>
          .
          <source>In: Proceedings of the Second International Joint Conference on Autonomous Agents and Multi-Agent Systems</source>
          . (
          <year>2003</year>
          )
          <fpage>241</fpage>–<lpage>248</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          22.
          <string-name>
            <surname>Guenter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Billard</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          :
          <article-title>Using reinforcement learning to adapt an imitation task</article-title>
          .
          <source>In: Intelligent Robots and Systems, 2007. IROS 2007. IEEE/RSJ International Conference on</source>
          , IEEE (
          <year>2007</year>
          )
          <fpage>1022</fpage>–<lpage>1027</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          23.
          <string-name>
            <surname>Kober</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          :
          <article-title>Policy search for motor primitives in robotics</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . (
          <year>2009</year>
          )
          <fpage>849</fpage>–<lpage>856</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          24.
          <string-name>
            <surname>Kormushev</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calinon</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caldwell</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          :
          <article-title>Robot motor skill coordination with em-based reinforcement learning</article-title>
          .
          <source>In: Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on</source>
          , IEEE (
          <year>2010</year>
          )
          <fpage>3232</fpage>–<lpage>3237</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          25.
          <string-name>
            <surname>Argall</surname>
            ,
            <given-names>B.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Browning</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Veloso</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Automatic weight learning for multiple data sources when learning from demonstration</article-title>
          .
          <source>In: Robotics and Automation, 2009. ICRA'09. IEEE International Conference on</source>
          , IEEE (
          <year>2009</year>
          )
          <fpage>226</fpage>–<lpage>231</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>