Examining the impact of data augmentation for psychomotor
skills training in human-robot interaction
Daniel Majonica (a,b), Deniz Iren (b) and Roland Klemke (a,b)
a   Cologne Game Lab, TH Köln, Cologne, Germany
b   Open Universiteit, Heerlen, The Netherlands


Abstract
Training psychomotor skills for human-robot interaction is generally done by a
human trainer who teaches the learner how to handle the robot effectively and
interact with it safely and efficiently. Modeling the dynamic interaction
between a robot and a human requires complex machine learning algorithms, and
these algorithms rely on a large amount of training data. Such data are
collected by sensors while a human interacts with a robot and must
subsequently be annotated by an expert. Only then can a psychomotor skills
training model be created from the annotated data to assist the training
process. This is a time-intensive and costly process. To reduce the costs and
shorten the data collection time, we propose the use of data augmentation.

                 Keywords 1
                 Human-robot interaction, data augmentation, machine learning


Proceedings of the Doctoral Consortium of Sixteenth European Conference on
Technology Enhanced Learning, September 20–21, 2021, Bolzano, Italy (online).
EMAIL: dm@colognegamelab.de (A. 1); deniz.iren@ou.nl (A. 2);
rk@colognegamelab.com (A. 3)
ORCID: 0000-0003-4792-0472 (A. 1); 0000-0002-0727-3445 (A. 2);
0000-0002-9268-3229 (A. 3)
© 2021 Copyright for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

    Psychomotor skills constitute an essential element of human-robot
interaction. The development of psychomotor skills requires hands-on
practice. In most cases, the practiced skills need to be executed repeatedly
by the learner in order to, for example, build muscle memory and support
further skill development. Moreover, structured instructions and feedback
facilitate the learning process and allow the practiced skills to be
performed safely. Thus, an educational model for psychomotor skills training
needs to support the timely communication of instructions and feedback and
must define how they are presented to the learner. The educational model
also supports the evaluation of the learning outcome. Currently, providing
this support remotely makes the learning process ineffective and inefficient,
usually hindering the beginner's learning progress. The project MILKI-PSY
aims at improving the remote learning of psychomotor skills. In this study,
the data will be collected using the multimodal pipeline [1], a framework
designed specifically for handling multimodal data.
    Human-robot interaction describes the process of a human interacting with
a robot in a shared physical environment. In certain cases, the human and the
robot operate cooperatively. Ensuring a safe, efficient, and effective
interaction requires training both the robot and its human counterpart.
Training humans in the handling of industrial robots is usually done by a
human trainer. To assist the training of psychomotor skills, we propose a
pedagogical approach. The goal of our pedagogical model is to facilitate a
safe, effective, and efficient learning environment
for the learner. In particular, this study focuses on a collaborative
assembly task in which a human operator and an industrial robot cooperate,
as depicted in figure 2.
    In this paper, we first review related work in the field of human-robot
interaction. Then, we present the three research questions that we aim to
address in this Ph.D. research. Next, the methodology section describes how
we intend to answer them. Finally, we discuss the expected impact and
conclude the paper.

2. Related work

    Human-robot interaction is a field dedicated to understanding, designing,
and evaluating robotic systems that interact with humans. Interaction, by
definition, requires communication between the interacting parties, i.e.,
robots and humans. This communication may take completely different forms
depending on the distance between the parties. Goodrich and Schultz [2]
divide human-robot communication into two categories: proximate
communication, in which the communicating parties share the same space
(physical or virtual), and remote communication, in which the parties are
apart. In this study, we focus on an assembly task in which the human
operator shares the physical space with the robot. In our remote learning
scenarios, we rely on immersive technologies: there is no physical robot, but
the operator and the virtual robot still share the same virtual environment.
Thus, the communication between the parties remains proximate.
    Training in human-robot interaction is becoming increasingly important for
the next generation of robot systems. Particularly in cases of remote training
or of training outside conventional teaching methods, a user-friendly
interface should be used to improve the learning process [3]. This user
interface should be designed differently depending on the communication
category.
    Immersive user interfaces such as augmented reality (AR) and virtual
reality (VR) rely on models of the behavior of the human and robot agents.
The development of such behavioral models relies heavily on machine learning
techniques, which require a large amount of data. Such data can be collected
via environmental sensors and cameras that capture the activities of the
human and the robot while they interact. When expensive machines such as
industrial robots are used, data collection becomes costly and
effort-intensive. In such environments, the number of robots available for
data collection is also a limitation; having only one or two robots is not
uncommon. Besides the robots, the limiting factor in most data collection
settings may be the human machine operator, who can naturally collect only a
limited amount of data in a full day. To address these data collection
issues, we propose data augmentation, a family of techniques for
synthesizing realistic data.
    Formally, data augmentation can be defined as techniques aimed at creating
synthetic data [4] to expand the size and/or the diversity of a dataset [5].
One sub-field of data augmentation is domain randomization [6], which exposes
the machine learning model to many different variants of the same problem
[7][8] and can therefore make the trained model more robust. Another
sub-field of data augmentation is domain adaptation, which aims to mitigate
the covariate shift problem that arises when the training and evaluation sets
derive from different distributions. Studies in the literature indicate that
domain adaptation can positively impact the performance of the machine
learning model, especially in human pose detection for activity recognition
[5].
    Using machine learning to categorize complex psychomotor activity data for
educational purposes has been done before. For example, Spikol et al. [9]
used multimodal learning analytics to collect and provide different data about
the interaction between the learner and the system. In cases where the number
of potential activity categories is strongly limited, such as the CPR tutor by
Di Mitri et al. [10] and the table tennis tutor by Mat Sanusi et al. [11], data
augmentation might improve the results only marginally and might therefore not
justify the initial workload these algorithms require. Human-robot interaction,
on the other hand, is a complex task and might therefore benefit greatly from
data augmentation.
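
A minimal sketch can make the idea of synthesizing realistic data more
concrete for sensor recordings of a human-robot interaction. The Python code
below applies two common time-series augmentations, additive Gaussian jitter
(simulated sensor noise) and per-channel magnitude scaling, to recordings
stored as lists of frames, and copies each recording's label to its synthetic
variants. The data layout, the function names `jitter`, `scale` and
`augment`, the noise levels and the example labels are hypothetical choices
for illustration only; they are not taken from the MILKI-PSY project or the
multimodal pipeline [1]. Domain randomization and domain adaptation are not
shown, since they require a simulator or a second data domain.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical layout: a frame is one time step of sensor features
# (e.g. joint coordinates); a recording is a list of frames.
Frame = List[float]
Recording = List[Frame]


def jitter(rec: Recording, sigma: float, rng: random.Random) -> Recording:
    """Add independent Gaussian noise to every value (simulated sensor noise)."""
    return [[x + rng.gauss(0.0, sigma) for x in frame] for frame in rec]


def scale(rec: Recording, sigma: float, rng: random.Random) -> Recording:
    """Multiply each feature channel by one random factor drawn around 1.0."""
    if not rec:
        return []
    factors = [rng.gauss(1.0, sigma) for _ in rec[0]]
    return [[x * f for x, f in zip(frame, factors)] for frame in rec]


def augment(
    recordings: Sequence[Recording],
    labels: Sequence[str],
    copies: int,
    sigma_jitter: float = 0.01,
    sigma_scale: float = 0.1,
    seed: int = 0,
) -> Tuple[List[Recording], List[str]]:
    """Return the original recordings plus `copies` synthetic variants of each.

    Each variant inherits the label of its source recording unchanged.
    """
    rng = random.Random(seed)  # seeded so the synthetic set is reproducible
    out_x, out_y = list(recordings), list(labels)
    for rec, label in zip(recordings, labels):
        for _ in range(copies):
            out_x.append(scale(jitter(rec, sigma_jitter, rng), sigma_scale, rng))
            out_y.append(label)
    return out_x, out_y


# Usage: two tiny recordings (2 frames x 3 features) with hypothetical labels.
recordings = [
    [[0.0, 1.0, 2.0], [0.1, 1.1, 2.1]],
    [[0.5, 0.5, 0.5], [0.6, 0.4, 0.5]],
]
labels = ["correct", "mistake"]
X, y = augment(recordings, labels, copies=4)
print(len(X), len(y))  # 10 10
```

Copying labels unchanged assumes that the perturbations are too small to
change what a recording shows, for example whether it contains a mistake; in
practice, the noise levels would have to be chosen and validated per sensor.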
3. Research questions

    The following research questions are the focus of this Ph.D. research.
First of all, it is important to know the current state of teaching
human-robot interaction and how humans are trained to handle industrial
robots. This requires an extensive review of the literature on teaching
humans how to use and operate robots.

    1. What are the common practices and mistakes in human-robot interaction
       when handling industrial robots?

    In order to identify common practices in teaching human-robot
interaction, it is crucial to know what kind of instructions the trainer
gives and what kind of feedback the trainee receives. Moreover, we also aim
to examine the effect of various training approaches (i.e., static, variable,
and dynamic) on the trainees' learning progress. Secondly, this study
addresses current technologies that can support the training of psychomotor
skills and facilitate the teaching of human-robot interaction.

    2. What technological support is achievable in educational human-robot
       interaction?

    Regarding technologies, this research focuses on which existing
technologies address both robots and humans and which machine learning
technologies are available. In this study, we will utilize immersive
technologies that vary in their level of intrusion. For example, using a
head-mounted display to provide feedback to the learner in an augmented
reality setting is less intrusive than a completely simulated learning
environment. Thirdly, this research must focus on the identification and
classification of common mistakes made during psychomotor skills training
with robots.

    3. How can data augmentation assist the successful replication of common
       mistakes and how can we measure this impact?

    To address this research question, we will consult experts in training
humans to interact with industrial robots and classify the common mistakes
that occur. Then we will explore how data can be augmented in a meaningful
manner to replicate these common mistakes.

4. Methodology

    In this study, we first conduct a systematic literature review in the
following research fields, all of which are relevant to the domain of
educational technologies:
    1 Educational human-robot interaction
    2 Technologies in human-robot interaction
    3 Semi-supervised learning models
    4 Data augmentation – Domain randomization

    The first research field is educational human-robot interaction, which
refers to educating humans in handling industrial robots. This includes
common training practices as well as common mistakes the trainee makes
during the interaction. In this study, we will use an industrial robot that
assembles a box in cooperation with a human, as seen in figure 2. First, the
human learner will be trained on how to interact with the robot
appropriately. Then, the learner will be instructed on the specifics of the
assembly steps.
    The second research field is the technologies used in human-robot
interaction. This addresses data augmentation and domain randomization, which
are both active fields of development; research papers on these topics in the
domains of 3D pose detection [5] and object detection [4] have been published
frequently in recent years.
    After the systematic literature review, the next step will be to design a
theoretical framework. This framework includes the design of an immersive
training environment specifically for psychomotor skills training. We will
use the four-component instructional design (4C/ID) [12] method to create our
framework. In 4C/ID, the design is split into four components. The first
component is the learning tasks, which aim at integrating skills and show a
high variability of practice. The second component is the supportive
information, which focuses on the performance of non-routine aspects of
learning tasks. It is specified per task class and is always available
throughout the whole learning process. The third component is part-task
practice, which provides additional practice for individual routines selected
by either the trainer or the trainee. The last component is procedural
information, which specifies how to perform routine aspects of a task, for
example by giving step-by-step instructions. Procedural information is
presented just in time during training and is gradually faded out as the
trainee's expertise increases. In designing a system for human-robot
interaction, we will use 4C/ID to teach the human learner different aspects
of handling industrial robots. 4C/ID provides a framework for handling
non-repetitive tasks that include repetitive elements which are not
task-specific. Psychomotor skills training in human-robot interaction
involves similar non-repetitive tasks, which are the focus of this research.
    The overall methodology of this research is design-based research, as
illustrated in figure 1. The steps of design-based research include analysis,
design, development, implementation, evaluation, and reflection. In contrast
to predictive research, design-based research uses an iterative process. This
means that after the results are evaluated, the entire process can be
iterated based on the ADDIE (i.e., analysis, design, development,
implementation, evaluation) model [13]. While a generic ADDIE model jumps from
the evaluation step directly to the solution of the problem [14], this
approach includes a reflective phase in which all previous steps are examined
and refined for the next iteration. This reflection and refinement of
problems, solutions, methods, and design principles systematically tries to
accommodate innovative solutions for real-life problems [15].

   Figure 1: Design-based research (DBR) synthesis combining the DBR models
   of Amiel & Reeves [15] and De Villiers & Harpur [14], showing the
   iterative research process in the domain of educational technologies.

    In this study, we use machine learning to extrapolate from collected data
and model the dynamic human-robot interaction environment. The process of
data collection can take many forms. Using the physical environment for data
collection has positive and negative aspects. On the positive side, the
collected data naturally capture and express the task that the robot has to
perform. On the negative side, data collection is subject to certain
limitations, such as the availability and speed of the robot and the human,
the limited number of human-robot data collection stations (in most cases
one or two robots), and the expected fatigue of the human. To counter these
data collection problems, we will explore data augmentation.
    We are planning to develop multiple prototypes over the course of this
research. The first prototype will be designed specifically for a
human-robot interaction in which both sides have to cooperate to assemble a
box together. This prototype will use a virtual robot: the human learner can
interact with the virtual robot, a simulated 3D model visible through a
camera or head-mounted display.

5. Expected Impact

    By using data augmentation and generating synthetic data for modeling
human-robot interaction, we expect the machine learning model to perform as
well as or better than a machine learning model trained on physical data
alone. We also expect the data collection process to be faster and reusable
in future applications. By examining the impact of data augmentation on
psychomotor skills training in
human-robot interaction, we hope to find a reliable and safe approach for
training humans to handle industrial robots.

Figure 2: Recent human-robot interaction example: the industrial robot YuMi
interacting with a human operator to assemble a box.

6. Conclusion

    The training of psychomotor skills is imperative for an effective,
efficient, and safe human-robot interaction. In this paper, we propose an
educational approach towards the psychomotor skills training of humans in
handling industrial robots. Our educational approach includes timely
instructions and feedback as well as supporting immersive technologies. The
development of such technologies requires machine learning techniques that
rely on a large amount of data. However, collecting data is costly and
effort-intensive in settings where the availability of robots is limited. To
address this challenge, we propose the use of data augmentation.
    We are going to investigate the impact of data augmentation on the
performance of the machine learning models that represent the interaction
between the human and the robot in a physical environment. In this study, we
are going to conduct an extensive literature review in the domains of
education for human-robot interaction, technologies used in human-robot
interaction, and data augmentation. Then, a theoretical framework will be
designed and an immersive training prototype will be developed. This
prototype will be implemented and evaluated in the training environment.

7. Acknowledgements

    This project was funded by the BMBF, the German Federal Ministry of
Education and Research, under the project name MILKI-PSY (German abbreviation
for Multimodal immersive learning with artificial intelligence for
psychomotor training; https://www.milki-psy.de) and the grant code
16DHB4013.

8. References

[1] D. Di Mitri, J. Schneider, M. Specht, H. Drachsler, Multimodal pipeline: A generic approach for handling multimodal data for supporting learning, 2019.
[2] M. A. Goodrich, A. C. Schultz, Human-robot interaction: a survey, Now Publishers Inc, 2008.
[3] M. Ishii, A robot teaching method using hyper card system, in: [1992] Proceedings IEEE International Workshop on Robot and Human Communication, 1992, pp. 410–412.
[4] S. Borkman, A. Crespi, S. Dhakad, S. Ganguly, J. Hogins, Y.-C. Jhang, M. Kamalzadeh, B. Li, S. Leal, P. Parisi, C. Romero, W. Smith, A. Thaman, S. Warren, N. Yadav, Unity perception: Generate synthetic data for computer vision, 2021.
[5] E. Spyrou, E. Mathe, G. Pikramenos, K. Kechagias, P. Mylonas, Data augmentation vs. Domain adaptation—a case study in human activity recognition, Technologies 8 (2020).
[6] L. Weng, Domain randomization for sim2real transfer, lilianweng.github.io/lil-log (2019). URL: http://lilianweng.github.io/lil-log/2019/05/04/domain-randomization.html.
[7] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, P. Abbeel, Domain randomization for transferring deep neural networks from simulation to the real world, 2017.
[8] OpenAI, I. Akkaya, M. Andrychowicz, M. Chociej, M. Litwin, B. McGrew, A. Petron, A. Paino, M. Plappert, G. Powell, R. Ribas, J. Schneider, N. Tezak, J. Tworek, P. Welinder, L. Weng, Q. Yuan, W. Zaremba, L. Zhang, Solving rubik’s cube with a robot hand, 2019.
[9] D. Spikol, E. Ruffaldi, G. Dabisias, M. Cukurova, Supervised machine learning in multimodal learning analytics for estimating success in project-based learning, Journal of Computer Assisted Learning 34 (2018) 366–377.
[10] D. Di Mitri, J. Schneider, K. Trebing, S. Sopka, M. Specht, H. Drachsler, Real-time multimodal feedback with the cpr tutor, in: I. I. Bittencourt, M. Cukurova, K. Muldner, R. Luckin, E. Millán (Eds.), Artificial Intelligence in Education, Springer International Publishing, Cham, 2020, pp. 141–152.
[11] K. A. Mat Sanusi, D. D. Mitri, B. Limbu, R. Klemke, Table tennis tutor: Forehand strokes classification based on multimodal data and neural networks, Sensors 21 (2021) 3121.
[12] J. van Merriënboer, The four-component instructional design model: An overview of its main design principles, 4cid.org, 2019.
[13] M. Molenda, In search of the elusive addie model, Performance improvement 42 (2003) 34–37.
[14] M. De Villiers, P. Harpur, Design-based research-the educational technology variant of design research: illustrated by the design of an m-learning environment, in: proceedings of the South African institute for computer scientists and information technologists conference, 2013, pp. 252–261.
[15] T. Amiel, T. C. Reeves, Design-based research and educational technology: Rethinking technology and the research agenda, Journal of educational technology & society 11 (2008) 29–40.