Examining the impact of data augmentation for psychomotor skills training in human-robot interaction

Daniel Majonica (a, b), Deniz Iren (b) and Roland Klemke (a, b)

(a) Cologne Game Lab, TH Köln, Cologne, Germany
(b) Open Universiteit, Heerlen, The Netherlands

Proceedings of the Doctoral Consortium of Sixteenth European Conference on Technology Enhanced Learning, September 20–21, 2021, Bolzano, Italy (online).
EMAIL: dm@colognegamelab.de (A. 1); deniz.iren@ou.nl (A. 2); rk@colognegamelab.com (A. 3)
ORCID: 0000-0003-4792-0472 (A. 1); 0000-0002-0727-3445 (A. 2); 0000-0002-9268-3229 (A. 3)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

Abstract

Training psychomotor skills for human-robot interaction is generally done with a human trainer educating the human on how to handle the robot effectively and interact with it safely and efficiently. The dynamic interaction between a robot and a human requires complex machine learning algorithms to be modeled, and these algorithms rely on a large amount of data to be trained. Such data are collected by sensors when a human interacts with a robot. Subsequently, the data must be annotated by an expert. Finally, with the annotated data, a psychomotor skills training model can be created to assist the training process. This is a time-intensive and costly process. To ease the costs and cut down collection time, we propose the use of data augmentation.

Keywords

Human-robot interaction, data augmentation, machine learning

1. Introduction

Psychomotor skills constitute an essential element of human-robot interaction. The development of psychomotor skills requires hands-on practice. In most cases, the practiced skills need to be executed repetitively by the learner in order to, for example, build muscle memory and support further skill development. Moreover, structured instructions and feedback facilitate the learning process and allow safe performance of the practiced skills. Thus, an educational model for psychomotor skill training needs to support the timely communication of the instructions and feedback and must define how these instructions and feedback are presented to the learner. The educational model also supports the evaluation of the learning outcome. Currently, doing this in a remote manner makes the learning process ineffective and inefficient, usually hindering the beginner's learning progress. The project MILKI-PSY aims at improving the remote learning process of psychomotor skills. In this study, the data will be collected using the multimodal pipeline framework [1], which is designed specifically for handling multimodal data.

Human-robot interaction describes the process of a human interacting with a robot in a shared physical environment. In certain cases, the human and the robot operate in a cooperative manner. Ensuring a safe, efficient, and effective interaction requires training of both the robot and the human counterpart. Training humans in the handling of industrial robots is usually done by a human trainer. To assist the training of psychomotor skills, we propose a pedagogical approach. The goal of our pedagogical model is to facilitate a safe, effective, and efficient learning environment for the learner. In particular, this study focuses on a collaborative assembly task in which a human operator and an industrial robot cooperate, as depicted in Figure 2.

In this paper, we first go over the related work in the field of human-robot interaction. Then, we present three research questions that we aim to address in this Ph.D. research. Next, we discuss in the methodology part how we are going to address them. Finally, we conclude with the expected impact and a discussion at the end of this paper.

2. Related work

Human-robot interaction is a field dedicated to understanding, designing, and evaluating robotic systems that interact with humans. Interaction, by definition, requires communication between the interacting parties, i.e., robots and humans. The communication between robots and humans may take completely different forms depending on the distance between them. Goodrich and Schultz [2] categorize human-robot communication into two types: proximate communication, in which the communicating parties share the same space (physical or virtual), and remote communication, in which those parties are apart. In this study, we focus on an assembly task where the human operator shares the physical space with the robot. In our remote learning scenarios, we rely on immersive technologies where there is no physical robot, but the operator and the virtual robot still share the same virtual environment. Thus, the method of communication between the parties is proximate.

Training in human-robot interaction is gradually becoming more important for the next generation of robot systems. Particularly in cases of remote training or additional training outside of the conventional teaching methods, a user-friendly interface should be used [3] to improve the learning process. This user interface should be designed differently depending on the communication category.

Immersive user interfaces such as AR and VR model the behavior of the human and robot agents. The development of such behavioral models heavily uses machine learning techniques that rely on a large amount of data. Such data can be collected via environmental sensors and cameras that capture the activities of the human and the robot while they interact. When expensive machines like industrial robots are used, data collection becomes costly and effort-intensive. In such environments, the number of robots available for data collection purposes is also a limitation. Having a limit of one or two robots is not uncommon in data collection. On the other hand, for most data collection, the limiting factor might be the human operating the machine, who is naturally limited in the amount of data they can collect in a full day. To address the issue of data collection, we propose data augmentation, which is a family of techniques that allows us to synthesize realistic data.

Formally, data augmentation can be defined as techniques aiming at the creation of synthetic data [4] for the expansion of the size and/or the diversity of the dataset [5]. A sub-field of data augmentation is domain randomization [6]. Domain randomization can expose the machine learning model to many different variants of the same problem [7][8] and therefore train the model more robustly. Another sub-field of data augmentation is domain adaptation, which aims to mitigate the covariate shift problem that arises when the training and evaluation sets do not derive from the same distribution. Studies in the literature indicate that using domain adaptation can positively impact the performance of the machine learning model, especially in the domain of human pose detection for activities [5].

Using machine learning to categorize complex psychomotor activity data for educational purposes has been done before. For example, Spikol et al. [9] used multimodal learning analytics to collect and provide different data about the interaction between the learner and the system. In cases where the number of potential activity categories is significantly limited, such as the CPR tutor from Di Mitri et al. [10] and the table tennis tutor by Mat Sanusi et al. [11], using data augmentation might improve the results only marginally and therefore might not be feasible given the initial workload that those algorithms require. On the other hand, human-robot interaction is a complex task and therefore might greatly benefit from data augmentation.
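The definition of data augmentation given in the related work (creating synthetic data to expand the size and diversity of a dataset) can be illustrated with a minimal sketch. The paper does not name concrete augmentation techniques, so everything here is an assumption made for illustration: the recording layout (a list of frames with one value per sensor channel), the function name `augment_recording`, and the two transformations (random scaling and Gaussian jitter), which are commonly used for sensor time series but are not presented by the authors as their method.

```python
import random
from typing import List, Optional, Tuple

Frame = List[float]        # one value per sensor channel at a single time step
Recording = List[Frame]    # a sequence of frames from one interaction session


def augment_recording(
    recording: Recording,
    n_copies: int = 5,
    noise_std: float = 0.01,
    scale_range: Tuple[float, float] = (0.9, 1.1),
    seed: Optional[int] = None,
) -> List[Recording]:
    """Create n_copies synthetic variants of one recorded session.

    Each variant is multiplied by one random factor (mimicking a slightly
    larger or smaller movement) and jittered with Gaussian noise (mimicking
    sensor noise). Both transformations are illustrative assumptions.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    variants: List[Recording] = []
    for _ in range(n_copies):
        scale = rng.uniform(*scale_range)  # one scale factor per variant
        variant = [
            [value * scale + rng.gauss(0.0, noise_std) for value in frame]
            for frame in recording
        ]
        variants.append(variant)
    return variants


if __name__ == "__main__":
    # Hypothetical recording: 3 time steps, 2 channels (e.g. wrist x/y position).
    original = [[0.10, 0.50], [0.12, 0.55], [0.15, 0.61]]
    synthetic = augment_recording(original, n_copies=4, seed=42)
    dataset = [original] + synthetic  # one real session plus four synthetic ones
    print(f"{len(dataset)} recordings, {len(dataset[0])} frames each")
```

In this sketch a single recorded session yields several synthetic ones without additional robot time. Whether the expert annotation of the original session remains valid for each variant depends on the transformation and would have to be checked for the task at hand.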
3. Research questions

The following research questions are the focus of this Ph.D. research. First of all, it is important to know what the current status in teaching human-robot interaction is and how humans are trained to handle industrial robots. This requires an extensive review of the literature that focuses on teaching humans how to use and operate robots.

1. What are the common practices and mistakes in human-robot interaction when handling industrial robots?

In order to identify common practices in teaching human-robot interaction, it is crucial to know what kind of instructions are given by the trainer and what kind of feedback is received by the trainee. Moreover, we also aim to examine the effect of various training approaches (i.e., static, variable, and dynamic) on the trainee's learning progress. Secondly, this study addresses current technologies that can support the training of psychomotor skills and facilitate the teaching of human-robot interaction.

2. What technological support is achievable in educational human-robot interaction?

When looking at technologies, the focus of this research lies on which existing technologies cover both robots and humans and which machine learning technologies are available. In this study, we will utilize immersive technologies that vary in terms of level of intrusion. For example, the use of a head-mounted display to provide feedback to the learner in an augmented reality setting has a lower level of intrusion than a completely simulated learning environment. Thirdly, it is important for this research to focus on the identification and classification of common mistakes made during psychomotor skills training with robots.

3. How can data augmentation assist the successful replication of common mistakes and how can we measure this impact?

To address this research question, we will consult experts in training humans on how to interact with industrial robots and classify the common mistakes that can take place. Then we will explore how we can augment data in a meaningful manner to replicate common mistakes.

4. Methodology

In this study, we first conduct a systematic literature review in the following fields of research, which are used in the domain of educational technologies:

1. Educational human-robot interaction
2. Technologies in human-robot interaction
3. Semi-supervised learning models
4. Data augmentation – Domain randomization

The first research field is educational human-robot interaction, which refers to the education of humans in handling industrial robots. This includes the common training practices as well as common mistakes the trainee makes during the interaction. In this study, we will use an industrial robot to assemble a box in cooperation with a human, as seen in Figure 2. First, the human learner will be trained on how to interact with the robot appropriately. Then, the learner will be instructed on the specifics of the assembly steps.

The second research field is the technologies used in human-robot interaction. This addresses data augmentation and domain randomization, which are both active fields of development; research papers about these topics in the domains of 3D pose detection [5] and object detection [4] have been released frequently in recent years.

After the systematic literature review, the next step will be to design a theoretical framework. This framework includes the design for an immersive training environment specifically for psychomotor skills training. We will use the four-component instructional design (4C/ID) [12] method to create our framework. In 4C/ID, the design is split up into four different components. The first component is the learning tasks, which aim at integrating skills and show a high variability of practice. The second component is the supportive information, which focuses on the performance of non-routine aspects of learning tasks. It is specified per task class and always available throughout the whole learning process. The third component is the part-task practice. This aims to provide additional practice for individual routines selected by either the trainer or the trainee. The last component is procedural information. Procedural information specifies how to perform routine aspects of a task, for example by giving step-by-step instructions. This procedural information is presented just in time during training and is gradually reduced as the expertise of the trainee increases. In the case of designing a system for human-robot interaction, we will use 4C/ID to teach the human learner different aspects of handling industrial robots. 4C/ID provides a framework to handle non-repetitive tasks which include task non-specific repetitive elements. The psychomotor skills training in human-robot interaction has similar non-repetitive tasks, which we are focusing on in this research.

The overall process of this research methodology is design-based research, as illustrated in Figure 1. The steps of design-based research include analysis, design, development, implementation, evaluation, and reflection. In contrast to predictive research, design-based research uses an iterative process. This means that, after evaluating the results, the entire process can be iterated based on the ADDIE (analysis, design, development, implementation, evaluation) model [13]. While a generic ADDIE model jumps from the evaluation step directly to the solution of the problem [14], this approach includes a reflective phase whereby all previous steps are examined and refined for the next iteration. This reflection and refinement of problems, solutions, methods, and design principles systematically tries to accommodate innovative solutions for real-life problems [15].

Figure 1: Design-based research (DBR) synthesis combining the DBR models of Amiel & Reeves [15] and De Villiers & Harpur [14], showing the iterative research process in the domain of educational technologies.

In this study, we use machine learning to extrapolate from collected data and model the dynamic human-robot interaction environment. The process of data collection can take many forms. Using the physical environment for data collection has positive and negative aspects. On the positive side, collected data naturally captures and expresses the task that the robot has to perform. On the negative side, the data collection task has certain limitations, such as the availability and the speed of the robot and human, the limited number of human-robot data collection stations (in most cases one or two robots), and the expected tiredness of the human. In order to counter these problems of data collection, we will be exploring data augmentation.

We are planning to develop multiple prototypes over the course of this research. The first prototype will be designed specifically for the human-robot interaction where both sides have to cooperate in order to assemble a box together. This prototype will use a virtual robot. In the prototype, the human learner can interact with the virtual robot, which is a simulated 3D model visible through a camera or head-mounted display.

5. Expected impact

By using data augmentation and generating synthetic data for modeling human-robot interaction, we expect the machine learning model to perform as well as or better than a machine learning model trained on physical data alone. We also expect the data collection process to be faster and reusable in future applications. By examining the impact of data augmentation for psychomotor skills training in human-robot interaction, we hope to find a reliable and safe approach for training humans to handle industrial robots.

Figure 2: Recent human-robot interaction example with the industrial robot YuMi interacting with a human operator to assemble a box.

6. Conclusion

The training of psychomotor skills is imperative for an effective, efficient, and safe human-robot interaction. In this paper, we propose an educational approach towards the psychomotor skills training of humans in handling industrial robots. Our educational approach includes timely instructions and feedback as well as supporting immersive technologies. The development of such technologies requires machine learning techniques that rely on a large amount of data. However, it is costly and effort-intensive to collect data in such settings where the availability of the robots is limited. To address this challenge, we propose the use of data augmentation.

We are going to investigate the impact of data augmentation on the performance of the machine learning models that represent the interaction between the human and the robot in a physical environment. In this study, we are going to conduct an extensive literature review in the domains of education for human-robot interaction, technologies used in human-robot interaction, and data augmentation. Then, a theoretical framework will be designed and an immersive training prototype will be developed. This prototype will be implemented and evaluated in the training environment.

7. Acknowledgements

This project was funded by the BMBF, the German Federal Ministry of Education and Research, under the name MILKI-PSY (German abbreviation for Multimodal immersive learning with artificial intelligence for psychomotor training; https://www.milki-psy.de) and the grant code 16DHB4013.

8. References

[1] D. Di Mitri, J. Schneider, M. Specht, H. Drachsler, Multimodal pipeline: A generic approach for handling multimodal data for supporting learning, 2019.
[2] M. A. Goodrich, A. C. Schultz, Human-robot interaction: a survey, Now Publishers Inc, 2008.
[3] M. Ishii, A robot teaching method using hyper card system, in: [1992] Proceedings IEEE International Workshop on Robot and Human Communication, 1992, pp. 410–412.
[4] S. Borkman, A. Crespi, S. Dhakad, S. Ganguly, J. Hogins, Y.-C. Jhang, M. Kamalzadeh, B. Li, S. Leal, P. Parisi, C. Romero, W. Smith, A. Thaman, S. Warren, N. Yadav, Unity perception: Generate synthetic data for computer vision, 2021.
[5] E. Spyrou, E. Mathe, G. Pikramenos, K. Kechagias, P. Mylonas, Data augmentation vs. domain adaptation—a case study in human activity recognition, Technologies 8 (2020).
[6] L. Weng, Domain randomization for sim2real transfer, lilianweng.github.io/lil-log (2019). URL: http://lilianweng.github.io/lil-log/2019/05/04/domain-randomization.html.
[7] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, P. Abbeel, Domain randomization for transferring deep neural networks from simulation to the real world, 2017.
[8] OpenAI, I. Akkaya, M. Andrychowicz, M. Chociej, M. Litwin, B. McGrew, A. Petron, A. Paino, M. Plappert, G. Powell, R. Ribas, J. Schneider, N. Tezak, J. Tworek, P. Welinder, L. Weng, Q. Yuan, W. Zaremba, L. Zhang, Solving Rubik's cube with a robot hand, 2019.
[9] D. Spikol, E. Ruffaldi, G. Dabisias, M. Cukurova, Supervised machine learning in multimodal learning analytics for estimating success in project-based learning, Journal of Computer Assisted Learning 34 (2018) 366–377.
[10] D. Di Mitri, J. Schneider, K. Trebing, S. Sopka, M. Specht, H. Drachsler, Real-time multimodal feedback with the CPR tutor, in: I. I. Bittencourt, M. Cukurova, K. Muldner, R. Luckin, E. Millán (Eds.), Artificial Intelligence in Education, Springer International Publishing, Cham, 2020, pp. 141–152.
[11] K. A. Mat Sanusi, D. D. Mitri, B. Limbu, R. Klemke, Table tennis tutor: Forehand strokes classification based on multimodal data and neural networks, Sensors 21 (2021) 3121.
[12] J. van Merriënboer, The four-component instructional design model: An overview of its main design principles, 4cid.org, 2019.
[13] M. Molenda, In search of the elusive ADDIE model, Performance Improvement 42 (2003) 34–37.
[14] M. De Villiers, P. Harpur, Design-based research - the educational technology variant of design research: illustrated by the design of an m-learning environment, in: Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference, 2013, pp. 252–261.
[15] T. Amiel, T. C. Reeves, Design-based research and educational technology: Rethinking technology and the research agenda, Journal of Educational Technology & Society 11 (2008) 29–40.