Teaching psychomotor skills using machine learning for error detection

Benjamin Paaßen [0000-0002-3899-2450] and Miloš Kravčík [0000-0003-1224-1250]

German Research Center for Artificial Intelligence, 10559 Berlin, Germany
{benjamin.paassen,milos.kravcik}@dfki.de

Abstract. Learning psychomotor skills is challenging because motion is fast, relies strongly on subconscious mechanisms, and instruction typically disrupts the activity. As such, learners would profit from mechanisms that can react swiftly, raise subtle mistakes to the conscious level, and do not disrupt the activity. In this paper, we sketch a machine learning-supported approach to provide feedback in two example scenarios: running, and interacting with a robot. For the running case, we provide an evaluation of how motions can be compared to highlight deviations between student and expert motion.

Keywords: Psychomotor skills · Running · Human robot interaction · Dynamic movement primitives

1 Introduction

Teaching beneficial psychomotor skills – such as moving healthily or interacting with a robot companion – is challenging [3,6], because coaches need to infer mistakes from subtle clues in observable behavior, build a hypothesis regarding the underlying cause of the mistake, and verbalize an instruction that enables the learner to improve performance, even though the learner may not be conscious of the mistake or of how to correct it [6]. Automatic mechanisms may be helpful to support instruction in such cases. An automatic feedback mechanism can perceive and analyze psychomotor activity with only a split-second delay and, thus, provide feedback in almost real time [3]. This permits learners to improve their psychomotor performance in a much faster loop: if they receive feedback, they can adapt during the activity and receive additional feedback immediately, whereas classic coaching would require interrupting the activity, receiving verbal feedback, discussing it, and re-starting the activity to check for improvement.

In this paper, we sketch a machine-learning-supported approach to feedback for two psychomotor activity scenarios, namely running and interacting with a robot. Our approach is intended as an inspiration for a general template that can be applied across a wide range of psychomotor skills. For the case of running, we provide a first analysis using dynamic movement primitives which shows that we can abstract from irrelevant variations in psychomotor data and home in on deviations from expert demonstrations that may be indicative of mistakes.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 Background and Related Work

To avoid injuries in running, a special wearable assistant was created, using an electrical muscle stimulation (EMS) device and an insole with force sensing resistors [2]. The results of the conducted study showed that EMS actuation significantly outperforms traditional coaching, which implies that this type of feedback can be beneficial for the motor learning of complex, repetitive movements.

Training of new skills by means of wearable technologies and augmented reality is supported by a conceptual reference framework, which enables capturing the expert's performance and provides various transfer mechanisms [5].
To capture an expert's experience with wearable sensors, high-level tasks were mapped to low-level functions (including body posture, hand/arm gestures, biosignals, haptic feedback, and user location), which were decomposed into their associated sensors [9].

The Visual Inspection Tool (VIT) facilitates the annotation of multimodal data as well as its processing and exploitation for learning purposes [1]. The VIT enables 1) triangulating multimodal data with video recordings; 2) segmenting the multimodal data into time intervals and adding annotations to the time intervals; 3) downloading the annotated dataset and using it for multimodal data analysis. The tool is part of the Multimodal Learning Analytics Pipeline.

To describe running motion, we rely on dynamic movement primitives (DMPs) [4]. DMPs describe a motion as a combination of two forces: first, a damped spring system which counteracts any undesired disturbances over time and, second, a time-dependent force term which is fitted to the data. More specifically, the time dynamics of a DMP are described by the following equations.

$$\tau \cdot \dot{v}(t) = -\alpha \cdot \big(\beta \cdot x(t) + v(t)\big) + f(t), \qquad \tau \cdot \dot{x}(t) = v(t), \tag{1}$$

where $x(t)$ models the location of a joint at time $t$, $v(t)$ the velocity of the joint at time $t$, $\tau \in \mathbb{R}_+$ is a hyperparameter determining the period length of the system, $\alpha \in \mathbb{R}_+$ and $\beta \in \mathbb{R}_+$ are hyperparameters determining how fast the damped spring system counteracts disturbances, and $f(t)$ is the forcing term which models the specifics of our motion. In DMPs, this forcing term is always a linear combination $f(t) = \sum_{k=1}^{K} \Psi_k(t) \cdot w_k \big/ \sum_{k=1}^{K} \Psi_k(t)$ of (learned) coefficients $w_k$ with nonlinear basis functions $\Psi_k$. For this work, we use the following rhythmically repeating basis functions, as suggested in [4].

$$\Psi_k(t) = \exp\Big(h \cdot \big(\cos\big(2\pi \cdot \tfrac{t}{\tau} - c_k\big) - 1\big)\Big), \tag{2}$$

where $c_k \in [0, 2\pi]$ is the phase shift of the $k$th basis function and $h$ regulates the width of each basis function.

The main strengths of DMPs are that we can fit the coefficients $w_k$ to data via simple linear regression, and that we can replay a motion at arbitrary speed (by adjusting $\tau$) for arbitrarily long times (by executing the system in Equation 1 for longer). DMPs have been particularly popular in robotics for mimicking human demonstrations [4,7,8] but, to our knowledge, have not yet been applied to provide feedback to human trainees.
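To make the fitting step concrete, the following minimal numpy sketch implements the basis functions of Equation (2) and fits the coefficients $w_k$ of Equation (1) via linear regression. This is our own illustration, not the authors' implementation; in particular, the finite-difference derivative estimates and the hyperparameter defaults are assumptions on our part (they mirror the choices reported in Section 5).

```python
import numpy as np

def rhythmic_basis(t, tau, K, h):
    """Rhythmic basis functions Psi_k from Equation (2).

    Returns an array of shape (len(t), K) with Psi_k(t) in column k.
    """
    c = 2 * np.pi * np.arange(1, K + 1) / K      # phase shifts c_k = 2*pi*k/K
    phase = 2 * np.pi * np.asarray(t, dtype=float)[:, None] / tau
    return np.exp(h * (np.cos(phase - c[None, :]) - 1))

def fit_dmp_coefficients(x, tau, K=12, alpha=32.0, beta=8.0, dt=1.0):
    """Fit the forcing-term coefficients w_k of a rhythmic DMP to a
    joint-angle trajectory x by linear (least-squares) regression.

    We estimate v(t) = tau * x'(t) and v'(t) by finite differences,
    solve Equation (1) for the forcing term f(t), and regress f(t)
    onto the normalized basis functions.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x)) * dt
    v = tau * np.gradient(x, dt)                 # from tau * x'(t) = v(t)
    f = tau * np.gradient(v, dt) + alpha * (beta * x + v)  # Eq. (1) solved for f
    h = np.log(0.1) / (np.cos(2 * np.pi / K) - 1)          # basis width
    Psi = rhythmic_basis(t, tau, K, h)
    Phi = Psi / Psi.sum(axis=1, keepdims=True)   # normalized basis functions
    w, *_ = np.linalg.lstsq(Phi, f, rcond=None)  # least-squares fit of w_k
    return w
```

The resulting coefficient vector then serves as a compact, tempo-independent summary of one cycle of motion.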
3 Methodology

The aim of our project is to support learners or trainees in developing specific psychomotor skills by means of immersive learning environments. The envisioned solutions will combine AI approaches that process multimodal data from suitable sensors with machine learning techniques, in order to analyze performance, detect faults, and finally generate individual feedback automatically.

We consider two application cases: running and collaboration with a robot. From the learning perspective, they are different. In running (the concrete aim depends on the target group and the objective, which we currently set as healthy running for a wide public), the trainee repeats relatively simple rhythmic movements again and again, but the movements of the various parts of the body should be in harmony and follow certain rules, in order not to harm the body and to perform effectively. This suggests a behavioristic approach to learning, in which each error should be reported to the person immediately, so that the message can be assigned to the corresponding movement. So when the deviation from an optimal blueprint exceeds a threshold, suitable feedback is given. For example, when the person does not lift their feet properly, acoustic feedback is provided.

On the other hand, collaboration between a human and a robot consists of various actions on both sides, following a common aim. Here, the human actions are typically performed by hand (e.g. in an assembly process), but in this case the person needs to evaluate the current context and decide what to do next, e.g. whether a micro-aim has been achieved and one can proceed with the next step. Two types of skills are required: the ability to cooperate with the robot (e.g. the learner nudges the robot on its empty arm) and the ability to fulfill the requested task (e.g. the learner puts the lid on the box). This resembles cognitivist approaches to learning, where formative feedback (we distinguish corrective and reinforcing types) plays an important role, allowing trial and error. Therefore, more complex actions need to be assessed, usually considering whether a specific micro-aim has been achieved. Nevertheless, immediate feedback cannot be excluded either, especially in the case of dangerous operations.

What both application scenarios have in common is the necessity of summative feedback, which evaluates a whole (training) unit or (work) session. Here, different phases or sequences of actions can be analyzed, showing which parts were managed well and where there is potential for improvement.

4 Implementation

The artificial intelligence in our project essentially has the following tasks that are important for the learning process:

– Modeling templates and movement patterns to guide learners: data sets are collected that represent expert performance in selected psychomotor processes. With this data, machine learning models are trained for the use cases.
– Detecting mistakes in the learners' execution of movements compared to an optimal blueprint: any deviation above a threshold triggers feedback.
– Generating helpful feedback for learners: detected errors must be processed in such a way that learners receive starting points for improving their processes, which they can process cognitively and implement psychomotorically.

[Fig. 1. An illustration of the proposed pipeline to provide feedback for the running case: joint angles are converted to DMP coefficients, compared against an expert database to find the best match, and the resulting deviations are reported.]

Let us illustrate these tasks for the running example. We first need to collect expert demonstrations that cover a wide range of reasonable and healthy styles of running. These demonstrations need to be recorded via motion capture devices to retrieve joint angle information that abstracts from the specific body configuration. Further, we need to abstract from running tempo and phase shift. During a learning episode, we record the learner's current running, convert it into the same representation, and compare the latest cycle of the learner's running to the most similar expert demonstration, resulting in a measure of deviation. If the deviation exceeds a threshold, we provide auditory, tactile, or visual feedback, e.g. by coloring the deviating limb in a virtual avatar of the runner [3]. Figure 1 displays a summary of our pipeline.
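In code, the comparison step of this pipeline could look as follows. The sketch assumes that each cycle of motion has already been summarized as a DMP coefficient vector (as in the previous section) and compares it to an expert database via the Euclidean distance; the function name, the threshold parameter, and the use of signed per-coefficient deviations are our own illustrative assumptions rather than a fixed design.

```python
import numpy as np

def deviation_feedback(w_learner, expert_db, threshold):
    """Compare a learner's DMP coefficient vector to an expert database.

    w_learner : array of shape (K,), coefficients of the latest cycle
    expert_db : array of shape (n_experts, K), expert coefficients
    threshold : maximum tolerated Euclidean deviation

    Returns the index of the most similar expert demonstration, the
    signed per-coefficient deviations from it (useful for localizing
    which part of the motion deviates), and a flag indicating whether
    feedback should be triggered.
    """
    w_learner = np.asarray(w_learner, dtype=float)
    dists = np.linalg.norm(expert_db - w_learner[None, :], axis=1)
    best = int(np.argmin(dists))              # most similar expert
    deviations = w_learner - expert_db[best]  # signed deviations
    return best, deviations, bool(dists[best] > threshold)
```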
To represent running, we use rhythmic dynamic movement primitives over the joint angle representation, which intrinsically abstracts from body shape and tempo, and is fast enough to be applied on the fly for each new cycle of running.

5 Evaluation

We evaluate our proposed representation for the running case on a data set of 33 runs of three runners recorded in the Carnegie Mellon Motion Capture Lab¹. Our aim in this experiment is to abstract from the particularities of a specific run and to identify the runner by comparing a run against the other runs in the database. Each run is represented by the Euler angles of the right and left femur as well as the right and left tibia. We then derive a summary representation of each run by training a rhythmic dynamic movement primitive (DMP). We used $K = 12$ basis functions, as this was sufficient to achieve a local minimum in the reconstruction error. As hyperparameters, we chose $c_k = 2\pi \cdot \frac{k}{K}$, $h = \log(0.1)/(\cos(2\pi/K) - 1)$, $\alpha = 32$, and $\beta = \alpha/4$, as recommended by [4]. The period length $\tau$ for the DMP was chosen automatically to minimize the auto-regressive error (i.e. how similar the signal is to itself after shifting by $\tau$ frames). Finally, we normalized against phase shifts by permuting the basis functions such that the distance to the first run in the data set was minimized.

Figure 2 illustrates the effect of our representation on the data. The left plot displays the raw data, the right plot the signal reconstructed by our DMP representation after normalizing the period length and the phase shift. Color indicates the runner. We observe that all runs are much better aligned in the right plot and that it is easy to distinguish the running style in blue from the running styles in red and orange via its peaks at frames 20 and 80.

[Fig. 2. The raw data (left) for the rx angle of the left femur and the DMP reconstruction (right); amplitude in degrees over frames. Runners are distinguished by hue, runs of the same runner by intensity.]

Next, we evaluate the accuracy of identifying the runner in a leave-one-out crossvalidation across all runs. We use a κ-nearest neighbor classifier as implemented in sklearn² with the number of neighbors κ varying from one to five. To select nearest neighbors, we use the Euclidean distance on the DMP coefficients. As baselines, we also consider the Euclidean distance on the first 128 frames of the raw signal (because all signals were at least 128 frames long) and dynamic time warping as implemented in the edist package³.

Table 1. Mean nearest neighbor classification accuracy in a leave-one-out crossvalidation for varying numbers of neighbors κ.

metric          | κ=1  | κ=2  | κ=3  | κ=4  | κ=5
Euclidean (raw) | 0.67 | 0.73 | 0.55 | 0.58 | 0.55
DTW             | 0.88 | 0.88 | 0.85 | 0.88 | 0.76
Euclidean (DMP) | 0.85 | 0.91 | 0.88 | 0.88 | 0.88

Table 1 shows the classification accuracies. Our proposed DMP representation performs best for all κ except κ = 1, where DTW is better. Additionally, our DMP representation is computationally more efficient: DTW has quadratic complexity in the signal length, which may become infeasible for long signals, whereas our DMP representation is linear in the signal length.

¹ http://mocap.cs.cmu.edu/info.php
² https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
³ https://pypi.org/project/edist/
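The classifier evaluation itself is straightforward to reproduce with scikit-learn. The following sketch shows the leave-one-out protocol for the DMP coefficient representation; the variable names (W for the coefficient matrix, runner_ids for the labels) are placeholders for the data described above, not names from the original experiment.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def loo_accuracy(W, runner_ids, kappa):
    """Mean leave-one-out accuracy of a kappa-nearest neighbor
    classifier with Euclidean distance on DMP coefficients.

    W          : array of shape (n_runs, n_features)
    runner_ids : array of shape (n_runs,) with one runner label per run
    kappa      : number of neighbors
    """
    knn = KNeighborsClassifier(n_neighbors=kappa, metric="euclidean")
    return cross_val_score(knn, W, runner_ids, cv=LeaveOneOut()).mean()

# one accuracy per column of Table 1, for kappa = 1, ..., 5
# accuracies = [loo_accuracy(W, runner_ids, kappa) for kappa in range(1, 6)]
```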
6 Conclusion

In this paper, we sketched an approach to support learning psychomotor skills using machine learning. In particular, we proposed a pipeline that compares a learner's activity to blueprints from experts and provides feedback whenever deviations are detected that exceed a threshold. To make this approach viable, our comparison needs to abstract from irrelevant factors such as body shape, tempo, or phase shifts. We evaluated dynamic movement primitives for the case of running and achieved a representation that was abstract enough to identify the runner from their running behavior. In future work, we wish to fully implement our pipeline for running and for a robot interaction scenario and to evaluate its effectiveness in supporting human learners.

References

1. Di Mitri, D., Schneider, J., Klemke, R., Specht, M., Drachsler, H.: Read between the lines: An annotation tool for multimodal data for learning. In: Proceedings of the 9th International Conference on Learning Analytics & Knowledge. pp. 51–60 (2019). https://doi.org/10.1145/3303772.3303776
2. Hassan, M., Daiber, F., Wiehr, F., Kosmalla, F., Krüger, A.: Footstriker: An EMS-based foot strike assistant for running. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1(1), 1–18 (2017). https://doi.org/10.1145/3053332
3. Hülsmann, F., Frank, C., Senna, I., Ernst, M.O., Schack, T., Botsch, M.: Superimposed skilled performance in a virtual mirror improves motor performance and cognitive representation of a full body motor action. Frontiers in Robotics and AI 6, 43 (2019). https://doi.org/10.3389/frobt.2019.00043
4. Ijspeert, A.J., Nakanishi, J., Hoffmann, H., Pastor, P., Schaal, S.: Dynamical movement primitives: learning attractor models for motor behaviors. Neural Computation 25(2), 328–373 (2013). https://doi.org/10.1162/NECO_a_00393
5. Limbu, B., Fominykh, M., Klemke, R., Specht, M., Wild, F.: Supporting training of expertise with wearable technologies: The WEKIT reference framework. In: Mobile and Ubiquitous Learning, pp. 157–175. Springer (2018). https://doi.org/10.1007/978-981-10-6144-8_10
6. Magill, R.A., Anderson, D.I.: The roles and uses of augmented feedback in motor skill acquisition, pp. 3–21. Routledge, New York, NY, USA (2012)
7. Nakanishi, J., Morimoto, J., Endo, G., Cheng, G., Schaal, S., Kawato, M.: Learning from demonstration and adaptation of biped locomotion. Robotics and Autonomous Systems 47(2), 79–91 (2004). https://doi.org/10.1016/j.robot.2004.03.003
8. Schaal, S., Mohajerian, P., Ijspeert, A.: Dynamics systems vs. optimal control — a unifying view. In: Cisek, P., Drew, T., Kalaska, J.F. (eds.) Computational Neuroscience: Theoretical Insights into Brain Function, Progress in Brain Research, vol. 165, pp. 425–445. Elsevier (2007). https://doi.org/10.1016/S0079-6123(06)65027-9
9. Sharma, P., Klemke, R., Wild, F.: Experience capturing with wearable technology in the WEKIT project. In: Buchem, I., Klamma, R., Wild, F. (eds.) Perspectives on Wearable Enhanced Learning (WELL), pp. 297–311. Springer (2019). https://doi.org/10.1007/978-3-319-64301-4_14