=Paper= {{Paper |id=Vol-2148/paper4 |storemode=property |title=Classification of Movement Quality in A Weight-shifting Exercise |pdfUrl=https://ceur-ws.org/Vol-2148/paper04.pdf |volume=Vol-2148 |authors=Elise Klaebo Vonstad,Xiaomeng Su,Beatrix Vereijken,Jan Harald Nilsen,Kerstin Bach |dblpUrl=https://dblp.org/rec/conf/ijcai/VonstadSVNB18 }} ==Classification of Movement Quality in A Weight-shifting Exercise== https://ceur-ws.org/Vol-2148/paper04.pdf
Classification of Movement Quality in a Weight-Shifting Exercise

Vonstad, Elise Klæbo (1), Su, Xiaomeng (1), Vereijken, Beatrix (2), Nilsen, Jan Harald (1), Bach, Kerstin (1)
(1) Norwegian University of Science and Technology, Department of Computer Science
(2) Norwegian University of Science and Technology, Department of Neuromedicine and Movement Science



Abstract

In exercise games, it is often possible to gain rewards, i.e. points, by only partly completing an intended movement, which can undermine the effect of using such games for exercise. To ensure usability and reliability of exergames, correct movements must be accurately identified. The aim of the current study was to evaluate the performance of machine learning models in classifying weight-shifting movements as correct or incorrect. Eleven healthy elderly (6 F) performed a stepping exercise in a correct (with weight shift) and an incorrect (without weight shift) version. A 3D Motion Capture (3DMoCap) system calculated joint center positions (JCPs); 2270 repetitions (1133 correct) were included in the analysis. Random Forest (RF), k-Nearest Neighbor (k-NN) and Support Vector Machine (SVM) classification models were built. Evaluation: 10-fold leave-one-group-out cross validation (CV), repeated for all persons. Results showed high accuracy and recall in all classifiers. Average accuracy and recall were RF = 0.989, k-NN = 0.949, SVM = 0.958. Highest were RF on shoulder JCPs and SVM on all JCPs (both 0.996). Lowest was k-NN on ankle JCPs (0.879). This study shows that all three models can distinguish correct and incorrect repetitions with high accuracy and recall, also when using selected JCPs. RF consistently outperformed the other models.

1 Introduction

Exercise games, or exergames, are games played on a computer screen that use bodily movements as input to interact with the game. This form of exercising is gaining popularity and attention from both researchers and therapists. In recent years, it has been shown that doing exercises elicited by games is a more motivating and fun way of exercising than conventional exercise programs, while being as effective as conventional exercise when used in cooperation with therapists [Nicholson et al., 2015], [Skjaeret et al., 2016]. This is encouraging with respect to the increasing number of elderly in the population, as we might utilize exergames as a tool to promote self-management of exercise in people of older age. Exergames for the elderly might decrease the load on the health care system in the coming years in two ways: by preventing or reducing loss of independence due to reduced physical function, and by empowering the elderly to exercise effectively without having to travel to a therapist or training center for supervised exercise.

Exergames are fun and motivating partially because they provide additional, extrinsic motivation to complete a movement: points or a score in the game. Because people differ in body shape and size, the game system needs to accept a wide variety of movements to allow different players to play the game. This also means that in many situations, it is possible to gain points without doing the complete exercise movement intended, or by doing just a small version of the movement, as reported in e.g. [Pasch et al., 2009]. People quickly catch on that this is possible: they learn how to cheat. Such incorrectly performed exercise repetitions undermine the effect of exergaming, as they might make the quality of the exercise performed poorer and give lower gains in skill or function than could be expected if the exercise was performed correctly. Apart from being less effective, this can also be dangerous, as over-estimation of one’s own skill is related to increased fall risk in the elderly [Sakurai et al., 2013].

For exergames to be effective and useful, it is vital that they can accurately identify the performance of an exercise repetition as being correct or incorrect. To enable such classification, accurate tracking of movement while exergaming is a prerequisite. As the usability and accuracy of different measurement devices vary, finding a trade-off that gives good enough measurement accuracy while being user friendly is especially challenging. The gold standard for motion capture accuracy, marker-based 3D Motion Capture camera systems (e.g. Vicon Motion Systems Ltd), give very accurate measurements of body movements, but are expensive and require a fixed (laboratory) setting and expert users. Currently, the most promising alternative measurement methods are the marker-less time-of-flight (ToF)/depth camera systems such as the Kinect v2 (Microsoft Inc), and inertial measurement unit (IMU) systems such as the Xsens (Xsens Technologies B.V.). These are easy to use, portable and low-cost, but do not give as accurate full-body measurements as the 3DMoCap systems, especially when measuring hands and feet [van Diest et al., 2014]. ToF camera systems usually utilize a skeleton model based on the 3D cloud mapping of a person to analyze movements, where joint center positions (JCPs) are
calculated and used in analyses. Using JCPs, it is possible to represent the person being tracked with enough information to identify different activities [Gaglio et al., 2015], analyze postural stability [Dehbandi et al., 2017] or use the positions as input to a video-based game [Shih et al., 2016]. The ToF-based systems show promising results regarding accuracy of measuring torso/upper body movements, as their discrepancy from a 3DMoCap system is reported to be within acceptable ranges [Bonnechère et al., 2014], [Matsen et al., 2016]. Still, others warn about limitations in measurements of shoulder movements when comparing to goniometers [Huber et al., 2015].

The aim of the current study was to assess the performance of machine learning (ML) classifiers. In order to capture the participants’ full-body movements as accurately as possible, we used a 3DMoCap system to measure high-quality movement data, to ensure that the classification was performed on the actual movements the participants performed. Furthermore, as JCPs are commonly used in more user-friendly measurement devices, we chose to use these as input to the classification model in the current study, possibly allowing insight into whether data from ToF/depth cameras could be used as input to classification models in the future.

As there are several ways to successfully classify the type of movement being performed using machine learning, we hypothesized that it is feasible to use learning algorithms to analyze whole-body movement patterns to classify whether a detected movement was performed correctly or not. Thus, this paper aims to investigate the classification performance of three common classification algorithms on JCP 3DMoCap data from a weight-shifting balance task performed correctly or incorrectly.

2 Related Work

In movement analysis, machine learning has been used mostly on data from sensors that track persons outside of the lab, as data from e.g. inertial measurement units is challenging to analyze with traditional methods. ML analysis methods have been used in, for example, activity recognition [Mukhopadhyay, 2014], [Lara and Labrador, 2013], and in identification of falls [Aziz et al., 2012] using data from IMUs. Furthermore, IMUs have been used in classification of movement performance in adults [Giggins et al., 2014], although that study only reached medium-to-good classification accuracy. In [Yurtman and Barshan, 2014] a complete system of movement detection and error classification concerning movement amplitude was implemented using wired IMUs to record movement during physiotherapy exercises, with good results. One study used machine learning to evaluate movement quality in exercises performed by children, using smart-phone IMU sensors to measure movements and using natural fatigue as a mechanism to produce wrong performances [Carvalho and Furtado, 2016]. [Lo Presti and La Cascia, 2016] showed a wide range of ML methods being used on identification of human actions using ToF/depth cameras, with good results, but did not report any studies that aimed to classify the quality of detected movements. The use of ML methods on data from 3DMoCap measurement systems has also increased in recent years, but is mostly used to identify human actions and not to assess the quality of movements. For example, ML models were successfully used to discriminate between, for example, jumping and walking in a continuous stream of MoCap data [Kapsouras and Nikolaidis, 2014]. To our knowledge, research is scarce on automatic classification of movement quality measured using high-quality JCP data obtained from 3DMoCap systems.

3 Approach

3.1 Data set

As there are no open data sets containing labelled weight-shifting balance exercises, we conducted a data collection to obtain a labelled training data set. Collection of time series data was conducted in November 2017 using a 10-camera, 100 Hz 3DMoCap system (Vicon Motion Systems Ltd). Simultaneous ground reaction force (GRF) data was collected using a 1000 Hz force plate (Kistler Inc) embedded in the floor, and digital video in sagittal view was recorded for quality control purposes. Reflective markers were placed according to the Plug-in-Gait full-body biomechanical model, with head and hand markers excluded.

Eleven participants were recruited from local exercise groups for the elderly. There were 6 females and 5 males, and mean age was 69.3 years (SD 4.0).

Participants performed two versions of a balance exercise movement common in stroke rehabilitation (as seen in e.g. [Okubo et al., 2016]). Both versions had the same starting position (Figure 1a), with both feet placed on the force plate. In Figure 1, the red arrow originating at the feet of the participant represents the 3D ground reaction force (GRF). In the “correct” performance of the movement, the right foot was placed in front of the person, off the force plate, and body weight was shifted over to the right foot while keeping the left foot in contact with the force plate (as seen in Figure 1b, where the remaining GRF on the left foot is small), before moving the right foot back to the force plate. In the “incorrect” version of the movement, the same step was performed, but the person did not shift body weight over to the right foot when they took the step (as seen in Figure 1c, where the GRF on the left foot is large). These movement patterns were chosen as they are typical ways of performing this weight-shifting exercise correctly and incorrectly, as described and demonstrated by a physical therapist experienced in stroke rehabilitation. Participants were instructed orally on how to perform these movements with and without weight shift, but were encouraged to move in a way that was natural to them.

Figure 1: (a) Start and end position of the movement. (b) Correct performance, with weight shift. (c) Example of an incorrect performance, without weight shift.

One repetition was one completion of such a movement: from the moment the person was standing in the starting position, through taking the step, until the person had the right foot back in the starting position. During one trial, 10 repetitions were completed in sequence. Each round of 10 repetitions was performed three times, producing a 3x10 block of repetitions to mimic a normal sequence of exercising. To reduce risk of fatigue from repeating the same movement many times during the test session, test persons first performed two 3x10 blocks of repetitions in the correct version of the movement, then had a 5-minute break and completed two 3x10 blocks of the incorrect version. This was then repeated so that each person completed 240 repetitions in total: 120 repetitions of each version of the movement. Data from 11 persons were collected, with one person only completing half of the test protocol. This resulted in 2520 recorded repetitions.

3.2 Pre-processing and feature extraction

Figure 2 shows the data processing model used to analyze the data. Marker data was first quality checked in the Vicon Nexus software, and missing position data from markers was gap-filled using the built-in algorithms. JCP time series data was extracted from the Plug-in-Gait biomechanical model. Some repetitions were not included due to participants doing a different movement (e.g. loss of balance, side-stepping), or due to partial capture of repetitions at the beginning or end of a trial. This resulted in JCP time series data from 2270 repetitions being included for further analysis, 1133 correct and 1137 incorrect. Statistical features from each JCP time series were computed: these included mean, median, standard deviation, sum, variance, minimum and maximum values.

Figure 2: Data flow model

3.3 Train-test split

Using the SciKit-Learn library [Pedregosa et al., 2012], the data was split into training and test sets, where the Leave-One-Group-Out Cross-Validation (LOGOCV) method was used to exclude data from one person and use it as the test set in each iteration. This is a suitable method in the exercise domain, where it is likely that a model would be trained on other people’s data than data from the current player being evaluated for correct/incorrect repetitions.

3.4 Classification models

A random forest (RF, n_estimators = 10) classifier, a k-nearest neighbor (k-NN, k = 10) classifier and a support vector machine (SVM, kernel = polynomial) classifier were trained and tested, using the SciKit-Learn library, in each iteration of the train-test split. Hyperparameters were not tuned due to the success of the initial parameter settings. Results were obtained as confusion matrices, from which accuracy and recall are reported. Recall was chosen as a primary outcome measure, alongside overall accuracy, as it is vital in this setting.

4 Results

Table 1 shows average accuracy from all LOGOCV iterations for classification of incorrect and correct repetitions by the three classifiers. Overall, results show that all three classification models achieve very high accuracy of around 95 % in almost all classifications. The RF and SVM models achieved the highest accuracies, with 99.6 % on shoulder JCPs and all JCPs, respectively. Lowest accuracy was reached by the k-NN model on data from ankle JCPs, 87.9 %. Recall results (Figures 3 and 4) showed that all three models achieved largely more than 90 % recall in both correct and incorrect repetitions. Figure 3 shows recall for correct repetitions by all classifiers, in each of the JCP selections. RF consistently achieved >95 % recall, being the most consistent of the three models across the different JCP selections. Average recall of correct repetitions was 98.9 % for RF, 94.4 % for k-NN and 96.0 % for SVM. The SVM model performed best of the three on recall of correct repetitions on data from all JCPs, but also had the most variable performance in the other JCP selections. k-NN reached around 95 % on all JCP selections except ankle JCPs, where it was the overall worst performing model of the three. Figure 4 shows recall for incorrect repetitions by all classifiers, in each of the JCP selections. Again, RF is most consistent with an average of
99.0 %, while k-NN and SVM achieved 95.2 % and 95.6 %, respectively. k-NN had the lowest recall of all models in all JCPs for incorrect repetitions, with 85.8 % on data from ankle JCPs. All three models had the highest recall when using data from all JCPs, although recall from using JCP selections, especially shoulder JCPs, was also high.

JCPs    Random Forest    k-NN      SVM       Avg
All     99.0 %           96.8 %    99.6 %    98.5 %
SHO     99.6 %           96.4 %    96.2 %    97.4 %
HIP     99.2 %           96.8 %    92.1 %    96.0 %
KNE     97.5 %           96.6 %    94.1 %    96.1 %
ANK     99.3 %           87.9 %    96.8 %    94.7 %
Avg     98.9 %           94.9 %    95.8 %    96.5 %

Table 1: Accuracy of classifiers for the different joint centre positions.

Figure 3: Recall for correct repetitions by all classifiers on all JCPs, shoulder (SHO), hip (HIP), knee (KNE) and ankle (ANK) JCPs.

Figure 4: Recall for incorrect repetitions by all classifiers on all JCPs, shoulder (SHO), hip (HIP), knee (KNE) and ankle (ANK) JCPs.

5 Discussion

This paper aimed to evaluate the performance of three ML classification models in classifying correctly and incorrectly performed repetitions of a weight-shifting exercise, using JCPs measured with a 3DMoCap system. Performance of Random Forest, k-Nearest Neighbor and Support Vector Machine models was evaluated. Results indicated that all three models are able to distinguish between incorrect and correct repetitions with high accuracy and recall (with an average accuracy of 98.9 %, 94.9 % and 95.8 %, respectively). Results from the current study are similar to those seen in [Gaglio et al., 2015] and in [Liu et al., 2017], where novel methods were used to classify activities using JCPs from Kinect, outperforming other approaches on the same data set. However, these results are not directly comparable to results in the current study, as the mentioned studies are not concerned with movement quality but with movement type. Compared to other studies on movement quality (e.g. [Giggins et al., 2014], [Yurtman and Barshan, 2014]), which are based on data from IMUs, the achieved accuracy in the current study is higher. This is possibly an effect of the movements in this study being instructed, and of the movements in these other studies being more complex and varied. Also, the IMU data might not represent the movements as accurately as the 3DMoCap data does.

Using all JCPs in the classification reached marginally higher accuracy than using any of the JCP selections, as seen in Table 1. The RF model was consistently slightly more accurate than the other two models, for both accuracy and recall. In light of the issue of avoiding in-game rewards for incorrect performance, recall of incorrect repetitions is a vital score here. The RF model achieved >95 % recall in all JCP selections. The k-NN and SVM models also achieved high recall, but were not as consistent across JCP selections as the RF model. Other studies using JCPs typically use all joints, or only joints that are tracked with good accuracy during the whole capture, as seen in [Gaglio et al., 2015]. Therefore, the results from classification of movement quality using JCP selections in the current study might not be comparable to results from selected JCPs in other studies.

Results also reflect that the data from incorrect and correct repetitions were very different, as all three models accurately distinguished between them. The oral instructions might have contributed to this, as the instructions probably influenced the movement patterns. Spontaneous, natural movements might be more variable than what was seen in this data set. Also, the correct movements were performed with more upper-body movement towards the stepping foot, and the heel of the stance foot was also lifted from the force plate. Furthermore, data from only the ankle JCPs were also classified with >80 % accuracy and recall by all models, which was not expected as both movements include similar stepping movements in the feet. The movements of the feet alone were different enough in the correct and incorrect repetitions to enable accurate classification, which might be a result of the aforementioned heel-lifts seen in only the correct trials. This probably resulted in more variable JCPs during correct repetitions, enabling the ML models to accurately identify them.
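
The evaluation behind these recall figures can be illustrated with a small, self-contained sketch. It is not the code used in the study: it relies only on the Python standard library rather than SciKit-Learn, uses a hand-written k-NN (k = 10) as a stand-in for the three classifiers, and runs on made-up toy data with a single coordinate per repetition. All function and variable names (summary_features, leave_one_group_out, knn_predict, recall, evaluate) are hypothetical. Under those assumptions it shows the seven summary statistics per time series, a split that holds out one participant per fold, and recall computed separately for correct (1) and incorrect (0) repetitions.

```python
from statistics import mean, median, pstdev, pvariance


def summary_features(series):
    """Seven summary statistics of one coordinate time series (cf. Section 3.2)."""
    return [mean(series), median(series), pstdev(series), sum(series),
            pvariance(series), min(series), max(series)]


def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx), holding out one participant per fold."""
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx


def knn_predict(train_X, train_y, x, k=10):
    """Label x by majority vote of its k nearest training vectors (ties go to 0)."""
    by_distance = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes_for_correct = sum(train_y[i] for i in by_distance[:k])
    return 1 if 2 * votes_for_correct > k else 0


def recall(y_true, y_pred, label):
    """Share of repetitions with true class `label` that were predicted as `label`."""
    predicted = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in predicted) / len(predicted)


def evaluate(X, y, groups, k=10):
    """Per-participant recall for correct (1) and incorrect (0) repetitions."""
    scores = {}
    for held_out, train_idx, test_idx in leave_one_group_out(groups):
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        y_true = [y[i] for i in test_idx]
        y_pred = [knn_predict(train_X, train_y, X[i], k) for i in test_idx]
        scores[held_out] = {
            "recall_correct": recall(y_true, y_pred, 1),
            "recall_incorrect": recall(y_true, y_pred, 0),
        }
    return scores


# Made-up toy data (an assumption, not study data):
# 4 participants x 6 repetitions per class, one coordinate per repetition.
SHAPE = [0.0, 0.5, 1.0, 0.5, 0.0]  # out-and-back step profile
X, y, groups = [], [], []
for participant in range(4):
    offset = 0.005 * participant  # small per-person posture offset
    for label, base_amplitude in ((1, 0.30), (0, 0.05)):
        for rep in range(6):
            amplitude = base_amplitude + 0.01 * rep
            series = [offset + amplitude * s for s in SHAPE]
            X.append(summary_features(series))
            y.append(label)
            groups.append(participant)

scores = evaluate(X, y, groups)
for participant, result in scores.items():
    print(participant, result)
```

Reporting recall per class and per held-out participant makes it visible when incorrect repetitions from one person are systematically accepted, which is the failure that would let a player collect rewards without shifting weight.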
Using ML models for the purpose of evaluating movement quality using data from ToF/depth cameras seems feasible given the very good performance achieved here. Furthermore, the good performance achieved in this study indicates that the models can possibly reach acceptable accuracy and recall also with lower-quality data. This can facilitate implementation of ML models into more user-friendly exergaming contexts. Recall results in classification of both correct and incorrect repetitions are very encouraging for applying ML in analysis of movements during exergaming, as this could make it harder for the player to receive rewards without performing the intended movement correctly. However, as the current movements were not elicited by an actual exergame, it remains to be determined whether a similar level of accuracy can be achieved in more realistic exergaming movements. Furthermore, the high accuracy in all JCP selections suggests that it might be feasible to use only the more accurate measurements of shoulder or hip JCPs from ToF/depth cameras, and still accurately identify correct and incorrect repetitions of a weight-shifting exercise. This could provide a way of using ML in exergames to more accurately reward movements during play, thus ensuring movement quality to a greater extent than the existing systems do.

Future work will focus on the use of ML models in actual exergame situations, as this possibly elicits movements that are noisier than in the current study, hence making the repetitions difficult to classify as being incorrect or correct. Using motion capture systems with lower accuracy, and only using e.g. shoulder JCPs as input to the classification models, would also be interesting to test in an actual exergaming setting, to see if the movements are still different enough to be classified as being correctly or incorrectly performed with similar accuracy to this study.

6 Conclusion

In order to use exergames effectively as a training and rehabilitation tool, it is crucial that the exergame system can identify correct and incorrect exercise repetitions accurately. This paper shows that it is feasible to use ML models in the automatic classification of correctly and incorrectly performed weight-shifts in balance exercises. Applying ML models on high-quality JCP movement data from a weight-shifting exercise yielded accurate classification of correct and incorrect exercise repetitions. Results encourage the testing of such models on JCP data obtained while elderly are playing actual exergames, to investigate whether the models are equally accurate in a more natural and possibly noisier setting. However, the current study was done in a setting where the performance of repetitions was instructed, and the movements performed during actual exergaming (for example the movement pattern of an incorrectly performed repetition) might differ from the movements performed here. The study also shows that using only selected JCPs yields accurate results as well, which is promising with regard to possible use of ML models on data from data capture methods that are lower cost and more user friendly.

References

[Aziz et al., 2012] O. Aziz, E. J Park, G. Mori, and S. N Robinovitch. Distinguishing near-falls from daily activ-
ing Support Vector Machines. Conference proceedings: IEEE Engineering in Medicine and Biology Society. Annual Conference, 2012:5837–5840, 2012.

[Bonnechère et al., 2014] B. Bonnechère, B. Jansen, P. Salvia, H. Bouzahouene, L. Omelina, F. Moiseev, V. Sholukha, J. Cornelis, M. Rooze, and S. Van Sint Jan. Validity and reliability of the Kinect within functional assessment activities: Comparison with standard stereophotogrammetry. Gait and Posture, 39(1):593–598, 2014.

[Carvalho and Furtado, 2016] L. D. Carvalho and V. Furtado. Using machine learning for evaluating the quality of exercises in a mobile exergame for tackling obesity in children. Proceedings of SAI Intelligent Systems Conference (IntelliSys), 15, 2016.

[Dehbandi et al., 2017] B. Dehbandi, A. Barachant, A. H Smeragliuolo, J. D. Long, S. J. Bumanlag, V. He, A. Lampe, and D. Putrino. Using data from the Microsoft Kinect 2 to determine postural stability in healthy subjects: A feasibility trial. PloS one, 12(2):e0170890, 2017.

[Gaglio et al., 2015] S. Gaglio, G. Lo Re, and M. Morana. Human Activity Recognition Process Using 3-D Posture Data. IEEE Transactions on Human-Machine Systems, 45(5):586–597, 2015.

[Giggins et al., 2014] O. M Giggins, K. T. Sweeney, and B. Caulfield. Rehabilitation exercise assessment using inertial sensors: a cross-sectional analytical study. Journal of NeuroEngineering and Rehabilitation, pages 1–10, 2014.

[Huber et al., 2015] M. E. Huber, A. L. Seitz, M. Leeser, and D. Sternad. Validity and reliability of Kinect skeleton for measuring shoulder joint angles: A feasibility study. Physiotherapy (United Kingdom), 101(4):389–393, 2015.

[Kapsouras and Nikolaidis, 2014] I. Kapsouras and N. Nikolaidis. Action recognition on motion capture data using a dynemes and forward difference representation. Proceedings - International Conference on Pattern Recognition, 25:2649–2654, 2014.

[Lara and Labrador, 2013] Oscar D. Lara and Miguel A. Labrador. A Survey on Human Activity Recognition using Wearable Sensors. IEEE Communications Surveys & Tutorials, 15(3):1192–1209, 2013.

[Liu et al., 2017] Jun Liu, Amir Shahroudy, Dong Xu, Alex Kot Chichung, and Gang Wang. Skeleton-Based Action Recognition Using Spatio-Temporal LSTM Network with Trust Gates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.

[Lo Presti and La Cascia, 2016] Liliana Lo Presti and Marco La Cascia. 3D skeleton-based human action classification: A survey. Pattern Recognition, 53:130–147, 2016.

[Matsen et al., 2016] F. A. Matsen, Al. Lauder, K. Rector, P. Keeling, and A. L. Cherones. Measurement of active shoulder motion using the Kinect, a commercially available infrared position detection system. Journal of Shoul-
  ities with wearable accelerometers and gyroscopes us-                der and Elbow Surgery, 25(2):216–223, 2016.
[Mukhopadhyay, 2014] S. C. Mukhopadhyay. Wearable sensors
   for human activity monitoring: A review. IEEE Sensors
   Journal, 15(3):1321–1330, 2014.
[Nicholson et al., 2015] V. P. Nicholson, M. McKean,
   J. Lowe, C. Fawcett, and B. Burkett. Six weeks of
   unsupervised Nintendo Wii Fit gaming is effective at improving
   balance in independent older adults. Journal of Aging and
   Physical Activity, 23(1):153–158, 2015.
[Okubo et al., 2016] Y. Okubo, D. Schoene, and S. R. Lord.
   Step training improves reaction time, gait and balance and
   reduces falls in older people: a systematic review and
   meta-analysis. British Journal of Sports Medicine, 2016.
[Pasch et al., 2009] Marco Pasch, Nadia Bianchi-Berthouze,
   Betsy van Dijk, and Anton Nijholt. Movement-based
   sports video games: Investigating motivation and gaming
   experience. Entertainment Computing, 1(2):49–61, 2009.
[Pedregosa et al., 2012] F. Pedregosa, G. Varoquaux,
   A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel,
   P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas,
   A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and
   É. Duchesnay. Scikit-learn: Machine Learning in Python.
   Journal of Machine Learning Research, 12:2825–2830,
   2012.
[Sakurai et al., 2013] R. Sakurai, Y. Fujiwara, M. Ishihara,
   T. Higuchi, H. Uchida, and K. Imanaka. Age-related self-
   overestimation of step-over ability in healthy older adults
   and its relationship to fall risk. BMC Geriatrics, 13(1):15–
   17, 2013.
[Shih et al., 2016] Meng Che Shih, Ray Yau Wang,
   Shih Jung Cheng, and Yea Ru Yang. Effects of a balance-
   based exergaming intervention using the Kinect sensor on
   posture stability in individuals with Parkinson’s disease:
   A single-blinded randomized controlled trial. Journal of
   NeuroEngineering and Rehabilitation, 13(1):1–9, 2016.
[Skjaeret et al., 2016] Nina Skjaeret, Ather Nawaz, Tobias
   Morat, Daniel Schoene, Jorunn Laegdheim, and Beatrix
   Vereijken. Exercise and rehabilitation delivered through
   exergames in older adults: An integrative review of
   technologies, safety and efficacy. International Journal of
   Medical Informatics, 85(1):1–16, 2016.
[van Diest et al., 2014] Mike van Diest, Jan Stegenga,
   Heinrich J. Wörtche, Klaas Postema, Gijsbertus J. Verkerke,
   and Claudine J.C. Lamoth. Suitability of Kinect for
   measuring whole body movement patterns during exergaming.
   Journal of Biomechanics, 47(12):2925–2932, 2014.
[Yurtman and Barshan, 2014] A. Yurtman and B. Barshan.
   Automated evaluation of physical therapy exercises using
   multi-template dynamic time warping on wearable sensor
   signals. Computer Methods and Programs in Biomedicine,
   117(2):189–207, 2014.