Insights from two Studies on AI-based Learning in Strength Training

Bastian Dänekas¹,*, Tanja Döring¹, Tjorven Schnack¹, Georg Volkmar¹, Robert Porzel¹ and Rainer Malaka¹

¹ University of Bremen, Bibliotheksstraße 1, Bremen, 28359 Bremen, Germany

Abstract
AI-based exercise execution recognition is a current topic in computer science in sports. Using different learning algorithms, systems can be built that give feedback on whether an athlete executes a specific exercise correctly or incorrectly. We built two exercise execution systems in two separate studies. While one was built using supervised learning and investigated push-ups, the other was created with unsupervised learning methods for the military press. Both systems were able to detect exercise execution very well for individual persons, while correct recognition rates for the whole population of participants were worse. These two studies revealed two main challenges, which cannot be solved within the area of AI alone. However, HCI researchers are able to address those challenges and to develop future inventions. This paper opens up the design space for future HCI research in AI-based exercise execution systems, from which athletes will greatly benefit.

Keywords
Strength Training, Challenges of AI, Computer Science in Sports, Supervised Learning, Unsupervised Learning, Trends in HCI and Sports

1. Introduction

Strength training is an essential component of a sustainably healthy life. Strength training is divided into different exercises, which can be done either with or without weights. If exercises are performed incorrectly, joint injuries can occur in the short or long term, which makes future training impossible and therefore has a negative effect on people's health [1, 2, 3]. In order to prevent incorrect execution, it is recommended to consult a trainer about the correct execution.
The trainer can then correct misalignments during the performance of an exercise and thus help to ensure correct execution. However, training lessons with a fitness coach are quite expensive, and not every training session can be monitored by a coach. Furthermore, technology in the area of training is constantly evolving. Cameras can be used to record and analyze motion patterns. In addition, every commercially available smartphone now contains sensors that can measure not only the position of a person on the globe but also acceleration and rotation at a high frequency. By training classifiers on the resulting data, positive feedback on exercise performance can be provided that goes beyond a static representation of correct exercise performance. However, such technical applications are often very limited by the data they are trained on.

Figure 1: Push-up (left side) and military press (right side) repetitions as examples for AI-based exercise execution recognition systems.

NTSPORT'22: New Trends in HCI and Sports Workshop at MobileHCI'22, October 1, 2022
* Corresponding author.
Email: daenekba@uni-bremen.de (B. Dänekas); tanja.doering@uni-bremen.de (T. Döring); tjorven.schnack@googlemail.com (T. Schnack); gvolkmar@uni-bremen.de (G. Volkmar); porzel@tzi.de (R. Porzel); malaka@tzi.de (R. Malaka)
ORCID: 0000-0002-4240-0761 (B. Dänekas); 0000-0001-8648-340X (T. Döring); 0000-0002-3421-7932 (T. Schnack); 0000-0001-8602-9442 (G. Volkmar); 0000-0002-7686-2921 (R. Porzel); 0000-0001-6463-4828 (R. Malaka)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

In this paper, we present insights from two studies on AI-based learning in strength training.
In these studies, we evaluated two different training exercises, which can be seen in figure 1, using different classification or clustering methods and identified challenges, which are presented in this paper. These challenges point to future opportunities that HCI designers can explore and address.

2. Related Work

The recognition of certain movement patterns has been the focus of research for quite some time. The goals of automatic motion detection are varied. For example, steps can be counted to measure the activity level, which is a good indicator of overall morbidity and fitness [4, 5, 6]. Another application is detecting whether an elderly person has fallen and an ambulance has to be called [7, 8]. Also feasible are new ways of controlling video games, which allow specific interactions based on certain movement patterns. This can also be transferred to HCI, since motion pattern recognition makes new natural ways of interaction possible.

Another area of this recognition is sport and training. In the professional field, motion sequences of a player can be analyzed and improved in terms of efficiency and effectiveness (IMeasureU¹, MyoMotion², XSens³). Moreover, beginners who are new to a sport can learn the movement sequences better through such analyzed data. This detection is also helpful for the physical rehabilitation of patients who have to perform certain exercises correctly.

Table 1
Overview of the two conducted studies on AI-based recognition methods in strength training

                                      Study 1 (S1)                      Study 2 (S2)
Learning Method                       Classification (Sup. Learning)    Clustering (Unsup. Learning)
Hardware Used                         IMU                               2D Camera
Strength Exercise                     Push-Up                           Military Press
# of Participants                     5                                 18
# of Recorded Repetitions             278                               909
# of Established Execution Classes    6                                 5
# of Trainers for Labeling            2                                 -

The most widely used approaches, which are also well suited to mobile sports settings, are motion detection algorithms using inertial measurement units (IMUs) or cameras. IMUs combine three-axis accelerometers, gyroscopes and magnetometers. These sensors are nowadays present in all kinds of mobile devices such as smartphones or smartwatches. Cameras, both 2D and 3D depth cameras, are also suitable for capturing motion data. Smartphone cameras, which offer a high resolution as well as a front and a back camera, are especially suitable for mobile environments.

Research on the evaluation of AI-based exercise execution exists for both camera-based detection [9, 10, 11, 12] and IMU-based detection [13, 14, 15, 16]. Which hardware to choose depends on several factors, which in turn depend on the context. Captured camera images are easier to understand and interpret, and correct exercise execution is easier to validate, but lighting and occlusion issues can occur. IMUs, on the other hand, are not susceptible to occlusion or lighting influences, but their captured data is harder to understand, they can be mounted incorrectly, which has a major impact on the captured data, and they can suffer from long-term drift due to the temperature increase of the sensors.

3. Two Studies on AI-based Learning in Strength Training

In order to explore AI-based recognition methods in strength training, we conducted two studies. These studies differ, for example, in their learning method, the hardware used to record the data, and the number of participants recorded. An overview of these aspects can be seen in table 1.

In Study S1, we investigated whether an IMU in a smartphone is suitable for building a good classifier for push-ups. In addition to the correct execution class, we derived five error classes from qualitative interviews with four trainers.
These error classes included a misalignment of the elbows, two misalignments of the hands in relation to the body, instability of the trunk during execution, and a class for other errors. After 278 push-up repetitions had been recorded from five subjects, the repetitions were labeled by two trainers so that supervised classification methods could be applied. Not all error classes were apparent in the recorded data set. Only the following classes remained: correct execution (KA), position of the hands too high in relation to the body (OS), instability of the trunk (IR), the combination of OS and IR (OSIR) and other errors (S). The data was recorded with a frequency of 103 Hz, as prior research suggested [15, 16]. The recorded data set was highly imbalanced in the distribution of execution classes across participants. An example is given in table 2.

Table 2
Distribution of the respective classes per participant (P1 to P5) according to the second trainer labeling each execution

         P1    P2    P3    P4    P5     ∑
KA       28     0     0     0     0    28
OS        0     0    31     0     0    31
IR       26    49     0     0     0    75
OSIR      0     0    20     0    63    83
S        11     0     0    50     0    61
∑        65    49    51    50    63   278

¹ https://imeasureu.com/, accessed 05.06.2020
² https://www.velamed.com/produkte/3d-inertialsensor-system, accessed 02.06.2022
³ https://www.xsens.com/inertial-sensor-modules, accessed 09.06.2022

Ten signals were recorded for the classification process: the x-, y-, and z-axis of the accelerometer and of the gyroscope, in addition to yaw, pitch, roll and the magnitude of acceleration. For each of these signals, ten statistical features were calculated, including arithmetic mean, standard deviation, skewness, kurtosis, maximum, minimum, range, first quartile and third quartile. The data of each individual push-up repetition within a set was segmented by hand. Five different classification algorithms were trained on the data: Random Forests, Extra Trees, Support Vector Machine, Logistic Regression and K-Nearest Neighbors.
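As a rough illustration of the feature extraction described above, the following Python sketch computes the nine statistics named in the text for one signal of one hand-segmented repetition. The function names, the population (biased) moment estimators, the non-excess kurtosis and the inclusive quartile method are assumptions made for this illustration; the study does not specify these details.

```python
import statistics


def repetition_features(signal):
    """Summary statistics for one signal of one segmented repetition.

    Hypothetical helper: estimator choices are assumptions, not the
    authors' documented method. Needs at least two samples.
    """
    n = len(signal)
    mean = statistics.fmean(signal)
    std = statistics.pstdev(signal)  # population standard deviation
    if std > 0:
        # Standardized third and fourth central moments.
        skew = sum((x - mean) ** 3 for x in signal) / (n * std ** 3)
        kurt = sum((x - mean) ** 4 for x in signal) / (n * std ** 4)
    else:
        # A constant signal has no defined shape; use 0.0 by convention.
        skew, kurt = 0.0, 0.0
    q1, _, q3 = statistics.quantiles(signal, n=4, method="inclusive")
    return {
        "mean": mean,
        "std": std,
        "skewness": skew,
        "kurtosis": kurt,
        "max": max(signal),
        "min": min(signal),
        "range": max(signal) - min(signal),
        "q1": q1,
        "q3": q3,
    }


def feature_vector(signals):
    """Flatten the features of all signals (name -> samples) into one list."""
    vector = []
    for name in sorted(signals):  # fixed order keeps columns aligned
        vector.extend(repetition_features(signals[name]).values())
    return vector
```

Calling `feature_vector` on a dictionary holding the ten recorded signals of one repetition would yield one fixed-length row (here 10 × 9 values) that a classifier such as a Random Forest could consume.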
The classifiers were evaluated using 10-fold cross-validation. The multi-class classifiers gave good results with an accuracy of up to 87.44% (Random Forest), while the binary classifiers achieved an accuracy of up to 90.34% (Support Vector Machine). Classifiers that distinguished by athlete rather than by exercise execution achieved an accuracy of up to 99.64% (Extra Trees).

The second study, S2, investigated whether clustering algorithms from unsupervised learning are suitable for recognizing execution types of the strength training exercise military press. Based on existing literature, five classes were defined: the correct execution, an execution with too wide a grip, an execution with too small a range of motion (ROM), the push press execution, in which the legs are also bent and straightened, and the Olympic execution, in which the spine is hyperextended to move more weight. A total of 909 repetitions of the military press across 18 participants were recorded, and skeletal points were calculated using OpenPose [17].

V-measure [18] is a metric that compares a clustering to the class labels of the data points. The value is the harmonic mean of homogeneity and completeness. Homogeneity measures whether a cluster contains only data points of the same class. Completeness measures whether all data points of a class are assigned to the same cluster. Values range from 0 to 1, with higher being better. A V-measure of 1 means that the clustering fully matches the actual assignment.

Figure 2: Representation of the skeletal data from 19 persons with normalized (left side) and non-normalized (right side) pose coordinates after a PCA with two principal components.

While the V-measure of individual subjects yielded very good results with an average value of 0.935, the V-measure for a clustering with all 18 participants resulted in only 0.024.
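To make the V-measure described above concrete, the following Python sketch computes homogeneity, completeness and their harmonic mean from two label lists, following the conditional-entropy definition in [18] with equal weighting of both terms. The function names and the toy labels are hypothetical, and the sketch is not the evaluation code used in the study.

```python
import math
from collections import Counter


def _entropy(counts, total):
    """Shannon entropy (natural log) of a list of counts."""
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)


def v_measure(classes, clusters):
    """Return (homogeneity, completeness, V-measure) for two label lists."""
    n = len(classes)
    class_counts = Counter(classes)
    cluster_counts = Counter(clusters)
    joint = Counter(zip(classes, clusters))

    h_c = _entropy(class_counts.values(), n)    # H(C)
    h_k = _entropy(cluster_counts.values(), n)  # H(K)
    # Conditional entropies H(C|K) and H(K|C) from the joint counts.
    h_c_given_k = -sum(
        cnt / n * math.log(cnt / cluster_counts[k])
        for (c, k), cnt in joint.items()
    )
    h_k_given_c = -sum(
        cnt / n * math.log(cnt / class_counts[c])
        for (c, k), cnt in joint.items()
    )
    # Both scores are defined as 1 when the reference entropy is 0.
    homogeneity = 1.0 if h_c == 0 else 1.0 - h_c_given_k / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - h_k_given_c / h_k
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = 2 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v
```

With this definition, a clustering that reproduces the classes up to a renaming of cluster ids scores 1, whereas a clustering that places every repetition into a single cluster is complete but not homogeneous and scores 0.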
The result after a principal component analysis with two principal components, showing the non-differentiable clusters, can be seen in figure 2.

Both studies revealed challenges that cannot be attributed solely to an unfavorable choice of learning algorithms and methods. In the following, we briefly summarize the studies and discuss the difficulties of classifying sports exercises.

4. Challenges in Classifying Sports Exercises

4.1. No AI for All

In both studies, classifiers that rated exercise execution for a single person achieved much better results than classifiers that rated exercise execution for the entire population of participants. While this phenomenon in S1 can be explained by the unequal distribution of exercise execution classes across participants, the same effect could also be observed for S2. In S2, a normalization procedure for camera-based systems was used to compensate for differing anatomical features, such as the ratio of upper body to lower body [19]. Based on the results of these two studies, it seems apparent that learning algorithms designed to evaluate and determine execution types of a strength exercise perform better when the data comes from a single individual.

The reason for this observation could be the individual training level and anatomy of an athlete. The difference between incorrect and correct execution in strength training can be minimal. Even a small change in the angle of the upper arms to the floor during a push-up can determine which muscles and which joints are stressed and to what extent. In addition, some exercises are easier for certain groups of people than for others due to their anatomy. The range of motion (ROM) of a movement varies greatly from person to person. This observation therefore additionally raises the question of whether there is a "right" or "wrong" in strength training exercises at all, or whether fine nuances do indeed depend much more on the individual person.

4.2. Inconsistency in Trainer Evaluations

Two trainers labeled the 278 push-ups recorded in S1. One trainer held a B license according to the German Olympic Sports Federation (DOSB), while the other was authorized to evaluate strength training exercises through a dual study program in fitness economics. Despite this background, the two trainers achieved a very low level of agreement in the classification. They fully agreed on only nine repetitions. Partial agreement was found for 137 repetitions, while the classifications completely contradicted each other for the remaining 132 repetitions. Each trainer rated without knowing the other's labels. Due to this disagreement, only the labels of the second trainer were used. Since in many well-known studies on the classification of weight training exercises one to two trainers simultaneously label the recorded data [15, 16], it remains questionable how good these classifiers really are, assuming that other trainers would have evaluated the exercises differently. This observation also raises the question of why the inconsistency in the trainer feedback arose and whether the small nuances between right and wrong in individual exercise execution are also a cause of it.

5. Opportunities for HCI

These two observations, which are not unique to these two studies, suggest several implications. First, the classifiers could be optimized by AI methods and by tailoring them to the individual athlete. Furthermore, each learning method could ensure that a diverse distribution of different exercise executions is represented and thereby reduce the uncertainty of the system. However, even optimized methods will always produce some errors, even if only in small numbers. This is exactly where methods from HCI can help. The uncertainty of a system, which is caused by the disagreement of trainers, could be reflected by the system as feedback.
If an athlete feels restricted in movement due to an injury or, conversely, has greater mobility due to muscle length training, the system should be able to reflect this exactly. By using mobile hardware, such as the smartphone, individual virtual companions could be created through the clever use of gamification and avatars, which grow with the athlete. Instead of a one-for-all solution in AI, HCI can use long-term studies to monitor and re-evaluate an athlete’s development and needs in the area of strength training. Combining key elements from AI and HCI enables new ways of providing accurate and useful feedback to a strength athlete.

6. Conclusion

Two studies in the application area of AI-based exercise execution recognition revealed two challenges for future HCI researchers. The first challenge is the individualization of AI-based recognition applications. Through mobile hardware and new innovations, personalized digital exercise trainers can adapt to and reflect the progress of an individual athlete. Depending on an athlete's state of mobility and strength, some ways of executing an exercise are more beneficial, and therefore more "right", than the same execution would be for another athlete. The second challenge is the uncertainty of AI-based training algorithms. Even though those algorithms will become more accurate with a better-fitting data set and algorithm, some uncertainty and false evaluations will remain. Instead of just confronting the user with an uncertain result, HCI researchers can develop methods to reflect this uncertainty in the corresponding feedback.

With this paper, we showed that the space for future collaboration between AI and HCI methods in sports is large and that solutions are needed. Moreover, we hope that multiple ideas will be discussed and researched, so that future users can benefit from mobile and AI-based strength training exercise execution applications.

References

[1] R. W. Westermann, M. Giblin, A. Vaske, K. Grosso, B. R. Wolf, Evaluation of men’s and women’s gymnastics injuries: a 10-year observational study, Sports Health 7 (2015) 161–165.
[2] P. T. Hak, E. Hodzovic, B. Hickey, The nature and prevalence of injury during crossfit training, Journal of strength and conditioning research (2013).
[3] S. Kaiser, T. Engeroff, D. Niederer, H. Wurm, L. Vogt, W. Banzer, The epidemiological profile of calisthenics athletes, German Journal of Sports Medicine/Deutsche Zeitschrift für Sportmedizin 69 (2018).
[4] E. A. Bakker, Y. A. Hartman, M. T. Hopman, N. D. Hopkins, L. E. Graves, D. W. Dunstan, G. N. Healy, T. M. Eijsvogels, D. H. Thijssen, Validity and reliability of subjective methods to assess sedentary behaviour in adults: a systematic review and meta-analysis, International Journal of Behavioral Nutrition and Physical Activity 17 (2020) 1–31.
[5] L. Laranjo, D. Ding, B. Heleno, B. Kocaballi, J. C. Quiroz, H. L. Tong, B. Chahwan, A. L. Neves, E. Gabarron, K. P. Dao, et al., Do smartphone applications and activity trackers increase physical activity in adults? systematic review, meta-analysis and metaregression, British journal of sports medicine 55 (2021) 422–432.
[6] A. Brajdic, R. Harle, Walk detection and step counting on unconstrained smartphones, in: Proceedings of the 2013 ACM international joint conference on Pervasive and ubiquitous computing, 2013, pp. 225–234.
[7] A. Bourke, J. O’brien, G. Lyons, Evaluation of a threshold-based tri-axial accelerometer fall detection algorithm, Gait & posture 26 (2007) 194–199.
[8] M. Kangas, A. Konttila, P. Lindgren, I. Winblad, T. Jämsä, Comparison of low-complexity fall detection algorithms for body attached accelerometers, Gait & posture 28 (2008) 285–291.
[9] E. Velloso, A. Bulling, H. Gellersen, W. Ugulino, H. Fuks, Qualitative activity recognition of weight lifting exercises, in: Proceedings of the 4th Augmented Human International Conference, 2013, pp. 116–123.
[10] S. Asteriadis, A. Chatzitofis, D. Zarpalas, D. S. Alexiadis, P. Daras, Estimating human motion from multiple kinect sensors, in: Proceedings of the 6th international conference on computer vision/computer graphics collaboration techniques and applications, 2013, pp. 1–6.
[11] A. Kitsikidis, K. Dimitropoulos, S. Douka, N. Grammalidis, Dance analysis using multiple kinect sensors, in: 2014 International Conference on Computer Vision Theory and Applications (VISAPP), volume 2, IEEE, 2014, pp. 789–795.
[12] S. Kaenchan, P. Mongkolnam, B. Watanapa, S. Sathienpong, Automatic multiple kinect cameras setting for simple walking posture analysis, in: 2013 international computer science and engineering conference (ICSEC), IEEE, 2013, pp. 245–249.
[13] A. Yurtman, B. Barshan, Detection and evaluation of physical therapy exercises by dynamic time warping using wearable motion sensor units, in: Information Sciences and Systems 2013, Springer, 2013, pp. 305–314.
[14] O. Giggins, D. Kelly, B. Caulfield, Evaluating rehabilitation exercise performance using a single inertial measurement unit, in: 2013 7th International Conference on Pervasive Computing Technologies for Healthcare and Workshops, IEEE, 2013, pp. 49–56.
[15] O. M. Giggins, K. T. Sweeney, B. Caulfield, Rehabilitation exercise assessment using inertial sensors: a cross-sectional analytical study, Journal of neuroengineering and rehabilitation 11 (2014) 158.
[16] D. Whelan, M. O’Reilly, T. Ward, E. Delahunt, B. Caulfield, Evaluating performance of the single leg squat exercise with a single inertial measurement unit, in: Proceedings of the 3rd 2015 Workshop on ICTs for improving Patients Rehabilitation Research Techniques, 2015, pp. 144–147.
[17] Z. Cao, T. Simon, S.-E. Wei, Y. Sheikh, Realtime multi-person 2d pose estimation using part affinity fields, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 7291–7299.
[18] A. Rosenberg, J. Hirschberg, V-measure: A conditional entropy-based external cluster evaluation measure, in: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL), 2007, pp. 410–420.
[19] S. Chen, R. R. Yang, Pose trainer: correcting exercise posture using pose estimation, arXiv preprint arXiv:2006.11718 (2020).