iCaCoT - Interactive Camera-based Coaching and Training

Lucia D’Acunto, TNO, Anna van Buerenplein 1, 2595DA, The Hague, The Netherlands, lucia.dacunto@tno.nl
Judith Redi, TUDelft, Mekelweg 4, 2628 CD, Delft, The Netherlands, J.A.Redi@tudelft.nl
Omar Niamut, TNO, Anna van Buerenplein 1, 2595DA, The Hague, The Netherlands, omar.niamut@tno.nl

ABSTRACT
This paper reports on the evaluation of the concept of interactive camera-based coaching and training (iCaCoT), which focuses on using interactive video navigation for coaching and training purposes. The iCaCoT concept leverages tiled streaming technology, which allows users to navigate freely through high-resolution video feeds while minimising the bandwidth required, by only streaming the part of the video the user is interested in.

iCaCoT gives a trainer the possibility to zoom in on her trainee while she is training and to focus on specific areas, both spatially and temporally. This concept becomes especially useful for training activities where the exact line followed by the trainee is not known beforehand (e.g. skiing, football), and thus where capturing the events using a static wide-angle camera located relatively far from the action may be more convenient than a moveable close-up camera. We implemented the iCaCoT concept as an iPad application and demonstrated it with ski athletes in the popular ski location of Schladming, Austria. Our experiment shows that iCaCoT is a viable concept for ski training activities and that it gives interesting insights for future research directions.

Author Keywords
interactive video navigation, adaptive streaming, tiled streaming, coaching and training, quality of experience, experiment, experimental research.

ACM Classification Keywords
H.5.1. Information Interfaces and Presentation: Multimedia Information Systems - evaluation/methodology

3rd International Workshop on Interactive Content Consumption at TVX’15, June 3, 2015, Brussels, Belgium. Copyright is held by the author/owner(s).

INTRODUCTION
With the advent of high resolution and panoramic cameras, which are able to record in HD or higher resolutions, it becomes interesting to segment content spatially. By dividing a video frame up into multiple tiles, where each tile contains a particular area of the video, a client can choose to only receive certain areas of a video. Such a tiled streaming solution enables an inherently scalable method for users to interact with and navigate within a video using pan-tilt-zoom (PTZ) commands. In the EU FP7 project FascinatE [1] we have implemented such a tiled streaming technology in an iPad application to enable users to navigate freely through high resolution video panoramas, while the application limits bandwidth requirements by only sending that part of the video a user is interested in.

The concept of tiled streaming looks particularly well suited to training and coaching use cases. That is, using a smartphone or tablet, a coach would be able to zoom in on her trainee while she is training, focusing on specific areas, both temporally and spatially. We refer to this as interactive camera-based coaching and training (iCaCoT). This concept becomes especially useful for training activities where the exact line followed by the trainee is not known beforehand (e.g. in skating, skiing, football, baseball), and thus where capturing the events using a wide-angle camera located relatively far from the action may be more convenient than a moveable close-up camera. By pausing the video at key moments, trainer and trainee can focus on and discuss details of the performance. By placing multiple high resolution cameras around strategic positions, it is even possible for a trainer to view a moment from different angles. The tiled streaming application facilitates this using high-accuracy synchronization techniques, ensuring that the separate videos from all cameras are synchronized frame-accurately in the application.

In this paper, we present the results of an evaluation of the iCaCoT concept with ski athletes, performed between February and March 2014 at the popular ski location of Schladming, Austria (host of the 2013 Alpine Skiing World Championship). Conducting an experiment with real users has enabled us to study and evaluate the suitability of tiled streaming as a tool for coaching and training in practice and to understand the key enablers for interactive camera-based coaching and training. Specifically, we were interested in answering the following research questions:

1. What are the relevant aspects for a camera-based coaching and training application?
2. Is iCaCoT a suitable tool for training and coaching activities?

As a subquestion of the second question, we were also interested in understanding the overall user experience when interacting with the iCaCoT application. To answer these questions we have collected and analyzed a number of metrics ranging from application features usage to Quality of Experience (QoE) parameters.

RELATED WORK
With recent capturing systems for high-resolution video, new types of video-based training scenarios are possible where trainers and coaches have the possibility to freely choose their viewing direction and zooming level. Different examples of such interactive region-of-interest (ROI) video streaming have already been demonstrated or deployed. Interactive ROI video streaming was explored in depth by [6, 7]. The authors developed various methods in the context of an interactive ROI streaming system, ClassX, for online lecture viewing, selecting tiled streaming as the best compromise between bandwidth, storage, processing and device requirements.
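The tile selection behind interactive ROI streaming, where a client requests only the tiles that cover the region it displays, can be sketched as follows. This is a minimal illustration under stated assumptions, not the iCaCoT or ClassX implementation: the uniform grid, the `Rect` type, the `tiles_for_roi` function and the 3840x2160 frame with a 4x4 grid are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in panorama pixel coordinates (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int


def tiles_for_roi(roi: Rect, frame_w: int, frame_h: int,
                  cols: int, rows: int) -> List[Tuple[int, int]]:
    """Return the (row, col) indices of all grid tiles that overlap the ROI.

    Assumes a uniform grid whose tile size divides the frame size evenly.
    """
    if min(frame_w, frame_h, cols, rows) <= 0:
        raise ValueError("frame size and grid dimensions must be positive")

    # Clamp the ROI to the frame so that panning past an edge is harmless.
    x0, y0 = max(0, roi.x), max(0, roi.y)
    x1 = min(frame_w, roi.x + roi.width)
    y1 = min(frame_h, roi.y + roi.height)
    if x0 >= x1 or y0 >= y1:
        return []  # ROI lies completely outside the frame

    # Map the first and last covered pixel to tile indices.
    first_col = x0 * cols // frame_w
    last_col = (x1 - 1) * cols // frame_w
    first_row = y0 * rows // frame_h
    last_row = (y1 - 1) * rows // frame_h

    return [(r, c)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


if __name__ == "__main__":
    # Illustrative numbers only: a 4x4 grid over a 3840x2160 frame.
    needed = tiles_for_roi(Rect(900, 500, 200, 100), 3840, 2160, 4, 4)
    print(needed)
    print(f"{len(needed)} of 16 tiles requested")
```

In this example only the tiles touched by the ROI would be fetched, which is where the bandwidth saving of tiled streaming comes from; a real client would additionally choose a resolution layer per zoom level.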
Tiled streaming relies on a tiling of video into independently decodable video streams. Client devices retrieve the tiled videos corresponding to a desired ROI. A similar zoomable video system was further explored by [10]. There, the focus was on enabling low-delay interaction with high-resolution and high-quality video, with constraints on the available bandwidth and processing capabilities as encountered in current network technologies and devices. For the iCaCoT application, we leveraged the tiled streaming system and mechanisms as presented in [8, 12].

In today’s sport training and performance analysis, nearly all performances are captured on video or through other sensors. Captured footage and sensor data are then viewed by expert coaches/analysts, who manually annotate and label important performance indicators to gauge performance. Related work in sport performance analysis ranges from reducing annotation time [11], to computer-assisted self-training systems for sports exercise [5], extracting tactic information next to regular semantic event detection [13], leveraging virtual reality for a better understanding of the many biomechanical, physiological, and psychological factors [4], and using on-body acceleration sensors to perform motion and flying force analysis of ski-jumping [3].

In this paper, we provide novel contributions by focusing on a trainer’s user experience when interacting with a training application. We present the results of an initial QoE evaluation of an interactive camera-based coaching and training application based on tiled streaming, performed “in the wild”. We further investigate important application functionalities and QoS of the underlying operational live video tiling system. The scale and complexity of the field trial make these contributions very relevant for assessing the business opportunities of the interactive video system and training application.

DESCRIPTION OF THE ICACOT SYSTEM
This section details the overall architecture of the iCaCoT system, including backend, frontend and monitoring framework (Figure 1).

Figure 1. High-level architecture of the iCaCoT system.

Backend
For the experiment, we designed and developed a pipeline for a live tiling system consisting of the following components:

• Ingestion node, which captures the raw video data and encodes it using the Motion JPEG codec;
• Processing node, which takes the Motion JPEG input received from the Ingestion node and produces viewable video files; this step includes the tiling, encoding (in H.264/AVC) and multiplexing (MPEG-TS container) of the content;
• Segmenter, which produces the temporal segmentation of the content, i.e. the final streams, using Apple’s HLS solution.

The output of the Segmenter is subsequently distributed to the different instances of the iCaCoT app via a webserver.

Frontend
The frontend of the iCaCoT system has been implemented as an iOS app for iPad. In addition to the distinguishing functionalities of tiled streaming (pan and zoom in/out), a number of additional ones have been included in the implementation of the iCaCoT app, to fit the purpose of coaching and training. These functionalities can be broadly classified into two categories:

• GUI functionalities
– Pan: to navigate within the high resolution video stream;
– Zoom in/out: to change the level of detail by switching between different resolution representations of the video stream;
– Draw: to draw lines as overlays on the video stream;
– Bookmark: to store a certain position in time (a maximum of 6 bookmarks can be stored);
– Pause/resume: to pause and resume the video stream;
– Step-frame: to step through frames when the video playback is paused; a trainer can use this function to show an athlete her exact moves and explain what to improve;

Figure 2. iCaCoT application screenshot.
After a trainer has zoomed into a specific region of interest and has paused the video, he has used the drawing functionality to highlight critical training aspects, such as posture and tracks.

– Seek: to move playback to another point in time relative to the current position (by +3, +15, -3 or -15 seconds); this function can be used to look for a specific point in time (e.g. a particular athlete’s movement).

Furthermore, for the second experimental run (see the section “Experimental Setup”) the following additional GUI functionalities have been added:

– Enhanced draw functionality: line-based, arrow-based and dot-based drawing, plus the ability to choose different colours;
– Slow motion playback functionality: plays the video at 1/4 of the original speed.

Experiment monitoring
Throughout the experiment, we have been monitoring app usage, user experience, network parameters and system components’ behaviour through a monitoring framework. The monitoring framework comprises the following:

• Monitoring framework: to monitor network and application usage. We have used EXPERIMonitor, a baseline component of the FP7 project EXPERIMEDIA [9], for:
– Network monitor: data downloaded over time and missed video frames (collected every ms);
– Usage monitor: every user interaction with the iCaCoT app - pause/resume, draw, seek, zoom in/out, pan (collected at event occurrence);
• Logging framework: to collect real-time info from each component in the architecture in Figure 1; this information is used for debugging purposes.
• Questionnaire aggregator: to collect the trainers’ subjective evaluations of iCaCoT. The questionnaire aggregator is a part of the QuickTapSurvey tool [2], which also included a questionnaire app deployable on all iPads; the information collected was completely anonymous.

EXPERIMENTAL SETUP
To provide answers to the research questions mentioned in the introduction, we have conducted a number of experiments with ski trainers in the popular ski location of Schladming in Austria. This section outlines system deployment and experiment description for our study.

System deployment
We chose the Reiteralm area as the setting for the experiments, because it is well suited for (semi-)pro coaching and training purposes. Figure 3 shows an overview of the slope used in our experiment (slope 3), including the locations of the three cameras used during the experiments and the cabin hosting our backend and monitoring equipment. Camera locations and orientations were agreed upon with the ski trainers. System setup involved various challenging tasks, such as installing and connecting cameras over distances of hundreds of metres on the skiing slope, and installing cables via underground bunkers (Figure 4).

Figure 3. Reiteralm location for the iCaCoT experiments (Schladming, Austria). The three cameras are represented in red and the cabin hosting our equipment in green.

Figure 4. Impressions from system setup.

Experiment description
Over a period of 2 months, we have performed two experiment rounds. The first took place in week 8 (February 17-21) and the second in week 13 (March 25-29) of 2014. Each experiment round saw the participation of 4 trainers, each testing the app with a group of 7-10 athletes. For each ski athlete, a trainer would use the app for two key activities: (i) watch each athlete live as he/she is coming down the slope, and (ii) discuss with each athlete his/her performance using playback of the recorded video. Before the start of the experiment, trainers were briefed on the functionalities of the iCaCoT application, especially on those specifically designed for training purposes (zoom, pan, bookmark, trickplay). Once the experiments started, our experimenters were closely monitoring the execution, reminding the trainers about the available functionalities and advising on their usage. After each trainer concluded his training activity, he was asked to fill in a questionnaire (via the questionnaire app on the iPad) about his experience with iCaCoT.

The first experiment round was used, among other things, to gain insights into the needs of the end users (the ski trainers): how they envision using the app and what features they require. Using the information obtained in the first experiment round, we have made improvements for the second round. Improvements included advanced GUI functionalities (as described in the section “Description of the iCaCoT system”) and higher resolution cameras (from the GoPro of the first round to a Blackmagic Design 4K camera in the second round).

Collected data
To evaluate our experiments, we have collected both objective data (network and usage monitor measured by the iCaCoT app), and subjective data (through questionnaires). These are discussed in detail below.

Objective data
Throughout the experiments, the iCaCoT app logged a number of usage and network metrics from participants. Every minute, the app would send the data collected in the last minute to the EXPERIMonitor. The metrics being logged included the current bitrate, the total data usage, the region of the video that a trainer was viewing, the app feature being called, and a dropped frame during playback. In our analysis, we only considered droppedFrame and featureCall.

Subjective data
We have used questionnaires to assess the ski trainers’ impressions of the iCaCoT app. The questions have been divided into the following categories:

• User satisfaction: measures user quality of experience by asking the trainer direct questions (e.g. whether their experience was good or bad)
• GUI usability: measures whether the trainer can interact fluently with the app. This includes two aspects:
– Ease of learning, which measures whether the trainer can intuitively learn how to use the app
– Ease of interaction, which measures whether the application features have been implemented in the correct way
• Functional usability: measures whether the application features work as they should (e.g. no major hiccups within application usage)
• Application value: measures whether the trainer perceives that the app is useful for his/her training activities

Based on the feedback from the first round, we could determine that a number of questions were less relevant for the trainers (such as the ones on the enjoyability or friendliness of the app) or for the second round (e.g. the ease of learning and interaction, since the trainers were already used to the app). Therefore these questions were removed from the questionnaire presented in the second round. Furthermore, a few new questions have been added in order to assess the impact of the changes/improvements made to iCaCoT between the first and second round. A comprehensive list of questions can be found in Table 2.

Table 2. Overview of the items included in the questionnaires adopted in the two experimental rounds. All questions (excluding Yes/No questions) are on a 5-point scale.

| Question/Variable | Round | Scale | Abbreviation | Category |
|---|---|---|---|---|
| Experience with the app | 1 and 2 | ACR | Experience | User satisfaction |
| Expectations with respect to the app | 1 and 2 | Bipolar | Expectations | User satisfaction |
| Enjoyment of the app | 1 | Agreement | Enjoyment | User satisfaction |
| Excitement of the app | 1 | Agreement | Excitement | User satisfaction |
| Endurability of the app | 1 and 2 | Yes/No | Recommendation | User satisfaction |
| Ease of learning the app | 1 and 2 | Bipolar | Learnability | GUI usability (learning) |
| Ease of understanding the app | 1 | Bipolar | Understandability | GUI usability (learning) |
| Friendliness of the app | 1 | Bipolar | Friendliness | GUI usability (interaction) |
| Ease of use of the app | 1 and 2 | Bipolar | Usability | GUI usability (interaction) |
| Predictability of the app during usage | 1 | Bipolar | Predictability | GUI usability (interaction) |
| Comprehensiveness of the app functionalities | 1 and 2 | Bipolar | Comprehensiveness | Functional usability |
| Performance improved with respect to exp 1 | 2 | Bipolar | PerfImprovement | Functional usability |
| Ability to follow athlete skiing live | 1 and 2 | Agreement | QualityLive | Functional usability |
| Ability to find playback of athlete skiing | 2 | Agreement | Searchability | Functional usability |
| View over the piste from the app | 1 | Bipolar | CameraPositioning | Functional usability |
| Smoothness of video navigation | 1 and 2 | ACR | QualityNavigation | Functional usability |
| Video quality of the app | 1 and 2 | ACR | QualityVideo | Functional usability |
| Video quality improved with respect to exp 1 | 2 | Bipolar | QualityImprovement | Functional usability |
| Dissatisfaction with interruptions | 1 and 2 | Yes/No | Interruptions | Functional usability |
| Satisfaction with startup delay | 1 | Bipolar | StartupSatisfaction | Functional usability |
| Satisfaction with the latency of the live video | 1 and 2 | Bipolar | LatencySatisfaction | Functional usability |
| Overall continuity of the video stream | 1 and 2 | ACR | Continuity | Functional usability |
| Usefulness of the app | 1 and 2 | Bipolar | Usefulness | Application value |
| Innovativeness of the concept | 1 | Bipolar | Innovativeness | Application value |
| Impact on teaching ability | 1 | Bipolar | TeachingImpact | Application value |
| Impact on students’ learning curve | 1 | Bipolar | LearningImpact | Application value |
| Impact on teaching time | 1 | Bipolar | TeachingTime | Application value |
| Beneficial for trainers | 2 | Yes/No | TrainerBenefit | Application value |
| Beneficial for athletes | 2 | Yes/No | AthleteBenefit | Application value |
| Most useful app feature | 1 and 2 | - | UsefulFeature | Open |
| Change in the app | 1 | - | AppChange | Open |
| Remove from the app | 2 | - | AppRemove | Open |
| Add in the app | 1 and 2 | - | AppAddition | Open |
| How to improve playback search | 2 | - | SearchImprovement | Open |

We obtained and analysed 7 questionnaires in total (4 filled in during the first round and 3 filled in during the second).

EXPERIMENT EVALUATION
Relevant aspects for training
Our first research question aims at investigating what the relevant aspects of a camera-based coaching and training system are. To answer this question we have analyzed the subjective evaluations and the app usage. From the UsefulFeature open question, the slow motion functionality appeared to be the most popular (40% of the respondents), followed by step-frame and draw (30% of the respondents each). This result was expected, as the slow motion functionality was added after the first experimental round upon trainers’ feedback.

Each user could open and close the app several times during the same experiment. We will refer to the app usage between a consecutive opening and closing as a “session” and analyze app usage parameters per session. In total, we recorded parameters for 26 usage sessions during the first experiment and 22 during the second, across all participants. From the recorded data, we have then calculated the number of occurrences of each functionality per minute for each session. We wanted to verify whether these app usage statistics were significantly different across the two experimental rounds, possibly as a consequence of the changes we made in the system (enhanced GUI functionalities and higher quality camera).

Table 1. Occurrences of each app functionality per minute across the two experimental rounds. p-values refer to a U-test of the data.

| Feature | Median (# 1) | Median (# 2) | U | p-value |
|---|---|---|---|---|
| pause | 0.66 | 0.49 | 350 | 0.1879 |
| resume | 0.54 | 0.24 | 353.5 | 0.1637 |
| seek forward | 1.28 | 3.10 | 253.5 | 0.4464 |
| seek backward | 0.53 | 1.28 | 240 | 0.3284 |
| pan | 6.53 | 3.88 | 356 | 0.1502 |
| zoom | 36.97 | 9.54 | 458 | 0.0003866 |
| draw len | 11.28% | 3.09% | 372 | 0.07035 |
| step-frame | 42.54 | 0 | 194 | 0.0004414 |
| add bookmark | 0 | 0 | 383 | 0.0151 |
| select bookmark | 0 | 0 | 310 | 0.4283 |
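The comparison of per-minute usage rates between the two rounds (the U-test reported in Table 1) can be sketched as follows. This is a hedged illustration, not the authors' analysis script: the function names `per_minute_rate` and `mann_whitney_u` and the sample numbers are hypothetical, and the p-value uses the normal approximation with tie and continuity corrections, which may differ from the exact procedure used for Table 1.

```python
import math
from typing import Sequence, Tuple


def per_minute_rate(event_count: int, session_seconds: float) -> float:
    """Occurrences of one app feature per minute within a single session."""
    if session_seconds <= 0:
        raise ValueError("session duration must be positive")
    return event_count * 60.0 / session_seconds


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic of sample a, two-sided p-value).

    The p-value is a normal approximation with tie correction and
    continuity correction (an assumption of this sketch).
    """
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2

    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])

    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        t = j - i                    # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u_a = rank_sum_a - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_a, 1.0  # all values identical: no evidence of a difference

    z = max((abs(u_a - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    p_two_sided = math.erfc(z / math.sqrt(2.0))
    return u_a, min(p_two_sided, 1.0)


if __name__ == "__main__":
    # Made-up sessions as (zoom events, duration in seconds); not study data.
    round1 = [per_minute_rate(c, s) for c, s in [(120, 200), (90, 180), (200, 300)]]
    round2 = [per_minute_rate(c, s) for c, s in [(30, 240), (25, 150), (40, 300)]]
    u, p = mann_whitney_u(round1, round2)
    print(f"U = {u}, two-sided p = {p:.4f}")
```

With small samples an exact test is usually preferable to the normal approximation; the sketch only shows how rank sums turn per-session rates into a U statistic and a p-value.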
To test whether the usage statistics differed between the two rounds, we used a non-parametric Mann-Whitney U-test, which does not assume normally distributed data and is used here to check whether the medians of the two distributions are equal. Table 1 reports the median values for each app functionality usage in both rounds, the test statistic (U) and the significance value (p). As we can observe, pan and zoom are among the features that were used most frequently per minute, with a median of 6.53 and 36.97 in the first round and 3.88 and 9.54 in the second round, respectively. Step-frame also scored high in the first round (median 42.54 times per minute). It was almost never used in the second round, probably as a consequence of having introduced slow-motion (which the trainers used for the same purpose of illustrating the details of a certain movement to an athlete). Furthermore, we have also calculated the fraction of time during each session that a trainer spent drawing. From Table 1 we note a trend in that trainers spent less time drawing in the second round than in the first round. This might be due to the enhanced drawing functionality provided in the second round, but given the p-value of this U-test (p = 0.07035) this assumption needs further investigation.

Figure 5. Overall evaluation of iCaCoT from the questionnaires.

iCaCoT for coaching and training and user experience
With our second research question, we seek to understand whether the iCaCoT concept is suitable for coaching and training activities. Figure 5 shows the scores given to iCaCoT across both experiments. As we can see, iCaCoT scores high for experience, learnability, usability, quality of navigation and, most importantly, for usefulness. This trend is also reflected in the evaluation of the impact on teaching/learning ability and the benefit for trainers and trainees. Furthermore, all participants would recommend iCaCoT to others. These results show that ski trainers found the app very valuable, which is an indication that iCaCoT is a viable concept for coaching and training.

On the other hand, iCaCoT scored a bit lower on the quality of the video and comprehensiveness of the functionalities (Figure 5) and 43% of the respondents indicated that they noticed too many interruptions in the video feed (which is also confirmed by a median of 17 dropped frames per minute). We noticed that frames were dropped during certain trick play events, such as seek or resume playback (see Figure 6). Additionally, trainers have indicated (in the open question about additions to the app) that they would further benefit from a method for tracking athletes, a way of comparing 2 athletes or 2 runs of the same athlete, and a timer. Further research on tiled video streaming for use in coaching and training should focus on these aspects.

Figure 6. Cumulative dropped frames (top figure) and app interaction events (bottom figure) during a session run. The green triangular points represent pause events while the red square points represent resume of playback events.

CONCLUSIONS
This paper presented the implementation and results of an experiment “in the wild” with an interactive camera-based application for coaching and training. Although conducted on a small scale, the results of our experiment provide indications that this type of application is in fact very valuable for both trainers and trainees. Additionally, thanks to a combination of network data, app usage data, and subjective evaluation from the participants, we were able to identify a number of relevant aspects that affect the experience and satisfaction of trainers with the iCaCoT concept. For example, we noticed a trend of some network parameters (dropped frames) being related to app usage and we believe that further studies should focus on exploring these relationships in more detail. We also observed that trickplay and draw functionalities are of paramount importance for ski trainers. Nevertheless, improvements can still be made to the functionalities made available by the app, especially concerning the tracking of ski athletes and the visualization of different training performances at the same time. Further research in this domain should focus on these challenges.

REFERENCES
1. Fascinate. http://www.fascinate-project.eu/. Accessed: 2015-03-16.
2. Quicktapsurvey.com. https://www.quicktapsurvey.com/admin/import/. Accessed: 2015-03-16.
3. Bachlin, M., Kusserow, M., Troster, G., and Gubelmann, H. Ski jump analysis of an olympic champion with wearable acceleration sensors. In Wearable Computers (ISWC), 2010 International Symposium on (Oct 2010), 1–2.
4. Bideau, B., Kulpa, R., Vignais, N., Brault, S., Multon, F., and Craig, C. Using virtual reality to analyze sports performance. Computer Graphics and Applications, IEEE 30, 2 (March 2010), 14–21.
5. Chen, H.-T., He, Y.-Z., Chou, C.-L., Lee, S.-Y., Lin, B.-S., and Yu, J.-Y. Computer-assisted self-training system for sports exercise using kinects. In Multimedia and Expo Workshops (ICMEW), 2013 IEEE International Conference on (July 2013), 1–4.
6. Mavlankar, A., Agrawal, P., Pang, D., Halawa, S., Cheung, N.-M., and Girod, B. An interactive region-of-interest video streaming system for online lecture viewing. In Packet Video Workshop (PV), 2010 18th International, IEEE (2010), 64–71.
7. Mavlankar, A., and Girod, B. Spatial-random-access-enabled video coding for interactive virtual pan/tilt/zoom functionality. Circuits and Systems for Video Technology, IEEE Transactions on 21, 5 (2011), 577–588.
8. Niamut, O., Prins, M., van Brandenburg, R., and Havekes, A. Spatial tiling and streaming in an immersive media delivery network. Adjunct Proceedings of EuroITV (2011).
9. Phillips, S., B. M. B. M. C. S. E. V. S. Z. W. S. Linking quality of service and experience in distributed multimedia systems using prov semantics. Service Oriented System Engineering (SOSE), 2015 IEEE 9th International Symposium on (2015).
10. Quax, P., Issaris, P., Vanmontfort, W., and Lamotte, W. Evaluation of distribution of panoramic video sequences in the explorative television project. In Proceedings of the 22nd international workshop on Network and Operating System Support for Digital Audio and Video, ACM (2012), 45–50.
11. Sha, L., Lucey, P., Morgan, S., Pease, D., and Sridharan, S. Swimmer localization from a moving camera. In Digital Image Computing: Techniques and Applications (DICTA), 2013 International Conference on (Nov 2013), 1–8.
12. van Brandenburg, R., Niamut, O., Prins, M., and Stokking, H. Spatial segmentation for immersive media delivery. In Intelligence in Next Generation Networks (ICIN), 15th International Conference on, IEEE (2011).
13. Zhu, G., Xu, C., Huang, Q., Rui, Y., Jiang, S., Gao, W., and Yao, H. Event tactic analysis based on broadcast sports video. Multimedia, IEEE Transactions on 11, 1 (Jan 2009), 49–67.