=Paper=
{{Paper
|id=Vol-1953/healthRecSys17_paper_9
|storemode=property
|title=Running with Recommendation
|pdfUrl=https://ceur-ws.org/Vol-1953/healthRecSys17_paper_9.pdf
|volume=Vol-1953
|authors=Jakim Berndsen,Aonghus Lawlor,Barry Smyth
|dblpUrl=https://dblp.org/rec/conf/recsys/BerndsenLS17
}}
==Running with Recommendation==
Jakim Berndsen, Aonghus Lawlor, Barry Smyth
Insight Centre for Data Analytics, University College Dublin, Belfield, Dublin 4
jakim.berndsen@insight-centre.org, aonghus.lawlor@insight-centre.org, barry.smyth@insight-centre.org

===ABSTRACT===
We examine the feasibility of a collaborative recommender system in the exercise domain, targeted specifically at runners. Using a large dataset of over 600000 runners’ finish times, we explore the contrasts between casual and elite runners and hypothesise how a recommender system may be used to mitigate some of these differences. We also briefly discuss some of the challenges faced by such a recommendation task and suggest how these challenges could be addressed.

===CCS CONCEPTS===
• Information systems → Recommender systems; • Human-centered computing → Social recommendation;

===KEYWORDS===
Sports Analytics, Explanations

ACM Reference format:
Jakim Berndsen, Aonghus Lawlor, and Barry Smyth. 2017. Running with Recommendation. In Proceedings of Second International Workshop on Health Recommender Systems co-located with ACM RecSys 2017, Como, Italy, August 2017 (RecSys’17), 4 pages.

International Workshop on Health Recommender Systems, August 2017, Como, Italy. ©2017. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

===1 INTRODUCTION===
Running is one of the most popular forms of exercise on the planet. In 2015 over 17 million race finishers were recorded in the United States alone¹. The proliferation of online resources promoting running shows the growing popularity of the sport and demonstrates participants’ interest in improving their performance. This desire to improve performance over time while avoiding injury is the key motivation for framing our work as a recommender system problem [10].

Despite the popularity of running and the availability of training resources, running remains a difficult pastime. Completing a race is the end result of weeks or months of meticulous training and planning. Selecting the best balance of hard work, recovery, and different training types requires an in-depth knowledge of running and human performance. Runners exhibit natural performance increases simply by starting to run and improving their fitness. Beyond that point, however, it is extremely difficult to optimise performance further without access to a coach or an extensive time investment in training methodologies. Coaches are in limited supply and are usually reserved for competitive athletes rather than for those simply trying to improve themselves and maximise their personal performance. There are also many conflicting training methodologies, so even after extensive research a runner may select an unsuitable training programme.

In an attempt to counteract these problems we propose a novel collaborative recommender system. Our system is designed to recommend training plans and race strategies to a runner. This alleviates the need for a coach and reduces the research required to devise a personalised training plan best suited to the runner. By mining the training plans of runners with similar histories, we can suggest a training plan that the user can follow with little cognitive effort and that will allow them to make significant gains over a period of time.

In this paper, we show that elite runners progress faster than casual runners. We surmise that this is because they have access to coaches and research, and tend to approach training more strategically. We hypothesise that imparting the expertise elite runners have garnered to casual runners through a recommender system will lead casual runners to exhibit increased rates of improvement, similar to those of elite runners. We also highlight the suitability of a collaborative approach to recommendation, outline the time scales involved in such recommendation, and briefly suggest some approaches to recommending material in this domain. While the work presented in this paper focuses on the marathon distance, the results are generalisable to other race distances, and hence to running as a whole.

¹ http://www.runningusa.org/statistics

===2 RELATED WORK===
The use of machine learning for athletic performance is still in its early phases. Much of the previous work in this field has focused on prediction: how fast would a runner run a race, given a previous race they have run at a different distance?

One of the earliest works in this regard was undertaken by Peter Riegel [9] in 1981. Riegel examined world record times for various activities, such as running and swimming. He found a linear relationship between the log of the time taken and the log of the distance of the event. He therefore fit a power-law equation of the form <math>t = a x^{b}</math> to the world record times of different running events, where <math>t</math> is the time taken, <math>x</math> is the distance, and <math>a</math> and <math>b</math> are constants. The exponent <math>b</math> can be considered the slowdown coefficient, or fatigue factor, and the regression analysis gave it a value of 1.06 for running. Riegel’s formula is very effective at predicting times for races shorter than the marathon, and at the marathon distance it is most effective for elite runners.
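Riegel’s power law can be turned into a race-time predictor by eliminating <math>a</math>. If a runner covers distance <math>x_1</math> in time <math>t_1</math>, then <math>a = t_1 / x_1^{b}</math>, and the predicted time at distance <math>x_2</math> is <math>t_2 = t_1 (x_2 / x_1)^{b}</math>. The Python sketch below illustrates this, together with a least-squares fit of <math>a</math> and <math>b</math> on log time against log distance, which is the relationship Riegel observed. It is a minimal illustration, not Riegel’s original procedure or code used in this paper. The function names are hypothetical, and the default <code>b = 1.06</code> is the running value reported above.

```python
import math

# Fatigue factor reported by Riegel for running events.
RIEGEL_B = 1.06


def riegel_predict(t1, x1, x2, b=RIEGEL_B):
    """Predict the time for distance x2 from a time t1 run over distance x1.

    From t = a * x**b we get a = t1 / x1**b, hence t2 = t1 * (x2 / x1)**b.
    The result has the same unit as t1; x1 and x2 just need a common unit.
    """
    return t1 * (x2 / x1) ** b


def fit_power_law(distances, times):
    """Fit t = a * x**b by least squares on log t = log a + b * log x."""
    log_x = [math.log(x) for x in distances]
    log_t = [math.log(t) for t in times]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_t = sum(log_t) / n
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    if sxx == 0:
        raise ValueError("need at least two different distances")
    sxt = sum((lx - mean_x) * (lt - mean_t) for lx, lt in zip(log_x, log_t))
    b = sxt / sxx                      # slope of the log-log line
    a = math.exp(mean_t - b * mean_x)  # intercept, mapped back from log space
    return a, b


if __name__ == "__main__":
    # Hypothetical 100-minute half marathon (21.0975 km) -> marathon (42.195 km).
    print(round(riegel_predict(100, 21.0975, 42.195), 1))  # prints 208.5
```

Doubling the distance multiplies the time by <math>2^{1.06} \approx 2.085</math>, so a 100-minute half marathon maps to roughly 208.5 minutes. As this section discusses, such predictions become unreliable for casual runners whose marathon times exceed about 230 minutes.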
However, it struggles to predict the marathon times of casual runners, especially for times slower than 230 minutes (as can be seen in Figure 2). This threshold is significant because, in events lasting longer than about 230 minutes, the linear relationship between log time and log distance no longer holds, as the body switches to different energy systems. The Riegel formula drastically underestimates times for slower runners. This can have a disastrous effect on their race performance if these predictions are used to inform race tactics.

Despite its limitations, Riegel’s formula is the most commonly used method for predicting race times today. It is the equation used by many well-known websites, most notably RunnersWorld (www.runnersworld.com).

A further regression analysis was performed by David Cameron [3] in 1997. This study fit a regression through the 7 fastest times in each event at the time. This regression offers a slightly better fit but suffers from the same problems as the Riegel formula, including under-prediction at times greater than 230 minutes. The Riegel formula is simpler than the Cameron Time Equivalence Model, and the two give highly comparable results. For these reasons, various running resources often prefer the Riegel formula to the Cameron model.

A study [14] conducted in conjunction with slate.com aimed to address some of the inadequacies of these systems. A survey asked runners to report their own race times with the aim of building a better marathon predictor, yielding a dataset of 2164 usable responses. The model again used linear regression, but it could draw on further information about a runner’s race history: the feature set consisted of the two previous races run and the runner’s reported weekly training load. The results show that this model outperforms the Riegel model, especially for more casual runners.

However, the model still has limitations. The dataset is relatively small, which makes it difficult to use more advanced machine learning techniques that require large amounts of data. The improvement gained by using a second race suggests that additional information about a runner’s history is beneficial for prediction. Yet the approach is itself limited, because it does not account for the fact that a runner may have run many hundreds of races previously. Despite these limitations, the reduction in prediction error has seen this model adopted by many runners’ resource websites, including RunnersWorld. Its results show the promise of data analysis and machine learning techniques in this domain and suggest that further strides are possible.

Moving away from prediction into the field of recommendation, Smyth [12][13] proposes a case-based recommendation system for recommending a personal best marathon time. This system draws on a runner’s previous marathon history and a case base of other runners’ times. From these, Smyth recommends not only a time the runner should be capable of, but also a race plan to achieve that time, taking into account the difficulty and terrain of the course on which the runner will compete. Getting this recommendation right is important: the target should be difficult enough for the runner to feel tested, but not so fast that they end up hitting the wall. The study was evaluated against the actual personal bests run by runners. It found that the system could predict a runner’s personal best time to within 5% and generate race plans that are more than 90% similar to the actual personal best split times. These results show that there is scope for making recommendations in the domain of distance running.

===3 RECOMMENDATION===
====3.1 Dataset====
{| class="wikitable"
|+ Table 1: Dataset Description
! Statistic !! Value !! Detail
|-
| Number of Runners || 618318 ||
|-
| Number of Races || 8212756 || 13.28 (per runner)
|-
| Marathon Runners || 522943 (M: 62%) || 324165 (F: 38%)
|-
| Number of Marathons || 852157 || 2.97 (per runner)
|-
| Marathon Times || 260 ± 63 min ||
|-
| Mean Time Between Marathons || 352 days ||
|-
| Different Races per Runner || 4.19 ||
|-
| Mean Time Between Races || 83 days ||
|}

As discussed in Section 2, current methods for marathon prediction rely on data ranging from a single race at each distance (i.e. the Riegel model) to a few thousand self-reported race times. In contrast, we have built a dataset scraped from various sources, including race results tables and websites that allow athletes to self-declare race times. This dataset contains over 600000 runners and their entire race histories. This is the first dataset of this scale to be collected, and it allows for the first large-scale data analysis and machine learning approach to predicting marathon times. We do not have complete training histories for runners, which would give greater resolution in a machine learning problem. Instead, we approximate their training schedules by looking at how often the athlete has run races of various non-marathon distances and at the improvements they make as they run them. Additional features, such as age and gender, allow us to distinguish further between runners, something many of the prediction models mentioned in Section 2 have neglected. These features, particularly gender, have significant effects in distance running [1, 5].

====3.2 Runner Improvement====
Figure 1: Average percentage change in finish time of runners from one race to the next, as exhibited by runners in the dataset

Figure 1 depicts how a runner improves from one marathon race to the next. Typically a runner will see a large and steady improvement over the first 3-4 races they run, before witnessing a plateau in performance.
improvement for the first 3-4 races they run, before witnessing a Running with Recommendation HealthRecSys’17, August 2017, Como, Italy have a data sparsity issue. However, for the purposes of building a running recommender this is not a problem. The elite runners, with whom our model struggles, tend to be professional athletes or passionate runners. These runners tend to be well coached and well informed on their training and thus a recommender system is expected to be of limited value to them. Runners at the very slowest end of the spectrum tend to be one-off runners that are running purely for fun and they are also unlikely to see any value in using a recommender system. 3.4 Benefits of Recommender System Figure 2: Average errors of Riegel, KNN and XGB models over time plateau in performance. This natural performance improvement is a first indication of the potential benefits of a recommender system. Most runners do not have access to advanced training methods and lack the motivation or finances to employ a running coach, yet they still exhibit significant improvement from one race Figure 3: Proportion of runners running Personal Best per- to the next. Finding suitable training plans requires a significant formances for elite and casual runners based on time since amount of research as there are many different methodologies the first marathon run runners have used in order to get results. A recommender system designed to assist runners could act in the role of a professional running coach and help personalise training plans based on data mining the successful training approaches of similar runners. The recommender system removes the time burden required to map out an adequate training plan and helps the runner to improve at the fastest rate. 
3.3 Suitability of Collaborative Filtering After creating a basic user profile for each runner comprising aver- age race finish times at various non-marathon distances, we trained Figure 4: Proportion of runners running Personal Best per- two different models to predict marathon times. These models were formances for elite and casual runners based on age at time a simple K-Nearest Neighbours (KNN) model [6] and an Extreme of personal best Gradient Boosting (XGB) model [4]. The total percentage errors of these models are 10.45% (KNN) and 9.04% (XGB), which compare favourably to the error exhibited by the Riegel model of 12.8%. We make the reasonable claim that elite runners either have Figure 2 shows how these prediction errors change for athletes access to coaching or are well-informed on training methods. As with different race finish times. The Riegel model is indeed more ac- a result, they produce faster times but there are other phenomena curate than KNN or XGB for elite runners (finish times < 190mins), that are a side effect of this optimised training. We define an elite but the vast majority of runners have slower finish times than this. runner as one who has run the qualifying standard for the Boston For these slower runners, the KNN and XGB models outperform Marathon of 190 minutes, and then we compare the performance the Riegel model. This suggests that the use of user profiles and progression over time of elite runners and more casual runners. The computing similarity between them is a good approach for describ- Boston Marathon qualifying time was chosen as this is considered ing how a runner is likely perform. This provides some justification a goal time for many keen marathon runners and is a time that that a collaborative filtering system would provide an adequate ba- requires substantial training and effort to achieve. We also tell from sis for building a recommender system for runners. 
The success of Figure 2 that it corresponds roughly to the finish time at which the Smyth’s work [12] in recommending personal best times outlined user based models begin outperforming the Riegel model which in Section 2 also appears to corroborate this finding. makes it a natural cut off. It could also be pointed out here that the simpler Riegel model In Figure 3 we show the point at which elite and casual runners out-performs both the KNN and XGB models at various points in first run their overall Personal Best (PB) time since running their the distribution. This is certainly the case for the quickest runners, first marathon. The elite runners clearly peak much earlier than which is not surprising as the Riegel model is based on world record casual runners, with nearly 20% of elites achieving a PB in the times and we have little data at these points from which to build a similarity model. Similarly, for the very slowest runners, we also HealthRecSys’17, August 2017, Como, Italy Jakim Berndsen, Aonghus Lawlor, and Barry Smyth first year after they start marathon running. In contrast, a higher can be imparted to casual runners through the use of a recom- fraction of casual runners achieve PB’s than elite runners, after a mender system, leading to greater levels of improvement for such period of 5 years from their first marathon. In Figure 4 we show the runners. We have gathered an adequately large database of run- age at which the personal best is achieved. Again, there is a strong ners and race times and through this have shown that marathon difference between elite and casual runners, with elites much more times can be predicted by simple neighbourhood models. The better likely to achieve the PB before the age of 40. Elite runners not only prediction accuracy of these models highlights the feasibility of a achieve their PB’s at a younger age but also at an earlier stage of collaborative filtering approach to such recommendation. Lastly, their running career. 
This finding affirms the notion that the extra knowledge elite runners have over casual runners is a significant advantage when it comes to making performance gains. A recommender system would therefore be useful in this field. An automated, personalised training plan generated by a collaborative recommender system would reduce a casual runner’s need for a professional coach or extensive knowledge of running training. Such recommendations should lead to faster performance gains for casual runners and see them maximise their potential earlier.

====3.5 Methods of Recommendation====
It is important to note at this point that not even all elite runners maximise their potential quickly. Many elite runners do not run their personal best until up to five years after their first marathon. This demonstrates the potential time scale involved in such a recommender system, with improvements not becoming apparent for months or even years after the first interaction with the system.

Such a recommendation system poses a unique challenge: how does a recommender system motivate a runner to keep using it for a period of years, especially when the benefits may not be immediate? Training for a marathon is difficult. A recommender may suggest a training session that, while clearly beneficial, does not lead to enjoyment or satisfaction for the user. The system must therefore keep users engaged for long periods and convince them to make potentially unwelcome decisions in order to see benefit.

An important factor in achieving this goal is providing the user with meaningful explanations. Making a system’s reasoning transparent contributes significantly to users’ acceptance of its recommendations [2] and improves their confidence in them [11]. Various training methodologies are already well documented and explained. The concept of nudging [8], which gradually adjusts the user’s behaviour, has been shown to be very useful in recommender systems. Personalised explanations can motivate users to interact with the system and spur them on to complete the sessions it recommends. As demonstrated in Section 3.3, the system is capable of predicting a runner’s finish time. As the recommender nudges the user towards a particular training strategy, it can present accurate predictions of finish times and outcomes to improve the user’s motivation and engagement with the system. For runners to gain the maximum benefit from such a system, it must be persuasive, easy to follow, and motivating. Only then will it stay engaging long enough to have an effect on a runner’s training and change their behaviour [7].

===4 CONCLUSION===
In this paper we have examined the opportunity for a recommender system for runners. Elite runners improve at a faster rate through knowledge gained from coaching and research. This knowledge can be imparted to casual runners through a recommender system, leading to greater levels of improvement for such runners. We have gathered an adequately large database of runners and race times, and through it we have shown that marathon times can be predicted by simple neighbourhood models. The better prediction accuracy of these models highlights the feasibility of a collaborative filtering approach to such recommendation. Lastly, we highlighted some of the issues with such a recommender system, namely the time scale involved. We also suggested how such recommendations could be presented to runners to keep them sufficiently motivated and allow adequate time for the recommendations to take effect.

In future work we will implement such a recommender system. We expect the results to generalise to other endurance sports, and so we look to expand our research into activities such as swimming and cycling. The proliferation of wearable technology, such as heart rate monitors and GPS units, provides large quantities of data from training sessions and races. Such data will provide greater resolution for a machine learning approach. We will use such an approach to build a recommender system that can inform a user before, during, and after a training session or race.

===ACKNOWLEDGMENTS===
This work is supported by Science Foundation Ireland through the Insight Centre for Data Analytics under grant number SFI/12/RC/2289.

===REFERENCES===
[1] R Beneke, R Leithauser, and M Doppelmayr. 2005. Women will do it in the long run. British Journal of Sports Medicine 39, 7 (2005), 410.

[2] Mustafa Bilgic and Raymond J Mooney. 2005. Explaining recommendations: Satisfaction vs. promotion. In Beyond Personalization Workshop, IUI, Vol. 5. 153.

[3] David F. Cameron. 1998. http://www.cs.uml.edu/~phoffman/cammod.html

[4] Tianqi Chen and Tong He. 2015. Xgboost: extreme gradient boosting. R package version 0.4-2 (2015).

[5] Robert O Deaner. 2006. More males run fast: A stable sex difference in competitiveness in US distance runners. Evolution and Human Behavior 27, 1 (2006), 63–84.

[6] Daniel T Larose. 2005. k-Nearest Neighbor Algorithm. Discovering Knowledge in Data: An Introduction to Data Mining (2005), 90–106.

[7] Ting-Peng Liang, Hung-Jen Lai, and Yi-Cheng Ku. 2006. Personalized content recommendation and user satisfaction: Theoretical synthesis and empirical findings. Journal of Management Information Systems 23, 3 (2006), 45–70.

[8] Eoghan O’Shea, Sarah Jane Delany, Rob Lane, and Brian Mac Namee. 2014. NudgeAlong: A Case Based Approach to Changing User Behaviour. In International Conference on Case-Based Reasoning. Springer, 345–359.

[9] Peter S Riegel. 1981. Athletic Records and Human Endurance: A time-vs.-distance equation describing world-record performances may be used to compare the relative endurance capabilities of various groups of people. American Scientist 69, 3 (1981), 285–290.

[10] Hanna Schaefer, Santiago Hors-Fraile, Raghav Pavan Karumur, André Calero Valdez, Alan Said, Helma Torkamaan, Tom Ulmer, and Christoph Trattner. 2017. Towards Health (Aware) Recommender Systems. Proc. of DH 17 (2017).

[11] Rashmi Sinha and Kirsten Swearingen. 2002. The role of transparency in recommender systems. In CHI’02 extended abstracts on Human factors in computing systems. ACM, 830–831.

[12] Barry Smyth. 2017. A Novel Recommender System for helping Marathons to Achieve a new Personal-Best. Proceedings of ACM RecSys 2017, Lake Como, Italy, August 2017 (2017).

[13] Barry Smyth and Padraig Cunningham. 2017. Running with Cases: A CBR Approach to Running Your Best Marathon. Proceedings of ICCBR 2017, Trondheim, Norway, June 2017 (2017).

[14] Andrew J Vickers and Emily A Vertosick. 2016. An empirical study of race times in recreational endurance runners. BMC Sports Science, Medicine and Rehabilitation 8, 1 (2016), 26.