=Paper= {{Paper |id=Vol-1953/healthRecSys17_paper_9 |storemode=property |title=Running with Recommendation |pdfUrl=https://ceur-ws.org/Vol-1953/healthRecSys17_paper_9.pdf |volume=Vol-1953 |authors=Jakim Berndsen,Aonghus Lawlor,Barry Smyth |dblpUrl=https://dblp.org/rec/conf/recsys/BerndsenLS17 }} ==Running with Recommendation== https://ceur-ws.org/Vol-1953/healthRecSys17_paper_9.pdf

Running with Recommendation
Jakim Berndsen, Aonghus Lawlor, Barry Smyth
Insight Centre for Data Analytics, University College Dublin, Belfield, Dublin 4
jakim.berndsen@insight-centre.org, aonghus.lawlor@insight-centre.org, barry.smyth@insight-centre.org

ABSTRACT
We examine the feasibility of a collaborative recommender system in the exercise domain targeted specifically at runners. By using a large dataset of over 600000 runners' finish times we explore the contrasts between casual and elite runners and hypothesise how a recommender system may be used to mitigate some of these differences. We also briefly discuss some of the challenges faced by such a recommendation task and suggest how these challenges could be addressed.

CCS CONCEPTS
• Information systems → Recommender systems; • Human-centered computing → Social recommendation;

KEYWORDS
Sports Analytics, Explanations

ACM Reference format:
Jakim Berndsen, Aonghus Lawlor, and Barry Smyth. 2017. Running with Recommendation. In Proceedings of Second International Workshop on Health Recommender Systems co-located with ACM RecSys 2017, Como, Italy, August 2017 (RecSys'17), 4 pages.

International Workshop on Health Recommender Systems, August 2017, Como, Italy.
©2017. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1 INTRODUCTION
Running is one of the most popular forms of exercise on the planet. In 2015 over 17 million race finishers were recorded in the United States alone¹. The proliferation of online resources promoting running shows the growing popularity of the sport and is a persuasive demonstration of the participants' interest in improving their performance. The desire to improve performance over time, while avoiding injury, is a key motivation for us to address this as a recommender system problem [10].

¹ http://www.runningusa.org/statistics

Despite the popularity of running and the availability of training resources, it remains a difficult pastime. Completing a race is the end result of weeks or months of meticulous training and planning. Selecting the best balance of hard work, recovery, and different training types remains a concern that requires an in-depth knowledge of running and human performance. While runners will exhibit natural performance increases simply by starting to run and by improving their fitness, it is extremely difficult to further optimise performance without access to a coach or extensive time investment in training methodologies. The availability of coaches is limited and usually reserved for competitive athletes, rather than those simply trying to improve themselves and maximise their personal performance. There also exist many conflicting training methodologies, and thus despite extensive research a runner may still select an unsuitable training programme.

In an attempt to counteract these problems we propose a novel collaborative recommender system. Our system is designed to recommend training plans and race strategies to a runner and thus alleviate the need for a coach and reduce the requirement for extensive research to devise a personalised training plan best suited to the runner. By mining the data on training plans of runners with similar histories we can suggest a training plan to the user that they can follow with little cognitive effort and that will allow them to make significant gains over a period of time.

In this paper, we show that elite runners progress faster than casual runners and we surmise that this is because they have access to coaches and research, and tend to approach training in a more strategic manner. We hypothesise that imparting the expertise that elite runners have garnered to a casual runner through the use of a recommender system will lead to casual runners exhibiting increased rates of improvement, similar to those of the elite runner. We will also highlight the suitability of a collaborative approach to recommendation, outline the time scales involved in such recommendation and briefly suggest some approaches towards recommendation of material in this domain. While the work presented in this paper focuses on the marathon distance, the results are generalisable across different race distances, and as such to running as a whole.

2 RELATED WORK
The use of machine learning for athletic performance is still in its early phases. Much of the work previously undertaken in this field has focused on the problem of prediction, that is, the question of how fast a runner would run a race given a previous race they have run at a different distance.

One of the earliest works in this regard was undertaken by Peter Riegel [9] in 1981. Riegel examined world record times for various activities, such as running and swimming, and found there to be a linear relationship between the log of the time taken and the log of the distance of the event. He thus fit a power-law equation of the form $t = a x^{b}$ to the world record times of different running events, where $t$ is the time taken, $x$ is the distance and $a$ and $b$ are constants. In this equation $b$ can be considered the slowdown coefficient, or fatigue factor, and was found through the regression analysis to take a value of 1.06 for running. Riegel's formula proves to be very effective for predicting times for races of distances shorter than the marathon and is also most effective for the category of elite marathon runners. However, it struggles to predict the times for casual runners in the marathon, especially
for times slower than 230 minutes (as can be seen in Figure 2). 230 minutes is significant as it is in events that take longer than this that the linear relationship between log time and log distance no longer holds due to the body switching to different energy systems. The Riegel formula underestimates times drastically for the category of slower runners, which can have a disastrous effect on their race performance if these predictions are used to inform race tactics.

Despite its limitations Riegel's formula is the most commonly used method for predicting race times today. It is the equation used by many well known websites, most notably RunnersWorld (www.runnersworld.com).

A further regression analysis was performed by David Cameron [3] in 1997. This study fit a regression through the 7 fastest times in each event at the time. This regression offers a slightly better fit but suffers from the same problems as the Riegel formula, including under-prediction at times greater than 230 minutes. Due to the relative simplicity of the Riegel formula compared to the Cameron Time Equivalence Model, and their highly comparable results, the Riegel formula is often preferred to the Cameron model by various running resources.

A study [14] in conjunction with slate.com aimed to address some of the inadequacies of these systems. A survey asked runners to report their own race times with the aim of building a better marathon predictor. The survey led to a dataset of 2164 usable responses. While again using linear regression analysis, the model was able to utilise further information about a runner's race history. The feature set comprised the two previous races run and the reported weekly training load of a runner. The results show that such a model outperforms the Riegel model, especially for more casual runners. However, the model still has limitations. The dataset size is relatively small and thus it is difficult to utilise more advanced machine learning techniques that require large amounts of data. The prediction improvement seen by using a second race suggests that additional information about a runner's history is beneficial in making predictions, yet is in itself limited as it does not take into account that a runner may have run many hundreds of races previously. Despite these limitations, the decreases in prediction error have seen this model adopted by many runners' resource websites, including RunnersWorld. The model's results show the promise of what data analysis and machine learning techniques can achieve in this domain and that further strides are possible.

Moving away from prediction into the field of recommendation, Smyth [12][13] proposes a case-based recommendation system for recommending a personal best marathon time. Using information on a runner's previous marathon history and a case base of other runners' times, Smyth recommends not only a time that the runner should be capable of, but also a race plan to achieve the time, taking into account the difficulty and terrain of the course on which the runner will compete. The recommendation made is important, as it should be difficult enough for the runner to feel tested but not so fast that the runner ends up hitting the wall. By testing against the actual personal best run by a runner the study found that the system was able to predict the personal best time of a runner to within 5% accuracy and generate race plans that are more than 90% similar to the actual personal best split times run. These results show that there is scope for making recommendations in the domain of distance running.

3 RECOMMENDATION

3.1 Dataset

Table 1: Dataset Description

Number of Runners              618318
#Races                         8212756            13.28 (per runner)
Marathon Runners               522943 (M: 62%)    324165 (F: 38%)
#Marathons                     852157             2.97 (per runner)
Marathon Times                 260±63
Mean Time Between Marathons    352 (days)
#Different Races per Runner    4.19
Mean Time Between Races        83 (days)

As mentioned in the related work section, the current methods used for marathon prediction rely on data ranging from a single race at each distance (i.e. the Riegel model) to a few thousand self-reported race times. In contrast, we have built a dataset scraped from various sources including race results tables and websites allowing athletes to self-declare race times. This dataset contains over 600000 runners and their entire race histories. This is the first dataset of this scale that has been collected and allows for the first large scale data analysis and machine learning approach towards making predictions for a marathon time. While we do not have complete training histories for runners, which would give greater resolution in a machine learning problem, we attempt to approximate their training schedules by looking at the frequency of the races of various non-marathon distances the athlete has run and the improvements they make as they run them. Additional features, such as age and gender, allow us to further distinguish between runners in a way that has been neglected by many of the previous prediction models mentioned in Section 2. These features, particularly gender, have significant effects when it comes to distance running [1, 5].

3.2 Runner Improvement

Figure 1: Average percentage change in finish time of runners from one race to the next as exhibited by runners in the dataset

Figure 1 depicts how a runner improves from one marathon race to the next. Typically a runner will see a large and steady improvement for the first 3-4 races they run, before witnessing a
plateau in performance. This natural performance improvement is a first indication of the potential benefits of a recommender system. Most runners do not have access to advanced training methods and lack the motivation or finances to employ a running coach, yet they still exhibit significant improvement from one race to the next. Finding suitable training plans requires a significant amount of research as there are many different methodologies runners have used in order to get results. A recommender system designed to assist runners could act in the role of a professional running coach and help personalise training plans based on data mining the successful training approaches of similar runners. The recommender system removes the time burden required to map out an adequate training plan and helps the runner to improve at the fastest rate.

3.3 Suitability of Collaborative Filtering
After creating a basic user profile for each runner comprising average race finish times at various non-marathon distances, we trained two different models to predict marathon times. These models were a simple K-Nearest Neighbours (KNN) model [6] and an Extreme Gradient Boosting (XGB) model [4]. The total percentage errors of these models are 10.45% (KNN) and 9.04% (XGB), which compare favourably to the error exhibited by the Riegel model of 12.8%.

Figure 2: Average errors of Riegel, KNN and XGB models over time

Figure 2 shows how these prediction errors change for athletes with different race finish times. The Riegel model is indeed more accurate than KNN or XGB for elite runners (finish times < 190 minutes), but the vast majority of runners have slower finish times than this. For these slower runners, the KNN and XGB models outperform the Riegel model. This suggests that the use of user profiles and computing similarity between them is a good approach for describing how a runner is likely to perform. This provides some justification that a collaborative filtering system would provide an adequate basis for building a recommender system for runners. The success of Smyth's work [12] in recommending personal best times outlined in Section 2 also appears to corroborate this finding.

It could also be pointed out here that the simpler Riegel model outperforms both the KNN and XGB models at various points in the distribution. This is certainly the case for the quickest runners, which is not surprising as the Riegel model is based on world record times and we have little data at these points from which to build a similarity model. Similarly, for the very slowest runners, we also have a data sparsity issue. However, for the purposes of building a running recommender this is not a problem. The elite runners, with whom our model struggles, tend to be professional athletes or passionate runners. These runners tend to be well coached and well informed on their training and thus a recommender system is expected to be of limited value to them. Runners at the very slowest end of the spectrum tend to be one-off runners who are running purely for fun and they are also unlikely to see any value in using a recommender system.

3.4 Benefits of Recommender System

Figure 3: Proportion of runners running Personal Best performances for elite and casual runners based on time since the first marathon run

Figure 4: Proportion of runners running Personal Best performances for elite and casual runners based on age at time of personal best

We make the reasonable claim that elite runners either have access to coaching or are well-informed on training methods. As a result, they produce faster times but there are other phenomena that are a side effect of this optimised training. We define an elite runner as one who has run the qualifying standard for the Boston Marathon of 190 minutes, and then we compare the performance progression over time of elite runners and more casual runners. The Boston Marathon qualifying time was chosen as this is considered a goal time for many keen marathon runners and is a time that requires substantial training and effort to achieve. We can also tell from Figure 2 that it corresponds roughly to the finish time at which the user-based models begin outperforming the Riegel model, which makes it a natural cut-off.

In Figure 3 we show the point at which elite and casual runners first run their overall Personal Best (PB) time since running their first marathon. The elite runners clearly peak much earlier than casual runners, with nearly 20% of elites achieving a PB in the
first year after they start marathon running. In contrast, a higher fraction of casual runners than elite runners achieve their PBs after a period of 5 years from their first marathon. In Figure 4 we show the age at which the personal best is achieved. Again, there is a strong difference between elite and casual runners, with elites much more likely to achieve the PB before the age of 40. Elite runners not only achieve their PBs at a younger age but also at an earlier stage of their running career. This finding affirms the notion that the extra knowledge elite runners have over casual runners is a significant advantage when it comes to making performance gains. It is clear that a recommender system would be useful in this field: an automated personalised training plan generated by a collaborative recommender system would mitigate the need for a professional coach or extensive knowledge of running training for a casual runner. Such recommendations should lead to faster performance gains from casual runners and would see casual runners maximise their potential earlier.

3.5 Methods of Recommendation
It is important to note at this point that not even all elite runners maximise their potential quickly. Many elite runners will not run their personal best until up to five years after their first marathon. This demonstrates the potential time scale involved in such a recommender system, with improvements not being apparent for months or even years after the first interaction with the system.

Such a recommendation system poses a unique challenge. How does a recommender system motivate a runner to keep using a system for a period of years, especially when the benefits of use may not be instant? Training for a marathon is difficult and a recommender may recommend a training session that, while clearly beneficial, may not lead to enjoyment or satisfaction for the user. The recommendation system must therefore keep a user engaged for long periods and convince them to make potentially unwanted decisions in order for them to see benefit.

An important factor in achieving this goal is to provide the user with meaningful explanations. The ability of a system to make its reasoning transparent contributes significantly to the user's acceptance of the recommendation [2] and improves their confidence in the recommendation [11]. Various training methodologies are already well documented and explained. The concept of nudging [8], to slowly adjust the user's behaviour, has been shown to be very useful in recommender systems. The use of personalised explanations can motivate a user to interact with the system and spur them on to do the sessions the system recommends. As demonstrated in Section 3.3 the system is capable of making predictions of the runner's finish time. As the recommender system nudges the user to a particular training strategy, accurate predictions of finish times and outcomes can be presented to the user to improve their motivation and engagement with the system. For runners to gain the maximum benefit from such a system it must be persuasive, easy to follow, and provide motivation so the recommender is engaging for long enough to have an effect on a runner's training and change their behaviour [7].

4 CONCLUSION
In this paper we have examined the opportunity for a recommender system for runners. Elite runners improve at a faster rate through knowledge gained from coaching and research. This knowledge can be imparted to casual runners through the use of a recommender system, leading to greater levels of improvement for such runners. We have gathered an adequately large database of runners and race times and through this have shown that marathon times can be predicted by simple neighbourhood models. The better prediction accuracy of these models highlights the feasibility of a collaborative filtering approach to such recommendation. Lastly, we highlighted some of the issues with such a recommender system, namely the time scale involved. We suggested methods as to how such recommendation could be presented to runners in order to keep them sufficiently motivated and allow adequate time for this recommendation to take effect.

In future we will implement such a recommender system. We expect that the results found will be generalisable to other endurance sports and as such we look to expand our research into activities such as swimming and cycling. The proliferation of wearable technology, such as heart rate monitors and GPS units, provides large quantities of data from training events and races. Such data will provide greater resolution for a machine learning approach and we will use such an approach to build a recommender system that can inform a user before, during, and after a training session or race.

ACKNOWLEDGMENTS
This work is supported by Science Foundation Ireland through the Insight Centre for Data Analytics under grant number SFI/12/RC/2289

REFERENCES
[1] R Beneke, R Leithauser, and M Doppelmayr. 2005. Women will do it in the long run. British journal of sports medicine 39, 7 (2005), 410.
[2] Mustafa Bilgic and Raymond J Mooney. 2005. Explaining recommendations: Satisfaction vs. promotion. In Beyond Personalization Workshop, IUI, Vol. 5. 153.
[3] David F. Cameron. 1998. http://www.cs.uml.edu/~phoffman/cammod.html
[4] Tianqi Chen and Tong He. 2015. Xgboost: extreme gradient boosting. R package version 0.4-2 (2015).
[5] Robert O Deaner. 2006. More males run fast: A stable sex difference in competitiveness in US distance runners. Evolution and Human Behavior 27, 1 (2006), 63–84.
[6] Daniel T Larose. 2005. k-Nearest Neighbor Algorithm. Discovering Knowledge in Data: An Introduction to Data Mining (2005), 90–106.
[7] Ting-Peng Liang, Hung-Jen Lai, and Yi-Cheng Ku. 2006. Personalized content recommendation and user satisfaction: Theoretical synthesis and empirical findings. Journal of Management Information Systems 23, 3 (2006), 45–70.
[8] Eoghan O'Shea, Sarah Jane Delany, Rob Lane, and Brian Mac Namee. 2014. NudgeAlong: A Case Based Approach to Changing User Behaviour. In International Conference on Case-Based Reasoning. Springer, 345–359.
[9] Peter S Riegel. 1981. Athletic Records and Human Endurance: A time-vs.-distance equation describing world-record performances may be used to compare the relative endurance capabilities of various groups of people. American Scientist 69, 3 (1981), 285–290.
[10] Hanna Schaefer, Santiago Hors-Fraile, Raghav Pavan Karumur, André Calero Valdez, Alan Said, Helma Torkamaan, Tom Ulmer, and Christoph Trattner. 2017. Towards Health (Aware) Recommender Systems. Proc. of DH 17 (2017).
[11] Rashmi Sinha and Kirsten Swearingen. 2002. The role of transparency in recommender systems. In CHI'02 extended abstracts on Human factors in computing systems. ACM, 830–831.
[12] Barry Smyth. 2017. A Novel Recommender System for helping Marathons to Achieve a new Personal-Best. Proceedings of ACM RecSys 2017, Lake Como, Italy, August 2017 (2017).
[13] Barry Smyth and Padraig Cunningham. 2017. 'Running with Cases: A CBR Approach to Running Your Best Marathon'. Proceedings of ICCBR 2017, Trondheim, Norway, June 2017 (2017).
[14] Andrew J Vickers and Emily A Vertosick. 2016. An empirical study of race times in recreational endurance runners. BMC Sports Science, Medicine and Rehabilitation 8, 1 (2016), 26.
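
As an illustration of the two prediction styles compared in Sections 2 and 3.3, the following Python sketch contrasts a Riegel-style power-law prediction with a small nearest-neighbour prediction over runner profiles. It is not the implementation used in this paper: the function names, the toy runner data, the choice of k = 3, the Euclidean distance between profiles and the unweighted averaging of neighbours are all assumptions made for the example. Only the exponent 1.06 and the idea of profiling runners by average finish times at non-marathon distances come from the text.

```python
import math

RIEGEL_EXPONENT = 1.06  # fatigue factor b reported for running in Section 2


def riegel_predict(known_time_min, known_dist_km, target_dist_km, b=RIEGEL_EXPONENT):
    """Scale a known race time to another distance using t = a * x**b.

    The constant a cancels when taking the ratio of two distances.
    """
    return known_time_min * (target_dist_km / known_dist_km) ** b


def knn_predict(profile, neighbours, k=3):
    """Average the marathon times of the k runners with the closest profiles.

    profile: tuple of average finish times (minutes) at non-marathon distances.
    neighbours: list of (profile, marathon_time_min) pairs for other runners.
    Euclidean distance and an unweighted mean are assumptions of this sketch.
    """
    ranked = sorted(neighbours, key=lambda item: math.dist(profile, item[0]))
    nearest = ranked[:k]
    return sum(time for _, time in nearest) / len(nearest)


def percentage_error(predicted_min, actual_min):
    """Absolute prediction error as a percentage of the actual finish time."""
    return abs(predicted_min - actual_min) / actual_min * 100.0


# Hypothetical runners, invented for illustration only:
# (10 km average, half-marathon average) -> marathon time, all in minutes.
TOY_RUNNERS = [
    ((40.0, 88.0), 190.0),
    ((45.0, 100.0), 215.0),
    ((50.0, 112.0), 245.0),
    ((55.0, 124.0), 275.0),
    ((60.0, 136.0), 305.0),
]

# A hypothetical runner with a 49 minute 10 km and a 110 minute half marathon.
riegel_estimate = riegel_predict(110.0, 21.0975, 42.195)
knn_estimate = knn_predict((49.0, 110.0), TOY_RUNNERS, k=3)
print(f"Riegel: {riegel_estimate:.1f} min, KNN: {knn_estimate:.1f} min")
```

The Riegel estimate depends only on a single previous race, whereas the neighbourhood estimate draws on the marathon outcomes of runners with similar profiles. That difference is the property that motivates the collaborative approach discussed above; the numbers produced by this toy data carry no empirical weight.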