=Paper= {{Paper |id=Vol-1953/healthRecSys17_paper_9 |storemode=property |title=Running with Recommendation |pdfUrl=https://ceur-ws.org/Vol-1953/healthRecSys17_paper_9.pdf |volume=Vol-1953 |authors=Jakim Berndsen,Aonghus Lawlor,Barry Smyth |dblpUrl=https://dblp.org/rec/conf/recsys/BerndsenLS17 }} ==Running with Recommendation== https://ceur-ws.org/Vol-1953/healthRecSys17_paper_9.pdf
                                            Running with Recommendation
                 Jakim Berndsen                                          Aonghus Lawlor                                   Barry Smyth
      Insight Centre for Data Analytics,                       Insight Centre for Data Analytics,             Insight Centre for Data Analytics,
     University College Dublin, Belfield,                     University College Dublin, Belfield,            University College Dublin, Belfield,
                  Dublin 4                                                 Dublin 4                                       Dublin 4
     jakim.berndsen@insight-centre.org                        aonghus.lawlor@insight-centre.org                barry.smyth@insight-centre.org

ABSTRACT                                                                               those simply trying to improve themselves and maximise their
We examine the feasibility of a collaborative recommender system                       personal performance. There also exist many conflicting training
in the exercise domain targeted specifically at runners. By using                      methodologies and thus despite extensive research an unsuitable
a large dataset of over 600000 runners’ finish times we explore                        training programme may be selected by a runner.
the contrasts between casual and elite runners and hypothesise                            In an attempt to counteract these problems we propose a novel
how a recommender system may be used to mitigate some of these                         collaborative recommender system. Our system is designed to rec-
differences. We also briefly discuss some of the challenges faced                      ommend training plans and race strategies to a runner and thus
by such a recommendation task and suggest how these challenges                         alleviate the need for a coach and reduce the requirement for ex-
could be addressed.                                                                    tensive research to devise a personalised training plan best suited
                                                                                       to the runner. By mining the data on training plans of runners with
CCS CONCEPTS                                                                           similar histories we can suggest a training plan to the user that
                                                                                       they can follow with little cognitive effort and that will allow them
• Information systems → Recommender systems; • Human-
                                                                                       to make significant gains over a period of time.
centered computing → Social recommendation;
                                                                                          In this paper, we show that elite runners progress faster than
                                                                                       casual runners and we surmise that this is due to the fact they have
KEYWORDS
                                                                                       access to coaches, research, and have a tendency to approach train-
Sports Analytics, Explanations                                                         ing in a more strategic manner. Imparting the expertise that elite
ACM Reference format:                                                                  runners have garnered to a casual runner through the use of a rec-
Jakim Berndsen, Aonghus Lawlor, and Barry Smyth. 2017. Running with                    ommender system will lead to casual runners exhibiting increased
Recommendation. In Proceedings of Second International Workshop on Health              rates of improvement, similar to that of the elite runner. We will also
Recommender Systems co-located with ACM RecSys 2017, Como, Italy, August               highlight the suitability of a collaborative approach to recommen-
2017 (RecSys’17), 4 pages.                                                             dation, outline the time scales involved in such recommendation
                                                                                       and briefly suggest some approaches towards recommendation of
1    INTRODUCTION                                                                      material in this domain. While the work presented in this paper fo-
Running is one of the most popular forms of exercise on the planet.                    cuses on the marathon distance, the results are generalisable across
In 2015 over 17 million race finishers were recorded in the United                     different race distances, and as such running as a whole.
States alone 1 . The proliferation of online resources promoting run-
ning shows the growing popularity of the sport and is a persuasive
demonstration of the participants’ interest in improving their per-                    2   RELATED WORK
formance, and the desire to improve the performance over time,                         The use of machine learning for athletic performance is still in its
while avoiding injury, is a key motivation for us to address as a                      early phases. Much of the work previously undertaken in this field
recommender system problem [10].                                                       has focused on the problem of prediction - the question of how fast
    Despite the popularity of running and the availability of train-                   a runner would run a race given a previous race they have run at a
ing resources, it remains a difficult pastime. Completing a race                       different distance.
is the end result of weeks or months of meticulous training and                           One of the earliest works in this regard was undertaken by
planning. Selecting the best balance of hard work, recovery, and                       Peter Riegel [9] in 1981. Riegel examined world record times for
different training types remains a concern that requires an in-depth                   various activities, such as running and swimming, and found there
knowledge of running and human performance. While runners will                         to be a linear relationship between the log of the time taken and
exhibit natural performance increases simply by starting to run                   the log of the distance of the event. He thus fit a power-law
and by improving their fitness, it is extremely difficult to further                   equation of the form t = ax^b to the world record times of different
optimise performance without access to a coach or extensive time                       running events, where t is the time taken, x is the distance and
investment in training methodologies. The availability of coaches                      a and b are constants. In this equation b can be considered the
is limited and usually reserved for competitive athletes, rather than                  slowdown coefficient, or fatigue factor, and was found to take a
                                                                                       value of 1.06 for running through the regression analysis. Riegel’s
1 http://www.runningusa.org/statistics
                                                                                       formula proves to be very effective for predicting times for races
International Workshop on Health Recommender Systems, August 2017, Como, Italy.
                                                                                       of distances shorter than the marathon and is also most effective
©2017. Copyright for the individual papers remains with the authors. Copying permit-   for the category of elite marathon runners. However, it struggles
ted for private and academic purposes. This volume is published and copyrighted by     to predict the times for casual runners in the marathon, especially
its editors.
HealthRecSys’17, August 2017, Como, Italy                                                 Jakim Berndsen, Aonghus Lawlor, and Barry Smyth


for times slower than 230 minutes (as can be seen in Figure 2). 230         3 RECOMMENDATION
minutes is significant as it is in events that take longer than this that
                                                                            3.1 Dataset
the linear relationship between log time and log distance no longer
holds due to the body switching to different energy systems. The
                                                                                              Table 1: Dataset Description
Riegel formula underestimates times drastically for the category
of slower runners, which can have a disastrous effect on their race
                                                                             Number of Runners               618318
performance if these predictions are used to inform race tactics.
    Despite its limitations Riegel’s formula is the most commonly            #Races                          8212756           13.28 (per runner)
used method for predicting race times today. It is the equation              Marathon Runners                522943(M: 62%)    324165(F: 38%)
used by many well known websites, most notably RunnersWorld                  #Marathons                      852157            2.97 (per runner)
(www.runnersworld.com).                                                      Marathon Times                  260±63
    A further regression analysis was performed by David Cameron             Mean Time Between Marathons     352 (days)
[3] in 1997. This study fit a regression through the 7 fastest times         #Different Races per Runner     4.19
in each event at the time. This regression offers a slightly better fit      #Mean Time Between Races        83 (days)
but suffers from the same problems as the Riegel formula. It also
suffers from an under prediction problem at times greater than 230             As mentioned in the related work section the current methods
minutes. Due to the relative simplicity of the Riegel formula compared to the       used for marathon prediction rely on data ranging from a single
Cameron Time Equivalence Model and their highly comparable                  race at each distance (i.e. the Riegel Model) to a few thousand self
results the Riegel formula is often preferred to the Cameron model          reported race times. In contrast, we have built a dataset scraped
by various running resources.                                               from various sources including race results tables and websites
    A study [14] in conjunction with slate.com aimed to address             allowing athletes to self declare race times. This dataset contains
some of the inadequacies of these systems. A survey asked runners           over 600000 runners and their entire race histories. This is the first
to report their own race times with the aim of building a better            dataset of this scale that has been collected and allows for the first
marathon predictor. The survey responses led to a dataset of 2164           large scale data analysis and machine learning approach towards
usable responses. While again using linear regression analysis, the         making predictions for a marathon time. While we do not have
model was able to utilise further information about a runner’s race         complete training histories for runners, which would give greater
history. The feature set was comprised of the two previous races            resolution in a machine learning problem, we attempt to approx-
run and the reported weekly training load of a runner. The results          imate their training schedules by looking at the frequency of the
show that such a model outperforms the Riegel model, especially for         races of various non-marathon distances the athlete has run and
more casual runners. However, the model still has limitations. The          the improvements they make as they run them. Additional features,
dataset size is relatively small and thus it is difficult to utilise more   such as age and gender, allow us to further distinguish between
advanced machine learning techniques that require large amounts             runners in a way that has been neglected by many of the previous
of data. The prediction improvement seen by using a second race             prediction models mentioned in Section 2. These features, partic-
suggests that additional information about a runner’s history is            ularly gender, have significant effects when it comes to distance
beneficial in making predictions, yet is in itself limited as it does not   running [1, 5].
take into account that a runner may have run many hundreds of races
previously. Despite these limitations, the decreases in prediction          3.2    Runner Improvement
error have seen this model adopted by many runners’ resource
websites, including RunnersWorld. The model’s results show the
promise of what data analysis and machine learning techniques
can achieve in this domain and that further strides are possible.
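
For reference, the Riegel baseline that this model is compared against can be worked through directly. Because t = ax^b, the constant a cancels when one known race is scaled to another distance, giving t2 = t1 (x2/x1)^b. The following is a minimal Python sketch, assuming b = 1.06 as reported for running; the function name, the example half-marathon time and the use of kilometres are illustrative choices and are not taken from the paper.

```python
def riegel_predict(known_time_min, known_distance_km, target_distance_km, b=1.06):
    """Scale a known race time to another distance with t2 = t1 * (x2 / x1) ** b.

    b is the slowdown coefficient (fatigue factor); 1.06 is the value
    reported for running. All argument names are hypothetical.
    """
    if known_time_min <= 0 or known_distance_km <= 0 or target_distance_km <= 0:
        raise ValueError("times and distances must be positive")
    return known_time_min * (target_distance_km / known_distance_km) ** b


HALF_MARATHON_KM = 21.0975
MARATHON_KM = 42.195

# Hypothetical runner with a 100-minute half marathon.
predicted = riegel_predict(100.0, HALF_MARATHON_KM, MARATHON_KM)
print(f"Predicted marathon time: {predicted:.1f} minutes")  # about 208.5
```

Doubling the distance multiplies the time by 2^1.06, roughly 2.085, so the formula assumes every runner slows by the same proportion regardless of ability. That is the assumption that breaks down for finish times slower than 230 minutes.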
    Moving away from prediction into the field of recommendation,
Smyth [12][13] proposes a case-based recommendation system for
recommending a personal best marathon time. Using information
on a runner’s previous marathon history and a case base of other
runners’ times, Smyth recommends not only a time that the runner
should be capable of, but also a race plan to achieve the time taking
into account the difficulty and terrain of the course on which the
runner will compete. The recommendation made is important, as
it should be difficult enough to feel tested but not be so fast that
a runner ends up hitting the wall. By testing against the actual            Figure 1: Average percentage change in finish time of run-
personal best run by a runner the study found that the system was           ners from one race to the next as exhibited by runners in
able to predict the personal best time of a runner to within 5%             the dataset
accuracy and generate race plans that are more than 90% similar
to the actual personal best split times run. These results show                Figure 1 depicts how a runner improves from one marathon
that there is scope for making recommendations in the domain of             race to the next. Typically a runner will see a large and steady
distance running.                                                           improvement for the first 3-4 races they run, before witnessing a


                                                                         have a data sparsity issue. However, for the purposes of building
                                                                         a running recommender this is not a problem. The elite runners,
                                                                         with whom our model struggles, tend to be professional athletes
                                                                         or passionate runners. These runners tend to be well coached and
                                                                         well informed on their training and thus a recommender system is
                                                                         expected to be of limited value to them. Runners at the very slowest
                                                                         end of the spectrum tend to be one-off runners that are running
                                                                         purely for fun and they are also unlikely to see any value in using
                                                                         a recommender system.

                                                                         3.4    Benefits of Recommender System



Figure 2: Average errors of Riegel, KNN and XGB models
over time

plateau in performance. This natural performance improvement
is a first indication of the potential benefits of a recommender
system. Most runners do not have access to advanced training
methods and lack the motivation or finances to employ a running
coach, yet they still exhibit significant improvement from one race
                                                                         Figure 3: Proportion of runners running Personal Best per-
to the next. Finding suitable training plans requires a significant
                                                                         formances for elite and casual runners based on time since
amount of research as there are many different methodologies
                                                                         the first marathon run
runners have used in order to get results. A recommender system
designed to assist runners could act in the role of a professional
running coach and help personalise training plans based on data
mining the successful training approaches of similar runners. The
recommender system removes the time burden required to map out
an adequate training plan and helps the runner to improve at the
fastest rate.

3.3    Suitability of Collaborative Filtering
After creating a basic user profile for each runner comprising aver-
age race finish times at various non-marathon distances, we trained
                                                                         Figure 4: Proportion of runners running Personal Best per-
two different models to predict marathon times. These models were
                                                                         formances for elite and casual runners based on age at time
a simple K-Nearest Neighbours (KNN) model [6] and an Extreme
                                                                         of personal best
Gradient Boosting (XGB) model [4]. The total percentage errors of
these models are 10.45% (KNN) and 9.04% (XGB), which compare
favourably to the error exhibited by the Riegel model of 12.8%.             We make the reasonable claim that elite runners either have
   Figure 2 shows how these prediction errors change for athletes        access to coaching or are well-informed on training methods. As
with different race finish times. The Riegel model is indeed more ac-    a result, they produce faster times but there are other phenomena
curate than KNN or XGB for elite runners (finish times < 190mins),       that are a side effect of this optimised training. We define an elite
but the vast majority of runners have slower finish times than this.     runner as one who has run the qualifying standard for the Boston
For these slower runners, the KNN and XGB models outperform              Marathon of 190 minutes, and then we compare the performance
the Riegel model. This suggests that the use of user profiles and        progression over time of elite runners and more casual runners. The
computing similarity between them is a good approach for describ-        Boston Marathon qualifying time was chosen as this is considered
ing how a runner is likely to perform. This provides some justification  a goal time for many keen marathon runners and is a time that
that a collaborative filtering system would provide an adequate ba-      requires substantial training and effort to achieve. We can also tell from
sis for building a recommender system for runners. The success of        Figure 2 that it corresponds roughly to the finish time at which the
Smyth’s work [12] in recommending personal best times outlined           user based models begin outperforming the Riegel model which
in Section 2 also appears to corroborate this finding.                   makes it a natural cut off.
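
The neighbourhood idea behind the user-based models can be sketched in a few lines. The following is a minimal illustration and not the models evaluated in this paper: it assumes a profile is a tuple of average finish times (in minutes) at three non-marathon distances, uses plain Euclidean distance, and averages the marathon times of the k most similar runners. The data, the function names, the value of k and the use of mean absolute percentage error as the error measure are all assumptions made for illustration.

```python
import math


def knn_predict(profiles, marathon_times, query, k=3):
    """Predict a marathon time as the mean time of the k most similar runners."""
    if not profiles or len(profiles) != len(marathon_times):
        raise ValueError("profiles and marathon_times must be non-empty and aligned")
    # Pair each stored runner's distance to the query with their marathon time.
    ranked = sorted(
        (math.dist(profile, query), time)
        for profile, time in zip(profiles, marathon_times)
    )
    nearest = ranked[:k]
    return sum(time for _, time in nearest) / len(nearest)


def mean_abs_pct_error(predicted, actual):
    """Mean absolute percentage error between predicted and actual times."""
    return 100.0 * sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical profiles: average 5 km, 10 km and half-marathon times in minutes.
profiles = [(20, 42, 95), (25, 52, 115), (30, 63, 140), (22, 46, 102), (28, 58, 130)]
marathon_times = [200, 245, 300, 215, 280]

prediction = knn_predict(profiles, marathon_times, query=(24, 50, 110), k=2)
print(prediction)  # 230.0, the mean of the two closest runners (245 and 215)
```

Because the prediction is built from runners with similar profiles, its quality depends on how many comparable runners exist, which is consistent with the weaker results at the very fast and very slow ends of the distribution.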
   It could also be pointed out here that the simpler Riegel model          In Figure 3 we show the point at which elite and casual runners
out-performs both the KNN and XGB models at various points in            first run their overall Personal Best (PB) time since running their
the distribution. This is certainly the case for the quickest runners,   first marathon. The elite runners clearly peak much earlier than
which is not surprising as the Riegel model is based on world record     casual runners, with nearly 20% of elites achieving a PB in the
times and we have little data at these points from which to build a
similarity model. Similarly, for the very slowest runners, we also


first year after they start marathon running. In contrast, a higher       can be imparted to casual runners through the use of a recom-
fraction of casual runners than elite runners achieve their PBs after a   mender system, leading to greater levels of improvement for such
period of 5 years from their first marathon. In Figure 4 we show the      runners. We have gathered an adequately large database of run-
age at which the personal best is achieved. Again, there is a strong      ners and race times and through this have shown that marathon
difference between elite and casual runners, with elites much more        times can be predicted by simple neighbourhood models. The better
likely to achieve the PB before the age of 40. Elite runners not only     prediction accuracy of these models highlights the feasibility of a
achieve their PBs at a younger age but also at an earlier stage of        collaborative filtering approach to such recommendation. Lastly,
their running career. This finding affirms the notion that the extra      we highlighted some of the issues with such a recommender system,
knowledge elite runners have over casual runners is a significant         namely the time scale involved. We suggested methods as to how
advantage when it comes to making performance gains. It is clear          such recommendation could be presented to runners in order to
that a recommender system would be useful in this field - an au-          keep them sufficiently motivated and allow adequate time for this
tomated personalised training plan generated by a collaborative           recommendation to take effect.
recommender system would mitigate the need for a professional                In future we will implement such a recommender system. We ex-
coach or extensive knowledge of running training for a casual run-        pect that the results found will be generalisable to other endurance
ner. Such recommendations should lead to faster performance gains         sports and as such we look to expand our research into activities
from casual runners and would see casual runners maximise their           such as swimming and cycling. The proliferation of wearable tech-
potential earlier.                                                        nology, such as heart rate monitors and GPS units, provides large
                                                                          quantities of data from training events and races. Such data will
3.5    Methods of Recommendation                                          provide greater resolution for a machine learning approach and we
It is important to note at this point that not even all elite runners     will use such an approach to build a recommender system that can
maximise their potential quickly. Many elite runners will not run         inform a user before, during, and after a training session or race.
their personal best until up to five years after their first marathon.
This demonstrates the potential time scale involved in such a recom-      ACKNOWLEDGMENTS
mender system with improvements not being apparent for months             This work is supported by Science Foundation Ireland through the
or even years after the first interaction with the system.                Insight Centre for Data Analytics under grant number SFI/12/RC/2289
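
The time to personal best discussed in this section can be computed directly from a runner's marathon history. The following is a minimal sketch, assuming each history is a list of (race date, finish time in minutes) pairs and treating a runner as elite when their best time is at or under the 190-minute standard used in Section 3.4. The function names and the example histories are hypothetical.

```python
from datetime import date

ELITE_CUTOFF_MIN = 190  # Boston Marathon qualifying standard used in Section 3.4


def is_elite(marathons):
    """Return True if the runner's best marathon meets the elite cut-off."""
    return min(time for _, time in marathons) <= ELITE_CUTOFF_MIN


def years_to_pb(marathons):
    """Years between the first marathon and the first race run at the PB time."""
    first_date = min(race_date for race_date, _ in marathons)
    # Fastest time wins; ties are resolved in favour of the earliest race.
    pb_date, _ = min(marathons, key=lambda race: (race[1], race[0]))
    return (pb_date - first_date).days / 365.25


# Hypothetical histories.
elite_runner = [(date(2010, 4, 1), 200), (date(2011, 4, 1), 185), (date(2013, 4, 1), 188)]
casual_runner = [(date(2010, 1, 1), 260), (date(2016, 1, 1), 240)]

for name, history in [("elite", elite_runner), ("casual", casual_runner)]:
    print(name, is_elite(history), round(years_to_pb(history), 2))
```

Grouping runners by `is_elite` and counting how many reach their PB within each year would give proportions of the kind plotted in Figure 3.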
    Such a recommendation system poses a unique challenge. How
does a recommender system motivate a runner to keep using a               REFERENCES
system for a period of years, especially when the benefits of use          [1] R Beneke, R Leithauser, and M Doppelmayr. 2005. Women will do it in the long
may not be instant? Training for a marathon is difficult and a                 run. British journal of sports medicine 39, 7 (2005), 410.
                                                                           [2] Mustafa Bilgic and Raymond J Mooney. 2005. Explaining recommendations:
recommender may recommend a training session that, while clearly               Satisfaction vs. promotion. In Beyond Personalization Workshop, IUI, Vol. 5. 153.
beneficial, may not lead to enjoyment or satisfaction for the user.        [3] David F. Cameron. 1998. http://www.cs.uml.edu/~phoffman/cammod.html
The recommendation system must therefore keep a user engaged
                                                                           [4] Tianqi Chen and Tong He. 2015. Xgboost: extreme gradient boosting. R package
for long periods and convince them to make potentially unwanted                version 0.4-2 (2015).
decisions in order for them to see benefit.                                [5] Robert O Deaner. 2006. More males run fast: A stable sex difference in compet-
                                                                               itiveness in US distance runners. Evolution and Human Behavior 27, 1 (2006),
    An important factor in achieving this goal is to provide the user          63–84.
with meaningful explanations. The ability of a system to make its          [6] Daniel T Larose. 2005. k-Nearest Neighbor Algorithm. Discovering Knowledge in
reasoning transparent contributes significantly to the user’s accep-           Data: An Introduction to Data Mining (2005), 90–106.
                                                                           [7] Ting-Peng Liang, Hung-Jen Lai, and Yi-Cheng Ku. 2006. Personalized content rec-
tance of the recommendation [2] and improves their confidence                  ommendation and user satisfaction: Theoretical synthesis and empirical findings.
in the recommendation [11]. Various training methodologies are                 Journal of Management Information Systems 23, 3 (2006), 45–70.
already well documented and explained. The concept of nudging              [8] Eoghan O’Shea, Sarah Jane Delany, Rob Lane, and Brian Mac Namee. 2014.
                                                                               NudgeAlong: A Case Based Approach to Changing User Behaviour. In Interna-
[8], to slowly adjust the user’s behaviour, has been shown to be very          tional Conference on Case-Based Reasoning. Springer, 345–359.
useful in recommender systems. The use of personalised explana-            [9] Peter S Riegel. 1981. Athletic Records and Human Endurance: A time-vs.-distance
                                                                               equation describing world-record performances may be used to compare the
tions can motivate a user to interact with the system and spur them            relative endurance capabilities of various groups of people. American Scientist
on to do the sessions the system recommends. As demonstrated                   69, 3 (1981), 285–290.
in Section 3.3 the system is capable of making predictions of the         [10] Hanna Schaefer, Santiago Hors-Fraile, Raghav Pavan Karumur, André Calero
                                                                               Valdez, Alan Said, Helma Torkamaan, Tom Ulmer, and Christoph Trattner. 2017.
runner’s finish time. As the recommender system nudges the user                Towards Health (Aware) Recommender Systems. Proc. of DH 17 (2017).
to a particular training strategy, accurate predictions of finish times   [11] Rashmi Sinha and Kirsten Swearingen. 2002. The role of transparency in rec-
and outcomes can be presented to the user to improve their moti-               ommender systems. In CHI’02 extended abstracts on Human factors in computing
                                                                               systems. ACM, 830–831.
vation and engagement with the system. For runners to gain the            [12] Barry Smyth. 2017. A Novel Recommender System for helping Marathons to
maximum benefit from such a system it must be persuasive, easy to              Achieve a new Personal-Best. Proceedings of ACM RecSys 2017, Lake Como, Italy,
                                                                               August 2017 (2017).
follow, and provide motivation so the recommender is engaging for         [13] Barry Smyth and Padraig Cunningham. 2017. ’Running with Cases: A CBR
long enough to have an effect on a runner’s training and change                Approach to Running Your Best Marathon’. Proceedings of ICCBR 2017, Trondheim,
their behaviour [7].                                                           Norway, June 2017 (2017).
                                                                          [14] Andrew J Vickers and Emily A Vertosick. 2016. An empirical study of race times in
                                                                               recreational endurance runners. BMC Sports Science, Medicine and Rehabilitation
4     CONCLUSION                                                               8, 1 (2016), 26.
In this paper we have examined the opportunity for a recommender
system for runners. Elite runners improve at a faster rate through
knowledge gained from coaching and research. This knowledge