'Fitness that Fits': A prototype model for Workout Video Recommendation

Ercan Ezin, Computer Science Department, University of Bristol, Bristol, United Kingdom, ercan.ezin@bristol.ac.uk
Eunchong Kim, Computer Science Department, University of Bristol, Bristol, United Kingdom, ek17843@my.bristol.ac.uk
Iván Palomares, Computer Science Department, University of Bristol, Bristol, United Kingdom, i.palomares@bristol.ac.uk

ABSTRACT
Personalization services enable Internet users to benefit from tailored recommended content at their fingertips. Our interest in this contribution lies in video recommendation within the fitness domain to support an active lifestyle. We present 'Fitness that Fits', a preliminary platform for workout video recommendation, which benefits from the YouTube-8M labeled dataset and its rich variety of categorized video labels, thereby enabling fitness workout video recommendations predicated on the users' preferences and their recent viewing behavior. The proposed model also incorporates an approach to produce diversified recommendations and foster the practice of distinct fitness activities based on like-minded users' information. An initial experimental study shows the trade-offs of the hybrid approach considered.

KEYWORDS
Personalized Wellbeing; Preference Modeling; Diversity

ACM Reference Format:
Ercan Ezin, Eunchong Kim, and Iván Palomares. 2018. 'Fitness that Fits': A prototype model for Workout Video Recommendation. In Proceedings of the Third International Workshop on Health Recommender Systems co-located with Twelfth ACM Conference on Recommender Systems (HealthRecSys'18), Vancouver, BC, Canada, October 6, 2018, 6 pages.

HealthRecSys'18, October 6, 2018, Vancouver, BC, Canada. © 2018 Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1 INTRODUCTION
The Internet and its associated technologies have become an indispensable tool to search for products and services or to frequently access information needed in our daily lives, e.g. booking a hotel, purchasing a new device or consulting the weather forecast. We are presently reported to spend an average of 6 hours per day connected to the Internet. Amid this phenomenon, there is an increasing interest in seeking aid on the Internet to embrace healthier lifestyles, e.g. through the search and sharing of information related to fitness exercises and wellness practices, or via smartphone apps [10]. For instance, the rate of Google searches based on keywords such as "personal trainer", "crossfit", "hiit", "core" or "pilates" has dramatically increased in the last decade [2].

Although gyms and leisure centers are a common choice for users who desire to adopt or maintain an active lifestyle, they are not always within the reach of every person, e.g. owing to financial limitations, busy schedules, frequent traveling, etc. Taking advantage of the growing demand for online resources to promote exercising, online workout videos have proliferated in recent years as an alternative means to keep users active from the comfort of home or beyond, with a number of advantageous characteristics:
• They are convenient, providing 24/7 access to a wealth of fitness resources from anywhere with an Internet connection.
• They do not require commitment to work out at an externally imposed day or time.
• With a careful search and use of the resources available, they provide a wealth of workouts from a diversity of instructors.
• They are cost-effective and can be undertaken in a more individual and private space.

Whether it is bodyweight workouts, aerobic exercises, performance hacks or skill-gaining tutorials, YouTube provides millions of users with access to a wealth of video resources to support them in practicing their preferred workouts anywhere and anytime. Although YouTube users can receive personalized recommendations from the platform as a whole [5, 6] and subscribe to specialized YouTube channels, the popular Internet platform does not provide an exhaustive categorization of workout videos into different types of exercise and sport. Whilst this is not a critical issue for most users interested in a sheer variety of videos and themes, nowadays there is a growing niche of users specifically interested in accessing fitness videos to a considerable extent. These users would benefit from a bespoke service for fitness workout video recommendation that exploits categorized (labeled) video data describing the types of activities shown in such workout videos. Some studies focus on recommending video resources related to healthcare [11] and, more specifically, fitness [3]. Notwithstanding, state-of-the-art works on fitness video recommendations mostly rely on small video data sets that have been carefully selected by a domain expert. This makes such model implementations hard to extrapolate to a large-scale setting, and hence poorly generalizable [11].

The YouTube-8M dataset is a clear example of a large-scale dataset containing comprehensive information about millions of videos uploaded to YouTube. Despite having been primarily investigated for tasks such as the classification and further understanding of video data [4], it has barely been used for recommendation processes to date. As a result of classification and supervised machine learning processes on data originating from YouTube videos, YouTube-8M incorporates labels associated with the videos, thereby describing the topic(s) to which they belong, including a number of fitness activity types: this amount of labeled video data has an untapped potential to investigate and enhance existing recommendation approaches on large volumes of video related to specific domains such as fitness.
To overcome the challenges outlined above, this contribution presents 'Fitness that Fits', a prototype platform for workout video recommendation, which relies on YouTube-8M video data describing fitness activities. Our main contribution is a recommendation model that extends principles from content-based and collaborative filtering by introducing mechanisms to provide end users with meaningful and diverse workout video recommendations.

2 RELATED WORK
Recommender systems applied to health-related domains are still relatively scarce [8], particularly in the area of fitness and general wellbeing. This section reviews some representative recent works on recommender approaches in these domains. Following this, we briefly describe specific models targeting the fitness domain, along with similarly purposed models for video recommendations.

Cerón-Rios et al. [3] presented CoCare, a mobile-based recommender system aimed at supporting health promotion and preventing diseases, by recommending physical activity videos based on users' profiles and their context. Their approach relies on a decision tree learning algorithm and a Case-Based Reasoning (CBR) module with a twofold purpose: (i) classifying and tagging videos predicated on their textual description, and (ii) calculating similarity values between user profiles and video categories. The system presents the limitation of not exploiting the vast and hugely accessed videos from YouTube, which, as outlined by the authors, would require a proper categorizing and profiling process to make the recommendation process suitable to the specific domain. Instead, it relies on a small assortment of videos selected by expert users. This is indeed a limitation present in other approaches to health-related video recommendation [3, 11].

To assist health professionals, patients and caregivers in the process of finding relevant information amid a plethora of it, Sánchez-Bocanegra et al. [11] built HealthRecSys: a content-based recommender system aimed at providing personalized links to educational content that supplements online health videos. Extracted text from metadata in relevant YouTube videos is analyzed to generate Medline Plus (https://medlineplus.gov/) links. Expert information from healthcare professionals is required to gather a relevant dataset of videos, such that multiple experts rate the quality and relevance of recommended links for given videos, and only those videos and links showing consensus among experts are selected for the experimental study of the system feasibility. This human effort is motivated by the need for mitigating potential risks for health consumers.

Within the particular scope of the fitness domain, two studies on recommendations for running were presented by Berndsen et al. [1]. In [1], the authors investigate the performance and progression differences between casual and elite runners, examining the feasibility of a recommender system for runners. The importance of providing runners with explainable recommendations and of using resources to keep them motivated is particularly highlighted. A different approach to fitness activity recommendation was adopted by Dharia et al. in [7], with a focus on social recommendations for personalized assistance in performing fitness activities: their approach integrates mobile or wearable-based activity data, preferences, goals and contextual information to produce socially enhanced fitness recommendations.

3 DATA AND SYSTEM OVERVIEW
This section describes the data currently used in 'Fitness that Fits' and briefly overviews its Web platform, which is under development.

YouTube-8M is a large-scale labeled video dataset which, as of June 2018, consists of over 6 million YouTube video instances (which add up to 350,000 hours of video), namely video IDs with high-quality annotations generated by machine learning techniques, describing a highly diverse vocabulary of over 3.8K different entities (labels). Each video in the dataset contains an approximate average of three labels. These videos have been sampled uniformly with the aim of preserving the highly heterogeneous distribution of popular YouTube contents, whilst ensuring the dataset quality and stability by enforcing a series of restrictions. The dataset also incorporates precomputed audio-visual features from billions of frames and audio segments, which (i) facilitates an efficient training of baseline models without the need for sophisticated computer settings, and (ii) enables a deep exploration of complex audio-visual models that otherwise would be impractical to train.

The use of YouTube-8M has been illustrated in recent research efforts including large-scale video classification [12], multi-label classification [14], extraction of the hierarchical structure associated with Web video groups [9], etc. To the best of our knowledge, however, YouTube-8M labeled video data has barely been investigated within the scope of recommender systems research, despite its potential advantages for fitness workout video recommendation:
• Its comprehensive topic information (labels), resulting from previous classification efforts by the YouTube-8M community, constitutes an added value for highly personalized video recommendation.
• Topic information can be further exploited to promote diversity in recommendations.
• It provides a large and rapidly increasing volume of videos from the YouTube platform.

We built an initial dataset by filtering the original YouTube-8M labeled video dataset (https://research.google.com/youtube8m/) predicated on the following filtering criteria:
(1) Highly-viewed: Only videos with a minimum of 50K views in YouTube are selected for the scope of the preliminary research presented in this contribution. Whilst important, tackling the popular cold-start problem associated with newly added content lies outside the scope of the present study.
(2) Fitness-related: Videos having machine-generated annotations of 'Beauty and Fitness' narrowed down to 16 labels, associated with popular and highly-viewed types of fitness activities in accordance with criterion (1).

The resulting video dataset comprises 1.7K fitness workout videos with over 1K user views each, which is a considerably larger number of videos than in other related works [3, 11].

Figure 1 provides an overview of the user interface for the 'Fitness that Fits' Web platform. Besides the interface for exploring videos or viewing the recommendation list, the user can establish and update a profile at any time by selecting at least two labels as her favorite types of workout. In the following, our interest focuses on presenting the recommender model implemented in the platform.

Figure 1: Appearance of 'Fitness that Fits' platform

4 RECOMMENDER MODEL
The model for the recommendation process underlying our platform is illustrated in Figure 2. It consists of a hybrid approach incorporating basic principles from content-based and neighbor-based collaborative filtering, on top of which three features are introduced:
(A) Identifying user preferences.
(B) Measuring diversity of recommendations.
(C) Diversity-aware replacement process.

These three novel features are labeled using (A), (B), (C) in Figure 2. The content-based and collaborative filtering steps are implemented upon basic approaches in the current prototype version of this work, and they can be seamlessly replaced by other existing methods with a similar aim. We therefore concentrate the subsequent discussion on the three distinctive features listed previously.

Figure 2: Scheme of the proposed model for workout video recommendation. [Flowchart: the user behavior (recently viewed video history) and the user profile (favorite types of fitness activities) feed (A) Identifying user preferences; the user preferences and the video features feed content-based filtering (user-video similarity), which yields a recommendation list; (B) Measuring diversity of recommendations checks whether the recommendations are diverse enough; if yes, the recommendations are delivered; if no, (C) the diversity-aware replacement process modifies the recommendation list using collaborative filtering (user-user similarity) over neighbor users' recommendation lists.]

4.1 (A) Identifying user preferences
Two sources of user data are taken as input to model a user's current preferences: the user profile and the recent user behavior.

a) User Profile: A user signed up in the platform must specify her/his interests in different types of fitness activities (topics), by selecting her/his favorite ones. Let $m \gg 2$ be the number of topics available in the system (for the current experimental version of the model, $m = 16$). Let $u_i \in U$ be the $i$th user. Her profile is modeled as an $m$-dimensional binary vector $P_i = (p_i^1, p_i^2, \ldots, p_i^m)$, where:

$$p_i^j = \begin{cases} 1 & \text{if the } j\text{th topic is selected in } u_i\text{'s profile,} \\ 0 & \text{otherwise.} \end{cases}$$

For instance, $u_i$'s profile could include "Kettlebells", "Aerobics" and "Pilates" as activated topics or favorite workout types.

b) User Behavior: This data is captured by analyzing the user's recent video viewing history, namely the frequency at which the history contains videos labeled with each topic, e.g. "how often has $u_i$ watched HIIT (High-Intensity Interval Training) videos over the last month?"

Both sources of information are combined into the current preferences of $u_i$. Let $\Gamma_i^j \geq 0$ denote the number of videos labeled with the $j$th topic in $u_i$'s recent history. Based on a normalized arctangent, a smoothed or "fuzzy" degree of preference $\hat{p}_i^j \in [0, 1]$ is calculated:

$$\hat{p}_i^j = \arctan\left(\omega_i \cdot \frac{\Gamma_i^j}{\max_{k \in M} \Gamma_i^k} + (1 - \omega_i) \cdot p_i^j\right) \cdot \frac{4}{\pi} \qquad (1)$$

with $M = \{1, 2, \ldots, m\}$ the label index set. As shown in Eq. (1), $\hat{p}_i^j$ relies on a weighted average of (i) the relative frequency of workout videos containing topic $j$ in the recent viewing history, and (ii) the (binary) profile information given by $p_i^j$. Since this weighted average lies in $[0, 1]$, its arctangent lies in $[0, \pi/4]$, and the factor $4/\pi$ rescales the result to $[0, 1]$. The weighting parameter $\omega_i \in [0, 1]$ describes the relative importance assigned to the user behavior information against her profile. It neither needs to be fixed a priori nor is it equal for all users. Instead, $\omega_i$ is dynamically assigned to each user as follows:

$$\omega_i = \min\left\{\frac{hl(i)}{2 \cdot \frac{1}{n} \sum_{k} hl(k)},\ 1\right\} \qquad (2)$$

with $hl(i) \in \mathbb{N}$ the history length or number of videos recently viewed by $u_i$, and $n$ the number of users. This dynamic adjustment of $\omega_i$ adapts to the degree to which the user utilizes the system: the more engaged $u_i$ has recently been (larger recent viewing history), the higher $\omega_i$ and the more relevance is assigned to the behavior information. If $u_i$'s recent engagement equals the average recent engagement of all users, i.e. if $hl(i) = \frac{1}{n} \sum_{k} hl(k)$, then $\omega_i = 0.5$.

As a result, a current preference vector $\hat{P}_i = (\hat{p}_i^1, \hat{p}_i^2, \ldots, \hat{p}_i^m)$ is yielded for each $u_i \in U$. Based on the cosine similarity between a user preference vector and a video representation $P_v = (p_v^1, \ldots, p_v^m)$, $p_v^j \in \{0, 1\}$, a content-based filtering process is subsequently applied, leading to a preliminary video recommendation list of size $N$. This list might still be adjusted before being delivered to the end user, as explained in the following two subsections.

4.2 (B) Measuring diversity of recommendations
One of the aims of the proposed system is to provide users with recommended videos that are both relevant (in accordance with their current preferences) and diverse. Diversity in workout recommendations may not only help users explore "new" types of workout they might potentially like, but also foster variety of workouts in such recommendations to prevent an eventual sense of boredom.

Let $R_i = (r_{vj}^i)_{N \times m}$ be a matrix representation of $u_i$'s recommendation list, where the $v$th row contains the $m$-dimensional vector representation of the $v$th recommended video, hence $r_{vj}^i = 1$ if the $v$th recommended video is labeled with topic $j$, and $r_{vj}^i = 0$ otherwise. By introducing a diversity threshold $\delta \in (0, 1]$, the diversity level of $R_i$, denoted $D(R_i)$, is measured and compared against $\delta$, predicated on the number of topics appearing in at least one video in $R_i$ ($\bigvee$ denotes the logical disjunction 'OR' operation):

$$D(R_i) = \frac{\sum_{j \in M} \bigvee_{v=1}^{N} r_{vj}^i}{m} \qquad (3)$$

If $D(R_i) \geq \delta$ (i.e. the ratio $D(R_i)/\delta \geq 1$, as shown later on in the experiments), then the recommended videos for $u_i$ are sufficiently diverse and they are supplied to the user. Conversely, if $D(R_i) < \delta$, then a neighborhood-based collaborative filtering approach is adopted to further diversify $R_i$ based on the information extracted from similar users' recommendation lists. Importantly, we adopt a variant of classical user-user collaborative filtering in which, once the $K$ most similar users to the target user have been identified (predicated on the similarity between their current preference vectors $\hat{P}_i$), we analyze those neighbor users' recommendation lists, as opposed to existing approaches that apply a rating prediction function.

4.3 (C) Diversity-aware replacement process
An iterative procedure is introduced to diversify the recommendation list $R_i$. The procedure is characterized by replacing, at each iteration, one of the videos recommended to the target user $u_i$ with another video stemming from one of her neighbor users' recommendation lists, based on the following steps:
(1) Sample one of the $K$ neighbor users of $u_i$, denoted $u_{i'}$, by normalizing similarities $sim(u_i, u_{i'})$ into probabilities for each neighbor to be picked. Retrieve from the database the matrix $R_{i'}$ containing his latest list of recommendations.
(2) Check the rows (recommended videos) in $R_{i'}$ in descending order. Choose the first occurrence of a video containing at least one topic $j$ that does not appear in $R_i$. If no videos in $R_{i'}$ hold this condition, return to step (1).
(3) Assume that the selected video in $R_{i'}$ corresponds to the $v$th row of the neighbor's recommendation matrix. Then, in the target user matrix $R_i$, replace the existing video in the $v$th position with the selected video from $u_{i'}$'s recommendations.

As a result of an iteration, a single-video replacement is made on $R_i$, after which its diversity level is measured again. The overall iterative process described in Sections 4.2 and 4.3 is repeated until the recommendations are diverse enough, or an a priori fixed maximum number of iterations is exceeded. In either case, the final adjusted recommendation list is provided to $u_i$.

4.4 Preliminary Experiments
This subsection briefly outlines a preliminary evaluation conducted on the current version of the model underlying 'Fitness that Fits'. We remark that despite the considerable volume of real labeled video data available, the present study relies exclusively on a small and partly synthetic dataset of user information (profiles and viewing histories). Deploying the platform into a Web environment and gathering larger amounts of user data for a comprehensive evaluation constitutes our most immediate direction of future work.

We consider a sample of 10 users, a recommendation list size of $N = 30$, a number of neighbor users $K = 3$ for the diversification strategy, and a diversity threshold $\delta = 0.375$ (at least 6 out of 16 topics to appear in $R_i$). This preliminary experiment focuses on measuring the diversity ratio $D(R_i)/\delta$ and the average similarity or relevance $S(R_i)$ in the user's recommendation matrix, before and after applying some replacements under the proposed diversification strategy: until $\delta$ is achieved or at most $N/2$ replacements are made on $R_i$. The similarity score $S(R_i) \in [0, 1]$ is calculated as the average of cosine similarities between the user's current preferences and her $N$ recommended video representations. Figure 3 summarizes the results obtained for the sample considered. Intuitively, for those users whose preliminary recommendations were already sufficiently diverse (in this example, $u_7$ and $u_{10}$) no changes are required on the preliminary recommendations. The initial and final values of $D(R_i)$ (resp. $S(R_i)$) are shown above (resp. below) the plot bars.

Figure 3: Variation in the diversity ratio ($D(R_i)/\delta$) and relevance (average similarity to the user, $S(R_i)$) of $R_i$ for $\delta = 0.375$. [Bar chart over the 10 sampled users. Data labels (initial ; final) recovered from the plot, listed in extraction order rather than per-user order. Diversity ratio: 0.48 ; 1.0, 0.58 ; 0.99, 0.68 ; 0.91, 0.75 ; 0.93, 0.76 ; 0.92, 0.71 ; 0.98, 0.89 ; 1, 0.79 ; 0.84, 1.01 ; 1.01, 1.07 ; 1.07. Relevance: 0.7 ; 0.7, 0.71 ; 0.71, 0.81 ; 0.77, 0.80 ; 0.71, 0.83 ; 0.74, 0.76 ; 0.67, 0.73 ; 0.62, 0.81 ; 0.66, 0.83 ; 0.67, 0.85 ; 0.62.]

This initial evaluation shows that, although the model can effectively increase the diversity of recommended workout videos, this normally comes at the expense of a decline in the relevance (similarity) of such recommendations with respect to the target user preferences. This is not surprising, since the initial content-based filtering stage (before diversifying) relies exclusively on the users' preferences, and the current use of a cosine similarity may make the resulting relevance sensitive to any zeros in either user preferences or video representations. Moreover, data containing recent viewing histories is rather scarce in the present prototype, therefore the weighting parameter $\omega_i$ tends to be low for most users and their static profile information (favorite topics) is prioritized. We argue that an online evaluation of users' experience with the system, for instance by tracking clicks on recommended videos and analyzing whether such clicks have been predominantly on "replacement" videos or not, will be an interesting direction to extend the proposed approach into a more adaptive one, where depending on the user's response towards diversity, a tailored trade-off between diversity and relevance is sought for her/him. Another interesting finding is the dramatic increase in the diversity of $R_1$ and $R_3$ with respect to other users. This is largely due to some videos in the system having more associated topic labels than others.

5 CONCLUSION AND FUTURE DIRECTIONS
This contribution presented 'Fitness that Fits', a prototype platform for recommending physical workout videos upon labeled video data from the YouTube-8M dataset. Besides integrating basic content-based and collaborative filtering mechanisms, the proposed recommender model incorporates novel features for the flexible modeling of user preferences based on their profile and recent viewing behavior. Furthermore, an iterative replacement strategy inspired by neighborhood-based collaborative filtering is introduced to promote diversified recommendation lists that expose users to different types of fitness activities.

Besides the platform deployment, subsequent acquisition of more real user data and the elaboration of a complete experimental study, we also postulate the following directions for future research:
• Recent efforts have been put into personalization services for promoting healthy habits, e.g. via food recommendations for positive nourishment practices [13], hence we aim at incorporating labeled videos on healthy eating habits, and investigating the joint use of fitness and healthy eating user/content data in video recommendations for healthier living.
• Recommend longer (composite) workouts by producing sequences of smaller workout videos while advocating diversity in such workouts.
• Incorporate additional types of explicit and implicit preferences from real users, e.g. liked-disliked videos from YouTube and favorite videos.

ACKNOWLEDGMENTS
The authors would like to thank anonymous reviewers for their insightful and constructive comments on the initial stage of this research, some of which would lay the foundations for future work.

REFERENCES
[1] J. Berndsen, A. Lawlor, and B. Smyth. 2017. Running with Recommendation. In Proc. 2nd International Workshop on Health Recommender Systems; 11th International Conference on Recommender Systems (RecSys 2017). 18–21.
[2] S. Borreani. The Fitness Sector in the Internet: Missed Opportunities. Source: https://gymfactory.net (Last access: 24th June 2018).
[3] G. M. Cerón-Rios, D. M. Lopez Gutierrez, B. Díaz-Agudo, and J. A. Recio-García. 2017. Recommendation System based on CBR algorithm for the Promotion of Healthier Habits. In Proceedings of ICCBR 2017 Workshops (CAW, CBRDL, PO-CBR), Doctoral Consortium, and Competitions co-located with the 25th International Conference on Case-Based Reasoning (ICCBR 2017), Trondheim, Norway, June 26-28, 2017. 167–176.
[4] E. Chen. 2017. Youtube-8M Video Understanding Challenge Approach and Applications. In CVPR'17 Workshop on YouTube-8M Large-Scale Video Understanding.
[5] P. Covington, J. Adams, and E. Sargin. 2016. Deep Neural Networks for YouTube Recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys '16). ACM, New York, NY, USA, 191–198.
[6] J. Davidson, B. Liebald, J. Liu, P. Nandy, T. Van Vleet, U. Gargi, S. Gupta, Y. He, M. Lambert, B. Livingston, and D. Sampath. 2010. The YouTube Video Recommendation System. In Proceedings of the Fourth ACM Conference on Recommender Systems (RecSys '10). 293–296.
[7] S. Dharia, M. Eirinaki, V. Jain, J. Patel, I. Varlamis, J. Vora, and R. Yamauchi. 2018. Social recommendations for personalized fitness assistance. Personal and Ubiquitous Computing 22, 2 (01 Apr 2018), 245–257.
[8] D. Elsweiler, B. Ludwig, A. Said, H. Schaefer, and C. Trattner. 2016. Engendering Health with Recommender Systems. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys '16). 409–410.
[9] R. Harakawa, T. Ogawa, and M. Haseyama. 2017. Extracting Hierarchical Structure of Web Video Groups Based on Sentiment-Aware Signed Network Analysis. IEEE Access 5 (2017), 16963–16973.
[10] E. T. Luhanga, A. A. E. Hippocrate, H. Suwa, Y. Arakawa, and K. Yasumoto. 2018. Identifying and Evaluating User Requirements for Smartphone Group Fitness Applications. IEEE Access 6 (2018), 3256–3269.
[11] C. L. Sánchez-Bocanegra, J. L. Sevillano-Ramos, C. Rizo, A. Civit, and L. Fernandez-Luque. 2017. HealthRecSys: A semantic content-based recommender system to complement health videos. BMC Medical Informatics and Decision Making 17, 63 (2017), 1–10.
[12] F. Scheidegger, L. Cavigelli, M. Schaffner, A. C. I. Malossi, C. Bekas, and L. Benini. 2017. Impact of temporal subsampling on accuracy and performance in practical video classification. In 2017 25th European Signal Processing Conference (EUSIPCO). 996–1000.
[13] C. Trattner and D. Elsweiler. 2017. Food Recommender Systems: Important Contributions, Challenges and Future Research Directions. Collaborative Recommendations: Algorithms, Practical Challenges and Applications (World Scientific) abs/1711.02760 (2017).
[14] P. Xie, R. Salakhutdinov, L. Mou, and E. P. Xing. 2017. Deep Determinantal Point Process for Large-Scale Multi-label Classification. In 2017 IEEE International Conference on Computer Vision (ICCV). 473–482.
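
As an illustration of the preference model in Section 4.1, the following minimal Python sketch computes the weighting parameter of Eq. (2) and the smoothed preference degrees of Eq. (1). The function names `behavior_weight` and `current_preferences` are hypothetical and not part of the platform. The handling of empty viewing histories (returning a weight or relative frequency of 0) is an assumption, since the paper does not specify that case.

```python
import math
from typing import List, Sequence


def behavior_weight(history_length: int, all_history_lengths: Sequence[int]) -> float:
    """Eq. (2): omega_i = min(hl(i) / (2 * average history length), 1)."""
    mean_length = sum(all_history_lengths) / len(all_history_lengths)
    if mean_length == 0:
        # Assumption: if nobody has a viewing history, rely on profiles only.
        return 0.0
    return min(history_length / (2 * mean_length), 1.0)


def current_preferences(profile: Sequence[int],
                        topic_counts: Sequence[int],
                        omega: float) -> List[float]:
    """Eq. (1): normalized-arctangent blend of viewing behavior and profile.

    profile[j] is the binary profile entry for topic j, topic_counts[j] is the
    number of recently viewed videos labeled with topic j.
    """
    max_count = max(topic_counts)
    preferences = []
    for p, gamma in zip(profile, topic_counts):
        # Assumption: the relative frequency is 0 when the history is empty.
        rel_freq = gamma / max_count if max_count > 0 else 0.0
        blended = omega * rel_freq + (1 - omega) * p  # weighted average in [0, 1]
        preferences.append(math.atan(blended) * 4 / math.pi)
    return preferences


# A user whose history length equals the average gets omega = 0.5.
omega = behavior_weight(10, [5, 10, 15])
prefs = current_preferences([1, 0, 1], [4, 2, 0], omega)
```

In this usage example the first topic is both selected in the profile and the most watched, so its blended value is 1 and its preference degree reaches the upper bound of 1. The other two topics receive intermediate degrees.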
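
The diversity measure of Eq. (3) and the replacement steps of Section 4.3 can be sketched as follows. This is a hedged illustration rather than the platform's implementation: the names `diversity` and `diversify` are hypothetical, all recommendation lists are assumed to have the same number of rows, "descending order" is interpreted as scanning a neighbor's list from its top-ranked video, and each sampling attempt is counted as one iteration so that the loop always terminates.

```python
import random
from typing import List, Optional, Sequence

Matrix = List[List[int]]  # rows = recommended videos, columns = topics (0/1)


def diversity(rec_list: Matrix) -> float:
    """Eq. (3): fraction of topics appearing in at least one recommended video."""
    n_topics = len(rec_list[0])
    covered = sum(1 for j in range(n_topics) if any(row[j] for row in rec_list))
    return covered / n_topics


def diversify(rec_list: Matrix,
              neighbor_lists: Sequence[Matrix],
              similarities: Sequence[float],
              delta: float,
              max_iters: int,
              rng: Optional[random.Random] = None) -> Matrix:
    """Replace one video per iteration until D(R_i) >= delta or max_iters is hit."""
    rng = rng or random.Random()
    result = [row[:] for row in rec_list]  # leave the caller's list untouched
    for _ in range(max_iters):
        if diversity(result) >= delta:
            break
        # Step (1): sample a neighbor with probability proportional to similarity
        # (assumes non-negative similarities with a positive sum).
        neighbor = rng.choices(neighbor_lists, weights=similarities, k=1)[0]
        covered = [any(row[j] for row in result) for j in range(len(result[0]))]
        # Step (2): first neighbor video that brings at least one uncovered topic.
        for v, video in enumerate(neighbor):
            if any(label and not covered[j] for j, label in enumerate(video)):
                # Step (3): replace the video at the same position v.
                result[v] = video[:]
                break
    return result


# Toy example with 4 topics, 2 recommended videos and a single neighbor.
target = [[1, 0, 0, 0], [1, 0, 0, 0]]
neighbors = [[[1, 0, 0, 0], [0, 1, 1, 0]]]
adjusted = diversify(target, neighbors, [1.0], delta=0.5, max_iters=5,
                     rng=random.Random(0))
```

With the experimental setting of Section 4.4 (16 topics and a threshold of 0.375), the stopping test corresponds to at least 6 distinct topics appearing in the list, since 0.375 × 16 = 6.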
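
The content-based filtering step and the relevance score $S(R_i)$ used in Section 4.4 both rest on the cosine similarity between a preference vector and binary video representations. The sketch below shows one possible way to compute them. The names `cosine`, `recommend` and `relevance` are hypothetical, and returning a similarity of 0 for an all-zero vector is an assumption made to avoid division by zero.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equally sized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def recommend(preferences: Sequence[float],
              videos: Dict[str, Sequence[int]],
              n: int) -> List[str]:
    """Return the ids of the n videos most similar to the preference vector."""
    ranked = sorted(videos,
                    key=lambda vid: cosine(preferences, videos[vid]),
                    reverse=True)
    return ranked[:n]


def relevance(preferences: Sequence[float],
              rec_vectors: Sequence[Sequence[int]]) -> float:
    """S(R_i): average cosine similarity between the user and her recommendations."""
    return sum(cosine(preferences, v) for v in rec_vectors) / len(rec_vectors)


# Toy catalog with two topics; the user only cares about the first one.
catalog = {"a": [1, 0], "b": [0, 1], "c": [1, 1]}
top = recommend([1.0, 0.0], catalog, 2)
score = relevance([1.0, 0.0], [catalog[vid] for vid in top])
```

The example also hints at the sensitivity noted in Section 4.4: video "b" shares no topic with the user's preferences, so its similarity is 0, and any replacement that introduces such a video lowers the average relevance of the list.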