Personalizing Exercise Recommendations with Explanations using Multi-Armed Contextual Bandit and Reinforcement Learning⋆

Parvati Naliyatthaliyazchayil1,†, Deepishka Pemmasani2, Navin Kaushal3, Donya Nemati4 and Saptarshi Purkayastha5

1 Dept. of Biomedical Engineering and Informatics, Indiana University Indianapolis, Indianapolis, Indiana, USA
2 Dept. of Biomedical Engineering and Informatics, Indiana University Indianapolis, Indianapolis, Indiana, USA
3 Dept. of Health Sciences, Indiana University Indianapolis, Indianapolis, Indiana, USA
4 College of Nursing, The Ohio State University, Ohio, USA
5 Dept. of Biomedical Engineering and Informatics, Indiana University Indianapolis, Indianapolis, Indiana, USA

Abstract
We present a mobile exercise recommendation app that uses clinical guidelines from authoritative sources to provide personalized, safe exercise suggestions. Our approach addresses two critical challenges in health-focused recommender systems: the cold start problem and user motivation through explainable AI. To overcome the initial lack of user data, we employ a two-stage process. First, we use Deep Q-Network (DQN) reinforcement learning to generate 2000 synthetic user profiles. The DQN learns from a reward function based on clinical guidelines, ensuring that the generated profiles align with established medical advice. Second, these synthetic profiles bootstrap a multi-armed contextual bandit algorithm, which recommends the most suitable exercises for a given user persona, determined by a combination of comorbidities, age, and preferred exercise criteria. Our method's key innovation lies in its ability to mimic a large cohort of clinically safe user profiles without requiring real-world participants, effectively eliminating the cold start problem while maintaining medical appropriateness. To enhance user engagement and promote behavior change, we implement an explainability layer.
Unlike black-box deep learning recommenders, our system provides transparent justifications for each recommendation. By highlighting the importance of the specific features used in the decision-making process, we help users understand why a particular exercise is recommended for their persona. This recommender system is being incorporated into an existing mobile app, which will be trialed with healthy participants and patients with cardiovascular disease.

Keywords
Mobile Health, Exercise Recommender System, Reinforcement Learning (RL), Explainable AI, Deep Q-Network

HealthRecSys'24: The 6th Workshop on Health Recommender Systems co-located with ACM RecSys 2024
* Corresponding author: parumenon.pm@gmail.com
Email: parumenon.pm@gmail.com (P. Naliyatthaliyazchayil); dpemmasa@iu.edu (D. Pemmasani); nkaushal@iu.edu (N. Kaushal); nemati.9@osu.edu (D. Nemati); saptpurk@iu.edu (S. Purkayastha)
ORCID: 0009-0003-5917-4558 (P. Naliyatthaliyazchayil)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

Physical exercise, widely recognized as a "miracle cure," remains underutilized despite its critical role in health maintenance and chronic disease management [1]. The World Health Organization reports that approximately one-third of the global adult population (1.8 billion individuals) is physically inactive [2]. A primary reason is that, although people easily form habits around everyday activities, exercise is often contemplated rather than consistently practiced. By turning exercise into a regular habit, individuals can significantly improve their exercise behavior and their health outcomes. To address this, we have developed a novel mobile recommender that provides personalized exercise recommendations based on clinical guidelines, helping users build and sustain exercise habits.

With mobile devices becoming integral to daily life, mobile Recommender Systems (RS) have gained traction in healthcare interventions, though their application to physical activity promotion remains limited [3]. This is due to challenges such as the "user cold start" problem, limited data for analytics [4, 5], and the opaque nature of Deep Reinforcement Learning (DRL) algorithms [6]. Our research addresses these challenges by leveraging trusted clinical guidelines and incorporating an explainability layer, making the system reliable, safe, and personalized to the user's medical history and needs.

The user cold start problem, which arises when there is insufficient behavior information about new users, has been addressed in various ways in the literature. Approaches include clustering existing users to predict new-user behavior [4] and multi-phase algorithms that classify users, find their neighbours, and predict outcomes [7]. A systematic review [8] summarizes the methods in use, distinguishing data-driven approaches, which use auxiliary data to augment user profiles, from method-driven approaches, which combine various Machine Learning algorithms with content-based algorithms. Our approach integrates clinical guideline practices into synthetic user profiles that can be used to train the recommender, overcoming this issue. For this, we employ a two-stage process:

• Collecting the exercise recommendation guidelines for various medical conditions from their respective authoritative sources and structuring them into a Machine Learning consumable format for further use.
• Using a DQN to generate 2,000 synthetic user profiles from the structured clinical guidelines of the first stage, ensuring that the generated profiles align with established medical advice.

These synthetic profiles are used to train a Multi-Armed Contextual Bandit (MAB) algorithm that recommends exercises, which keeps the recommendations safe and useful when no past behavior history is available.
While these generated recommendations are clinically safe, they are also tailored to the user's other choices, such as having a workout buddy or exercising at home. This effectively replaces the need for real-world training data, thus addressing the cold start problem and the lack of publicly available datasets while ensuring medical accuracy.

To enhance user engagement and promote behavior change, we incorporate an explainability layer into our system. This approach aligns with Explainable Artificial Intelligence (XAI) principles, which aim to provide transparency in algorithmic decision-making processes [9]. Research indicates that an improved understanding of treatment correlates with better adherence [10, 11] and can enhance engagement and foster behavior change [12].

Our recommender system is being integrated into an existing mobile application and will be tested with both healthy individuals and those with cardiovascular conditions. This research contributes to the growing field of personalized digital health interventions by addressing key challenges in exercise recommendation systems.

2. Methodology

The design of our exercise recommendation system comprises three main components: guideline structuring, a Deep Q-Network (DQN) for synthetic data generation, and a Multi-Armed Contextual Bandit (MAB) algorithm for personalized recommendations. We also incorporate an explainable AI (XAI) layer to enhance transparency and user engagement.

2.1. Guideline Structuring and Base File Creation

The goal of this step is to structure exercise guidelines from authoritative sources (e.g., American Heart Association, National Kidney Foundation) into a standardized format that the DQN can consume. The first step in structuring unstructured guidelines is to identify key attributes that are available across most guidelines. The key attributes identified were age, gender, medical history, exercise preference, frequency, and duration.
Next, the value of each attribute was collected from each guideline used. For example, the value of the attribute "exercise preference" is cardio if the guideline names walking or running as the preferred exercise. Each unique combination of attribute values was assigned a reward score (0-1) based on adherence to guidelines. For example, the American Heart Association (AHA) recommends at least 150 minutes of moderate-intensity aerobic activity per week, with additional benefits for patients with cardiac diseases who engage in at least 300 minutes (5 hours) per week [13]. Consequently, when structuring this guideline, a record with 30-45 minutes of aerobic activity per day for 5 days a week (150-225 minutes per week) receives a higher "reward" than a record with 15-30 minutes per day for 3 days a week (45-90 minutes per week), because the former aligns more closely with the AHA guideline. Other examples of exercise guidelines include the National Kidney Foundation's recommendation for continuous activity involving large muscle groups, aiming for 30-minute sessions [14], and the American Diabetes Association's guideline of 150 minutes of moderate-intensity exercise per week [15].

This structured dataset, termed the "base file," formed the foundation for subsequent steps. Figure 1 shows a couple of rows from the base file to illustrate its structure.

Figure 1: Example rows from base file

2.2. DQN for Synthetic Data Generation

We employed a Deep Q-Network (DQN), a model-free, off-policy reinforcement learning algorithm [16, 17], to generate synthetic user profiles. A user profile consists of a user's demographics and pre-existing medical conditions together with their exercise goals and exercise preferences. The DQN system comprised three main components: environment construction, state representation, and recommendation policy learning.

1. Environment Construction: This phase involves creating an environment based on user behavior history [17].
In cold start scenarios, as in our study, the base file, developed in the previous step according to specific guidelines, is used to construct this environment.
2. State Representation: The environment generates a state representation that typically includes user demographics and past behaviors [17]. Our study uses demographic data randomly generated from defined value sets, together with the base file data, to compensate for the absence of historical behavior in user cold start cases.
3. Recommendation Policy Learning: Policy learning is guided by rewards taken from the base file's "reward" column. This reward value is designed to reflect how closely the exercise recommendations adhere to clinical guidelines given user attributes such as age, gender, medical history, exercise preference, frequency, and duration. Using this structured reward system, the model ensures that the generated recommendations are safe and tailored to individual needs.

DRL uses deep learning to approximate the value function in RL, which allows it to solve high-dimensional Markov Decision Processes (MDPs) [17]. The DQN agent selected actions (attribute-value pairs) according to the policy at a given state, with rewards determined by matching rows in the base file. The agent updated its Q-values based on the rewards received, learning which actions generate guideline-adherent profiles. Key hyperparameters included a learning rate of 0.001, a discount factor of 0.95, an initial epsilon of 1.0, an epsilon decay of 0.995, and a minimum epsilon of 0.01.

2.3. Multi-Armed Bandit (MAB) Algorithm Implementation

To develop a personalized exercise recommendation system, we implemented a Multi-Armed Contextual Bandit (MAB) algorithm using the LinUCB (Linear Upper Confidence Bound) approach [18]. The system was designed to learn from and adapt to individual user profiles and behaviors over time, based on a dataset of 2000 users' exercise profiles and characteristics.
In our study, the MAB model defined three arms corresponding to the main exercise types: cardio, strength, and flexibility. The context for each user was represented as a feature vector comprising demographic information (age, sex, race), medical history, exercise preferences (frequency, duration, location), and other relevant attributes. To improve the algorithm's performance, we applied feature engineering techniques, including normalization of numerical features, one-hot encoding of categorical variables, and creation of interaction terms.

The LinUCB algorithm was implemented to balance exploration and exploitation in recommendation selection [19]. Exploration refers to the algorithm trying different exercise routines for a user, even when it is uncertain about their effectiveness, in order to gather more data. Exploitation, on the other hand, involves recommending exercises that have already shown positive results [20]. For each arm a, we maintained a matrix A_a and a vector b_a to estimate the coefficients θ_a. For instance, for the cardio arm, A_cardio is a matrix that accumulates the context features (age, sex, race, frequency, duration of the cardio activity) seen when that arm was chosen, while b_cardio is a vector that accumulates those features weighted by the corresponding observed rewards, such as exercise completion or adherence. At each interaction, the algorithm computed a score for each arm based on the current context and coefficient estimates, selecting the arm with the highest score. The model parameters were updated after each interaction using the observed reward, which was defined as a weighted combination of short-term engagement (exercise completion) and long-term health outcomes (progress towards weekly goals).

2.3.1. Incorporating Contextual Information

To further refine the recommendations, we incorporated a contextual bandit model, a variant of the MAB framework that allows the algorithm to consider additional contextual information before making decisions [20].
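The per-arm bookkeeping described in Section 2.3 (a matrix A_a and a vector b_a for each arm, and a score computed from the current context) can be sketched as follows. This is a minimal illustration of textbook LinUCB rather than the implementation used in this work. The class name `LinUCBArm`, the function `recommend`, the three-element context vector, and the rewards are all hypothetical. Here `alpha` is the exploration weight of standard LinUCB, and treating it as the α reported in Section 3.1.1 is an assumption. The sketch stores the inverse of A_a directly and updates it with the Sherman-Morrison identity so that no matrix library is required.

```python
import math


class LinUCBArm:
    """State for one arm: the inverse of A_a and the vector b_a (hypothetical sketch)."""

    def __init__(self, dim):
        # A_a starts as the identity matrix, so its inverse is the identity too.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _A_inv_times(self, x):
        # Matrix-vector product A_a^{-1} x.
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.A_inv]

    def score(self, x, alpha):
        """Upper confidence bound: theta_a . x + alpha * sqrt(x^T A_a^{-1} x)."""
        theta = self._A_inv_times(self.b)  # theta_a = A_a^{-1} b_a
        A_inv_x = self._A_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * v for xi, v in zip(x, A_inv_x)))
        return mean + alpha * width

    def update(self, x, reward):
        """A_a += x x^T (applied to the stored inverse) and b_a += reward * x."""
        A_inv_x = self._A_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, A_inv_x))
        n = len(x)
        for i in range(n):
            for j in range(n):
                # Sherman-Morrison rank-one update of the inverse.
                self.A_inv[i][j] -= A_inv_x[i] * A_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def recommend(arms, x, alpha=0.1):
    """Return the name of the arm with the highest upper confidence bound."""
    return max(arms, key=lambda name: arms[name].score(x, alpha))


# Three arms, as in the paper; the context and rewards below are made up.
arms = {name: LinUCBArm(dim=3) for name in ("cardio", "strength", "flexibility")}
context = [1.0, 0.62, 0.5]  # hypothetical: bias term, scaled age, scaled frequency
for _ in range(20):
    arms["strength"].update(context, reward=1.0)
    arms["cardio"].update(context, reward=0.0)
choice = recommend(arms, context)
print(choice)
```

In this toy run the strength arm has repeatedly been rewarded for the given context, so it receives the highest score. The untried flexibility arm still scores above the unrewarded cardio arm because its confidence width has not yet shrunk, which is the exploration behavior described above.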
In our case, context included variables such as age, exercise preference, duration, frequency, and strength preference. By including these contextual factors, the algorithm could tailor its recommendations more closely to the user's current state and environment, thereby increasing the likelihood of user engagement and adherence to the exercise plan. The performance of the MAB-based recommendation system was evaluated by simulating user interactions with the synthetic data.

2.4. Explainable AI Layer (XAI)

To enhance transparency and foster trust, we incorporated an explainable AI layer. This layer provides insights into the rationale behind specific exercise recommendations, considering the user's medical history, the user's choices, and relevant health guidelines. The XAI component aims to support clinical adoption, ensure greater accuracy, minimize risks associated with errors or biases, and enhance user engagement [21], fostering behaviour change.

Figure 2: Methodology

3. Results and Discussion

As outlined in the methodology, we conducted an attribute analysis to identify the key variables and their respective value sets needed to structure the exercise guidelines into the base file. Once created, the base file was used in the DQN model to generate profiles. Key parameters such as average reward per episode and epsilon decay were monitored to evaluate the Q-agent's learning progress and improvement over time. The results indicated a consistent average reward of 0.5 per episode, suggesting that the model effectively adhered to the guidelines and optimized its performance as training progressed, as shown in Figure 3.

Figure 3: Average Reward per episode

Additionally, the epsilon value steadily decreased throughout the agent's lifecycle, reflecting the agent's shift from exploration toward acting on what it had learned with experience, as depicted in Figure 4.
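The epsilon schedule implied by the hyperparameters in Section 2.2 (initial epsilon 1.0, decay 0.995, minimum 0.01) can be reproduced with a few lines of code. The sketch below assumes that the decay is multiplicative and applied once per episode, which the text does not state. The function name `epsilon_schedule` and the episode count of 1500 are hypothetical.

```python
def epsilon_schedule(episodes, start=1.0, decay=0.995, minimum=0.01):
    """Return the epsilon used in each episode under multiplicative decay.

    Assumption: epsilon is multiplied by `decay` once per episode and
    clipped from below at `minimum`.
    """
    values = []
    epsilon = start
    for _ in range(episodes):
        values.append(epsilon)
        epsilon = max(minimum, epsilon * decay)
    return values


schedule = epsilon_schedule(1500)
# Index of the first episode in which epsilon has reached its floor.
first_floor = next(i for i, eps in enumerate(schedule) if eps <= 0.01)
print(f"epsilon after 100 episodes: {schedule[100]:.3f}")
print(f"first episode at the floor: {first_floor}")
```

Under these assumptions epsilon needs a little over 900 episodes to reach its floor, so a training run shorter than that would still be taking a noticeable share of random actions at the end.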
Figure 4: Epsilon Decay Over Episodes

To assess the similarity between real and synthetic data, we employed the Kolmogorov-Smirnov (KS) test, comparing the distributions of various features. Features such as Strength Training, Strength Preference, Exercise Location, Cardio Preference, Gender, Exercise Duration, Preferred Exercise, and Medical Exercise yielded high p-values (close to 1), indicating that their distributions in the synthetic data closely matched those in the real data. This result suggests that the DQN model successfully captured the essential patterns in the data while also adapting and improving its policy through exploration.

Additionally, we conducted a Feature Importance Similarity analysis to compare the importance of features when models were trained on real versus synthetic data. The analysis produced a Feature Importance Similarity score of 0.9787, indicating a high degree of similarity. This suggests that the synthetic data effectively captured the critical features. Figure 5 shows a Feature Importance Comparison, with Age used as the target variable.

Figure 5: Feature Importance Comparison

3.1. MAB Algorithm Training and Evaluation

We evaluated the performance of our Multi-Armed Contextual Bandit (MAB) algorithm using the LinUCB approach over a simulated period of 30 days, with 2000 synthetic user profiles generated by the Deep Q-Network (DQN). The evaluation focused on the algorithm's ability to provide personalized exercise recommendations and adapt to user profiles over time.

Figure 6: MAB Learning Curve - Daily Average Reward

3.1.1. Convergence and Learning Rate

The MAB algorithm demonstrated rapid convergence, with the average reward stabilizing after approximately 15 days of simulated interactions. Figure 6 illustrates the learning curve, showing the daily average reward across all users. The learning rate, α, was set to 0.1, which provided a balance between quick adaptation and stability.
We observed that higher learning rates (e.g., 0.2, 0.3) led to faster initial convergence but increased volatility, while lower rates (e.g., 0.05, 0.01) resulted in slower learning but more stable long-term performance.

3.1.2. Cumulative Regret

Cumulative regret, a key metric for evaluating MAB algorithms, measures the difference between the optimal and actual rewards received over time. Our LinUCB implementation achieved a sub-linear cumulative regret, as shown in Figure 7. The final cumulative regret after 30 days was 487.3, which is 18.9% lower than a standard ε-greedy approach (600.5) and 32.4% lower than a random selection baseline (721.6).

Figure 7: Cumulative Regret over 30 Days

3.1.3. Recommendation Accuracy and Diversity

To assess recommendation diversity, we calculated the Intra-List Distance (ILD) metric, which measures the dissimilarity between recommended items. The average ILD increased from 0.58 on day 1 to 0.73 on day 30, suggesting that the algorithm provided a more diverse range of recommendations as it learned user profiles.

3.1.4. Exploration vs. Exploitation Balance

We monitored the exploration-exploitation trade-off using the percentage of exploratory actions taken by the algorithm. Figure 8 shows how this percentage changed over time.

Figure 8: Percentage of Exploratory Actions over Time

The exploration rate decreased from an initial 40% to approximately 15% by day 30, indicating that the algorithm transitioned from a more exploratory phase to a more exploitative one as it gained confidence in its learned preferences.

3.1.5. Computational Efficiency

The average time to generate a recommendation was 12.3 milliseconds (ms) with a standard deviation of 2.1 ms, measured on a system with an AMD 5900X and 32GB RAM. This performance suggests that the algorithm is suitable for real-time recommendations in a mobile application setting.
We plan to transfer this to our mobile app, where recommendations may be slightly slower but should still be fast enough for an acceptable user experience.

In summary, our MAB algorithm demonstrated effective learning, personalization, and adaptation capabilities in providing exercise recommendations. The results show improvements in recommendation accuracy and diversity over time, with successful contextual adaptation across different user segments. The sub-linear cumulative regret and efficient computational performance further support the viability of this approach for personalized exercise recommendation systems.

3.2. Results from the Explainable AI Layer

The integration of an explainable AI (XAI) layer into our exercise recommendation system yielded significant improvements in transparency, user understanding, and overall system effectiveness. We evaluated the XAI layer's performance using only quantitative metrics, since our mobile app trial will take place in the future.

3.2.1. Transparency and Interpretability

We analyzed the SHAP values for a sample of 2000 synthetic user profiles to understand the relative importance of different features in generating exercise recommendations from the MAB trial.
Table 1 shows the average absolute SHAP values for 15 features:

Feature               Cardio    Strength  Flexibility
Age                   0.3215    0.2987    0.2328
Exercise Frequency    0.2843    0.2765    0.2345
Exercise Duration     0.2567    0.2612    0.2132
Medical History       0.2456    0.2534    0.1946
Preferred Exercise    0.2234    0.2176    0.1905
Weekly Goal           0.2012    0.1987    0.1629
Cardio Preference     0.2345    0.1234    0.1383
Strength Preference   0.1234    0.2345    0.1192
Sex                   0.1456    0.1567    0.1273
Race                  0.1234    0.1345    0.1282
Start Preference      0.1123    0.1234    0.1456
Exercise Variety      0.1345    0.1456    0.1678
Exercise Location     0.1234    0.1345    0.1567
Exercise Buddy        0.1012    0.1123    0.1345
Coach Appearance      0.0901    0.0987    0.1123

Table 1: Average SHAP Values for Features Across MAB Recommendation Arms

Age, Exercise Frequency, and Exercise Duration are the most important features across all three arms. Medical History and Preferred Exercise also have high SHAP values, indicating their significance in personalizing recommendations. As would be expected, Cardio Preference has a higher SHAP value for the Cardio arm, while Strength Preference has a higher value for the Strength arm. Some features, like Race and Coach Appearance, have lower SHAP values, suggesting they have less influence on the recommendations. The relative importance of features varies across the three arms, reflecting how different factors may be more or less relevant for different types of exercise recommendations.

3.2.2. Recommendation Consistency and Fairness Assessment

To assess the consistency of recommendations across similar user profiles, we calculated the Jaccard similarity index for recommendations made to users with similar characteristics. For users with matching Age (±5 years), Sex, and Medical History, the average Jaccard similarity of recommendations was 0.73, indicating a high degree of consistency while still allowing for personalization.
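As a reminder of how such a consistency score is formed, the sketch below computes the Jaccard index |A ∩ B| / |A ∪ B| for two recommendation sets and averages it over all pairs of users in one group of similar profiles. The function names, the averaging over unordered pairs, and the example exercise names are assumptions made for illustration. The paper does not specify how pairs were formed or what the recommendation sets contained.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(recommendation_sets):
    """Average Jaccard index over all unordered pairs in one group of users."""
    pairs = list(combinations(recommendation_sets, 2))
    if not pairs:
        raise ValueError("need at least two recommendation sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical group of three users with matching age band, sex, and history.
group = [
    {"walking", "cycling", "yoga"},
    {"walking", "cycling", "rowing"},
    {"walking", "cycling", "yoga"},
]
consistency = mean_pairwise_jaccard(group)
print(f"{consistency:.3f}")
```

A value of 1.0 would mean that every user in the group received identical recommendations, while lower values leave room for per-user differences.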
To ensure the model was not biased against particular demographic groups, we conducted a fairness assessment using the equal opportunity difference (EOD) metric [22]. The EOD values for key demographic features were:

1. Sex: 0.05
2. Race: 0.07
3. Age Groups (18-35, 36-55, 56+): 0.06

These values suggest relatively low levels of demographic bias in the recommendations, though there is still room for improvement.

The addition of the SHAP-based XAI layer increased the average recommendation generation time from 12.3 ms to 89.7 ms (± 5.2 ms). We consider this increase in latency acceptable given the insights provided by the explanations.

3.2.3. Example Explanation

Consider a 62-year-old male user with hypertension who prefers strength training 7 times a week for 90-105 minutes.

Recommendations provided to user: A mix of moderate-intensity strength training and low-impact cardio exercises.

XAI rationale visible to user: "Exercise recommended for you, considering your age of 62 years and medical history of hypertension, where staying active with combination of cardio and strength training for 30-45 minutes a day, 4-5 days a week, is ideal. This follows the exercise guidelines to manage hypertension, keeping you healthy and strong."

The top 3 features influencing this recommendation were:

1. Age (SHAP value: +0.42): Increased the likelihood of recommending low-impact exercises
2. Medical History: Hypertension (SHAP value: -0.38): Decreased the intensity of recommended strength training
3. Exercise Frequency (SHAP value: +0.35): Increased the variety of recommended exercises

4. Future Work

While these quantitative results provide valuable insights into the functioning of our Recommender System and the XAI layer, future work will include qualitative studies to evaluate user understanding of and satisfaction with the explanations provided. We plan to conduct:

1. Semi-structured interviews with a diverse group of users to gather in-depth feedback on the clarity and usefulness of the explanations.
2. A longitudinal study to assess how the presence of explanations affects user adherence to recommended exercise routines over time.
3. A comparative study between different explanation formats (e.g., natural language vs. visual representations) to determine the most effective way to communicate the reasoning behind recommendations.

Future iterations of this system will incorporate additional factors such as exercise intensity and explore more sophisticated feature interactions, further enhancing the personalization and effectiveness of the recommendations. The planned qualitative studies, including semi-structured interviews and longitudinal assessments, will provide crucial insights into user understanding, satisfaction, and long-term adherence to recommended exercise routines. As we move forward with testing the system on both healthy individuals and those with cardiovascular conditions, we aim to validate its effectiveness in real-world scenarios.

5. Conclusion

This study introduces a novel recommendation system designed to address key challenges in personalized exercise interventions. By combining a DQN with a MAB algorithm, we address the user cold start problem [4, 5, 6] and improve the interpretability of deep learning models, ensuring that exercise recommendations are personalized and aligned with medical advice. Implemented within a mobile application, this recommender can not only promote regular physical activity but also help users build lasting exercise habits in a mobile-driven world. With an average recommendation generation time of 89.7 ms (± 5.2 ms) including the XAI layer, our system is both computationally efficient and practical for real-time use. Our approach represents a significant step forward in digital health, combining advanced ML with an XAI layer to promote guideline-based physical activity.
By tackling the cold start problem and enhancing algorithm transparency, this system has the potential to foster lasting behavior change and improve public health.

6. Disclosure

Parvati Naliyatthaliyazchayil hereby discloses that she has volunteered at Indiana University and is currently employed by ConcertAI. This disclosure applies solely to Parvati Naliyatthaliyazchayil and does not extend to any of the other authors of this paper.

References

[1] NHS, Benefits of exercise, 2024. URL: https://www.nhs.uk/live-well/exercise/exercise-health-benefits/, accessed: 2024-08-12.
[2] WHO, Physical activity, 2024. URL: https://www.who.int/news-room/fact-sheets/detail/physical-activity, accessed: 2024-08-12.
[3] U. Bhimavarapu, M. Sreedevi, N. Chintalapudi, G. Battineni, Physical activity recommendation system based on deep learning to prevent respiratory diseases, Computers 11 (2022) 150. doi:10.3390/computers11100150.
[4] A. Panteli, B. Boutsinas, Addressing the cold-start problem in recommender systems based on frequent patterns, Algorithms 16 (2023) 182. doi:10.3390/a16040182.
[5] Appier, 7 critical challenges of recommendation engines, 2024. URL: https://www.appier.com/en/blog/7-critical-challenges-of-recommendation-engines, accessed: 2024-08-12.
[6] V. Hassija, V. Chamola, A. Mahapatra, et al., Interpreting black-box models: A review on explainable artificial intelligence, Cognitive Computation 16 (2024) 45–74. URL: https://doi.org/10.1007/s12559-023-10179-8. doi:10.1007/s12559-023-10179-8, published: 24 August 2023, issue date: January 2024.
[7] B. Lika, K. Kolomvatsos, S. Hadjiefthymiades, Facing the cold start problem in recommender systems, Expert Systems with Applications 41 (2014) 2065–2073. URL: https://www.sciencedirect.com/science/article/abs/pii/S0957417413007240. doi:10.1016/j.eswa.2013.09.005.
[8] H. Yuan, A. A. Hernandez, User cold start problem in recommendation systems: A systematic review, IEEE Access (2023). URL: https://doi.org/10.1109/ACCESS.2023.3338705. doi:10.1109/ACCESS.2023.3338705, license: CC BY-NC-ND 4.0.
[9] S. A., S. R., A systematic review of explainable artificial intelligence models and applications: Recent developments and future trends, Decision Analytics Journal 7 (2023) 100230. URL: https://www.sciencedirect.com/science/article/pii/S277266222300070X. doi:10.1016/j.dajour.2023.100230.
[10] O. Awwad, A. Akour, S. Al-Muhaissen, D. E. Morisky, The influence of patients' knowledge on adherence to their chronic medications: A cross-sectional study in Jordan, International Journal of Clinical Pharmacy 37 (2015) 504–510. URL: https://doi.org/10.1007/s11096-015-0086-3. doi:10.1007/s11096-015-0086-3, epub 2015 Feb 24.
[11] F. Folkvord, A.-R. U. Würth, K. van Houten, et al., A systematic review on experimental studies about patient adherence to treatment, Pharmacology Research Perspectives 12 (2024) e1166. doi:10.1002/prp2.1166.
[12] A. H. Krist, S. T. Tong, R. A. Aycock, D. R. Longo, Engaging patients in decision-making and behavior change to promote prevention, Studies in Health Technology and Informatics 240 (2017) 284–302. doi:10.3233/ISU-1708.
[13] American Heart Association, AHA recommendations for physical activity in adults, 2024. URL: https://www.heart.org/en/healthy-living/fitness/fitness-basics/aha-recs-for-physical-activity-in-adults, accessed: August 2024.
[14] National Kidney Foundation, Stay fit, 2024. URL: https://www.kidney.org/atoz/content/stayfit, accessed: August 2024.
[15] American Diabetes Association, Weekly exercise targets, 2024. URL: https://diabetes.org/health-wellness/fitness/weekly-exercise-targets, accessed: August 2024.
[16] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. Riedmiller, et al., Human-level control through deep reinforcement learning, Nature 518 (2015) 529–533. URL: https://www.nature.com/articles/nature14236. doi:10.1038/nature14236.
[17] X. Chen, L. Yao, J. McAuley, G. Zhou, X. Wang, Deep reinforcement learning in recommender systems: A survey and new perspectives, Knowledge-Based Systems 264 (2023) 110335. URL: https://www.sciencedirect.com/science/article/pii/S0950705123000850. doi:10.1016/j.knosys.2023.110335.
[18] K.-H. Huang, H.-T. Lin, Linear upper confidence bound algorithm for contextual bandit problem with piled rewards, in: Advances in Knowledge Discovery and Data Mining: 20th Pacific-Asia Conference, PAKDD 2016, Auckland, New Zealand, April 19-22, 2016, Proceedings, Part II 20, Springer, 2016, pp. 143–155. doi:10.1007/978-3-319-31750-2_12.
[19] D. Bouneffouf, S. Upadhyay, Y. Khazaeni, Linear upper confident bound with missing reward: Online learning with less data, in: 2022 International Joint Conference on Neural Networks (IJCNN), IEEE, 2022, pp. 1–6. doi:10.1109/IJCNN55064.2022.9892856.
[20] A. Slivkins, Introduction to multi-armed bandits, Foundations and Trends® in Machine Learning 17 (2024) 1–143. URL: https://arxiv.org/abs/1904.07272. doi:10.1561/2200000068, first draft: January 2017; published: November 2019; latest version: April 2024.
[21] Z. Sadeghi, R. Alizadehsani, M. A. CIFCI, S. Kausar, R. Rehman, P. Mahanta, P. K. Bora, A. Almasri, R. S. Alkhawaldeh, S. Hussain, B. Alatas, A. Shoeibi, H. Moosaei, M. Hladík, S. Nahavandi, P. M. Pardalos, A review of explainable artificial intelligence in healthcare, Computers and Electrical Engineering 118 (2024) 109370. URL: https://www.sciencedirect.com/science/article/pii/S0045790624002982. doi:10.1016/j.compeleceng.2024.109370.
[22] M. Hardt, E. Price, N. Srebro, Equality of opportunity in supervised learning, Advances in Neural Information Processing Systems 29 (2016). doi:10.48550/arXiv.1610.02413.

A. Online Resources

The code for this project is available on GitHub at the following repository: https://github.com/iupui-soic/exercise-behavior-change-app/tree/recommendersys