Personalizing Mobile Fitness Apps using Reinforcement Learning

Mo Zhou, Department of Industrial Engineering and Operations Research, University of California, Berkeley, CA, USA, mzhou@berkeley.edu
Yonatan Mintz, Department of Industrial Engineering and Operations Research, University of California, Berkeley, CA, USA, ymintz@berkeley.edu
Yoshimi Fukuoka, Department of Physiological Nursing, Institute for Health & Aging, School of Nursing, University of California, San Francisco, CA, USA, yoshimi.fukuoka@ucsf.edu
Ken Goldberg, Department of Industrial Engineering and Operations Research, University of California, Berkeley, CA, USA, goldberg@berkeley.edu
Elena Flowers, Department of Physiological Nursing, School of Nursing, University of California, San Francisco, CA, USA, elena.flowers@ucsf.edu
Philip Kaminsky, Department of Industrial Engineering and Operations Research, University of California, Berkeley, CA, USA, kaminsky@berkeley.edu
Alejandro Castillejo, Department of Industrial Engineering and Operations Research, University of California, Berkeley, CA, USA, castillejo.alejandro@berkeley.edu
Anil Aswani, Department of Industrial Engineering and Operations Research, University of California, Berkeley, CA, USA, aaswani@berkeley.edu

ABSTRACT
Despite the vast number of mobile fitness applications (apps) and their potential advantages in promoting physical activity, many existing apps lack behavior-change features and are not able to maintain behavior change motivation. This paper describes a novel fitness app called CalFit, which implements important behavior-change features like dynamic goal setting and self-monitoring. CalFit uses a reinforcement learning algorithm to generate personalized daily step goals that are challenging but attainable. We conducted the Mobile Student Activity Reinforcement (mSTAR) study with 13 college students to evaluate the efficacy of the CalFit app. The control group (receiving goals of 10,000 steps/day) had a decrease in daily step count of 1,520 (SD ± 740) between baseline and 10 weeks, compared to an increase of 700 (SD ± 830) in the intervention group (receiving personalized step goals). The difference in daily steps between the two groups was 2,220, which was statistically significant (p = 0.039).

ACM Classification Keywords
H.5.2. User Interfaces: User-centered design; I.2.6 Artificial Intelligence: Learning; K.4.1 Computers and Society: Public Policy Issues: Computer-related healthcare issues; J.4 Social and Behavioral Sciences: Psychology

Author Keywords
Physical activity; interface design; mobile app; fitness app; goal setting; personalization.

©2018. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. HUMANIZE '18, March 11, 2018, Tokyo, Japan

INTRODUCTION
Regular physical activity (e.g., walking or running) is an important factor in preventing the development of chronic diseases like type 2 diabetes, cardiovascular disease, depression, and certain types of cancer [33, 55, 56]. Because of its importance in maintaining good health, the 2008 Physical Activity Guidelines for Americans recommend that adults engage in at least 150 minutes a week of moderate-intensity physical activity or 75 minutes a week of vigorous-intensity aerobic physical activity [51, 55]. However, about 50% of adults in the U.S. [15] are physically inactive. In fact, over 3 million deaths worldwide are attributed to physical inactivity [54].

Given the high prevalence of physical inactivity, it is necessary to develop new cost-effective, scalable approaches to increase physical activity. One promising direction is the use of smartphones in the delivery and personalization of programs that motivate individuals to increase their physical activity. Over 40% of adults worldwide and 77% of adults in the U.S. own a smartphone [45]. Smartphones have powerful computation and communication capabilities that enable the use of machine learning and other data-driven analytics algorithms for personalizing physical activity programs to each individual. Furthermore, the past several generations of smartphones integrate reliable activity tracking features [2, 14, 18, 25], which makes possible the real-time collection of fine-grained physical activity data from each individual.

Though many smartphone applications (apps) for fitness have been developed, systematic reviews [8, 10, 39, 53] of mobile fitness apps found an overall lack of persuasive attributes that are needed for the general public to maintain exercise motivation through continued use of the app. These reviews [10, 53] also identified a lack of experimental validation for the efficacy of specific features implemented in mobile fitness apps. For instance, recent studies [28, 36, 47] have shown that constant step goals provided by existing apps and devices are ineffective in increasing physical activity, and such a one-size-fits-all approach could even be harmful for some people. Therefore, maintaining user participation and motivation is a core challenge in developing effective physical activity intervention platforms, and the personalization of goals within fitness apps through intelligent user interfaces [16, 21, 30, 48, 52] has shown promise in promoting healthy behavior. Simple heuristics, such as setting the future goal to be the 60th percentile of the steps taken in the past 10 days, have been shown to be effective in promoting physical activity [1]. But few studies have investigated the potential of using more complicated Machine Learning-based approaches to set personalized step goals.

In this paper, we introduce a novel fitness app on the iOS platform, CalFit, which automatically sets personalized, adaptive daily step goals and adopts behavior-change features such as self-monitoring. The daily step goals are computed using a reinforcement learning algorithm [5, 40] adapted to the context of physical activity interventions: Our app uses inverse reinforcement learning to construct a predictive quantitative model for each user, and then uses this estimated model in conjunction with reinforcement learning to generate challenging but realistic step goals in an adaptive fashion. We conducted a pilot study with 13 college students to demonstrate the efficacy of our app and the personalized adaptive step goal algorithm in promoting physical activity.

We first discuss related work and the theory of goal setting in relation to behavior change. Next we describe the designed elements. Our contributions toward the app design include translating elements and features from the theory of goal setting into interface design choices for mobile fitness apps, as well as the design of a reinforcement learning algorithm that generates personalized step goals for users. Next we describe our contributions towards experimental validation of the efficacy of our app design, through conducting the Mobile Student Activity Reinforcement (mSTAR) study.

RELATED WORK
In this section, we review work on the intersection of mobile technologies and behavior modification programs. First, we describe key studies showing the efficacy of combining mobile technologies with clinical coaching to increase physical activity. Next, we describe behavior change features and their use in the design of mobile fitness apps. Finally, we survey the theory of goal-setting. Identified weaknesses in existing apps and ideas on the theory of behavior change are used to inform our design of the CalFit app.

Smartphone-based Clinical Trials
Physical activity interventions that involve multiple in-person coaching sessions are costly and labor-intensive, and so researchers have evaluated the feasibility and efficacy of lower-cost interventions where the number of coaching sessions is reduced (but not eliminated) in parallel with the introduction of mobile technologies (e.g., smartphone apps, digital pedometers, activity trackers) [7, 11, 12, 13, 17, 19, 20, 23, 26, 29, 32, 44, 46]. These studies ranged in size from about 10 to several hundred participants. Both smartphones and personal digital assistants (PDAs) were used to deliver these interventions, and the interface outputs were predominantly text, with some interventions involving simple graphic comparisons to goals. These interventions featured different levels of interactivity, ranging from general weekly text messages to customized text messages based on real-time monitoring of physical activity and other additional inputs. For instance, the mobile weight loss program in [11] used weekly input from overweight children to send computer-generated text messages. Most studies [7, 11, 12, 13, 17, 20, 23, 29, 32, 44] asked participants to self-report dietary, weight, and exercise data. A smaller number of studies [19, 26] have explored the use of automated collection of exercise data either through accelerometer data that is wirelessly transmitted via Bluetooth to a smartphone [26] or the use of digital pedometers to collect step data [19].

Most of these studies had outcomes of a statistically significant decrease in weight or a statistically significant increase in physical activity [7, 11, 13, 17, 20, 23, 29, 32, 44], supporting the potential advantages of mobile-based physical activity interventions. However, none of these studies relied solely on mobile technology. All of these studies involved in-person coaching sessions during the intervention (though the number of coaching sessions was lower than in traditional behavior modification programs) and either used objectively measured outcomes using an additional device or self-reported outcomes. The weight or exercise goals in these interventions were manually set by the participants or the clinicians.

Mobile Fitness Apps and Behavior Change Features
Mobile fitness apps have the potential to be a scalable way of disseminating behavior change interventions in a cost-effective manner. In addition to being able to deliver interventions through wireless internet and messaging connectivity, smartphones can also leverage in-built tools like GPS, digital accelerometers, and cameras to objectively measure (as opposed to self-reported data) health parameters. However, systematic reviews [8, 10, 39, 53] of current mobile fitness apps found a lack of features that can effectively initiate and maintain the behavioral changes necessary to increase physical activity. The low efficacy of current mobile fitness apps is due primarily to this lack of inclusion of important features based on behavioral theory [8, 10, 39, 53]. Examples of key behavior change features include: objective outcome measurements, self-monitoring, personalized feedback, behavioral goal-setting, individualized program, and social support. In particular, researchers recommend that self-monitoring should be conducted regularly and in real-time, so as to target activity with precise tracking information and emphasize performance successes. In addition, personalized feedback is most effective when it is specific, such as in comparing current performance to past accomplishments and previous goals.

Goal Setting
Goal setting is a critical factor for facilitating behavior change [9, 37]. Prior studies using persuasive technology usually assigned a fixed goal to all participants (e.g., 10,000 steps per day) [3, 28, 36], but a fixed goal fails to capture the differences between participants (different baseline physical activity level, reaction to goals, etc.). Conversely, personalized goal setting has the potential to increase the effectiveness of physical activity interventions. Simple heuristics, such as setting the future goal to be the 60th percentile of the steps taken in the past 10 days, have been shown to be effective in promoting physical activity [1]. But few studies have investigated the potential of using more complicated Machine Learning-based approaches to set personalized step goals.

In recent years, human-computer interaction (HCI) studies have investigated interface design for goal-setting. Munson et al. [41] developed a smartphone app that implements primary (base) and secondary (stretch) weekly goals and found that such a personalized goal-setting approach can be beneficial. However, the app lacks an explicit algorithm to help participants set "sweet spot" goals based on their past behavior. DStress [34] algorithmically sets daily goals based on previous performance, where if the daily goal is achieved for the day, then a more difficult goal is assigned for the following day and vice versa. Though this can effectively set adaptive goals, the goals for high variance targets (like steps) can be highly variable, which leads to reduced intervention impact. For example, if a participant normally walks 8,000 steps but walks 1,000 steps on one day, then using the 1,000 value as the baseline to set the step goal for the following day will lead to a too-easy goal. A more comprehensive algorithm is needed to incorporate all previous performance information to decide the "sweet spot" of future goals in a personalized fashion. In this paper, we describe a novel algorithm based on Reinforcement Learning that sets goals 'smartly' by first learning the behaviors of each participant and then determining the most effective future goal in an adaptive fashion.

THE CALFIT APP
CalFit is a mobile fitness app that uses key behavior change features to improve effectiveness. It combines a personalized goal setting algorithm and a structured interface with regular self-monitoring and feedback to provide an adaptive and individualized physical activity intervention. This section discusses the design of the interface, communication, and computation elements of our app, which are shown in Figure 2.

[Figure 2: block diagram. Diagram labels: Personalized Goal; Step Data; Step and Goals Data; Estimated Parameters; SQL Database; Inverse Reinforcement Learning; Reinforcement Learning; Behavioral Analytics Algorithm (BAA); Server.]
Figure 2. The CalFit app interface uploads step data to a SQL database on a server, and the stored step and goals data is accessed by the Behavioral Analytics Algorithm (BAA) comprised of inverse reinforcement learning to estimate model parameters describing the user and followed by reinforcement learning to compute personalized step goals that will maximize the user's future physical activity. The personalized step goals are stored in the SQL database and communicated to the user via the CalFit app interface.

Interface
The CalFit app interface is built for the iOS platform. Upon opening the app interface, the user first sees the splash screen (Figure 1a) and then lands on the home tab (Figure 1b). On the home tab, the user can find his/her step goal for the day and the steps done so far today. The steps are tracked in real-time using the built-in health chip on the iPhone and are updated every 10 minutes. (Accuracy of step data collected by the built-in health chip on the iPhone and other smartphones has been validated by several studies [2, 14, 18, 25].) This design facilitates direct comparison between daily step goals and objectively measured daily steps in order to enhance self-monitoring.

There are two icons at the bottom of the home tab. If the left icon on the home tab is clicked, the user is shown the history tab (Figure 1c) that displays a barplot outlining the user's performance in the past 7 days. The black lines on each bar represent the step goal, and the height of each bar represents the actual measured steps. If the user achieved the goal, then the bar is green. If the user did not achieve the goal, then the bar is red. This tab is designed to provide a quick, yet comprehensive, visualization of the user's past performance, allowing the user to quickly identify days of successes and failures. If the right icon on the home tab is clicked, the user is directed to the contact tab (Figure 1d), where they can type in a message and send it to the research team regarding their concerns, app bugs, etc.

Figure 1. Screenshots of the main tabs of the CalFit app are shown, including the (a) splash screen, (b) home tab, (c) history tab, and (d) contact tab.

Behavioral Analytics Algorithm (BAA)
Automated goal setting is a crucial component of the CalFit app. To set personalized goals that are challenging yet attainable for each user, we use a reinforcement learning algorithm [5, 40] that we have adapted to the context of physical activity interventions. The Behavioral Analytics Algorithm (BAA) uses inverse reinforcement learning to construct a predictive quantitative model for each participant, based on the historical step and goal data for that user; then, it uses the estimated model with reinforcement learning to generate challenging but realistic step goals in an adaptive fashion.

Below, we elaborate upon the mathematical formulations underlying these steps of BAA. Since the BAA algorithm does calculations for each user independently of the calculations for other users, our description of the algorithm (and accompanying models) is focused on calculations for a single user.

Stage 0 – Predictive Model of User's Step Activity
Our predictive model is based on a model from [5, 40] for predicting weight loss based on steps and diet, and we have adapted that model to the specific case of only predicting step activity. Let the subscript $t$ denote the value of a variable on the $t$-th day of using the app, and define the function $(x)^-$ as

\[
(x)^- = \begin{cases} x, & \text{if } x \le 0 \\ 0, & \text{if } x > 0 \end{cases} \tag{1}
\]

Our predictive model for the number of steps that the user takes on the $t$-th day is

\[
u_t = \arg\max_{u \ge 0} \; -(u - u_b)^2 + p_t \cdot (u - g_t)^-, \tag{2}
\]

where $u_t$ is the number of steps the user (subconsciously) decides to take, $u_b \in \mathbb{R}_+$ is a parameter describing the user's natural (or baseline) level of steps in a day, and $p_t \in \mathbb{R}_+$ is a parameter that quantitatively characterizes the user's responsiveness to the goal $g_t \in \mathbb{R}_+$.

The general idea of (2) is that users make decisions to maximize their utility or happiness related to several objectives. The $-(u - u_b)^2$ term means a user has an ideal level of steps they prefer to take in a day, wherein the user is implicitly trading off a small number of steps in a day (and the dissatisfaction accompanied by physical inactivity) with a large number of steps in a day (and the effort and time required to achieve many steps). The parameter $u_b$ quantifies this baseline number of steps that achieves this tradeoff for the user. The $p_t \cdot (u - g_t)^-$ term means a user gets increasing happiness the closer their steps are to the goal $g_t$, and $p_t$ describes the rate of increase in happiness as the steps get closer to the goal; however, this model says that exceeding the goal results in no additional happiness. A more complex model would include a term to describe an increase in happiness as the goal is exceeded, but a detailed study [5] found that not including this additional term still produced a model with high prediction accuracy.

There is one additional component to our predictive model. Equation (2) describes how a user decides the number of steps to take on the $t$-th day. The theory of goal setting [9, 37] recognizes that the effectiveness of goals can increase or decrease over time, depending on the level of the goals and whether or not an individual was able to meet the goals. To quantify these effects, our predictive model includes

\[
p_{t+1} = \gamma \cdot p_t + \mu \cdot \mathbf{1}(u_t \ge g_t), \tag{3}
\]

where $\gamma \in (0, 1)$ characterizes the user's learned helplessness, $\mu \in \mathbb{R}_+$ quantifies the user's self-efficacy, and $\mathbf{1}(\cdot)$ is the indicator function. Self-efficacy is defined as a user's beliefs in their capabilities to successfully execute courses of action, and it plays an essential role in the theory of goal setting [9, 37]. Self-efficacy influences a variety of health behaviors, including physical activity [31, 38]. Though $\gamma$ will be different for each individual, the past study [5] found that setting $\gamma = 0.85$ generated models with high prediction accuracy.

There are several points of intuition about (3). The term $\mu \cdot \mathbf{1}(u_t \ge g_t)$ describes the relationship between self-efficacy and meeting goals. When a user achieves a goal, $\mathbf{1}(u_t \ge g_t)$ is one and $p_{t+1}$ increases by $\mu$. Achieving a goal increases the user's self-efficacy, leading to increased steps on future days. But if the user misses a goal, then $\mathbf{1}(u_t \ge g_t)$ is zero and $p_{t+1}$ does not increase. Not achieving a goal decreases the user's self-efficacy, leading to lower steps in the future. The term $\gamma \cdot p_t$ describes the phenomenon whereby learned helplessness reduces the utility or happiness an individual achieves for achieving goals. Consequently, (3) captures the interplay between increasing self-efficacy from meeting specific goals with the decrease in self-efficacy from learned helplessness.

Stage 1 – Inverse Reinforcement Learning
The BAA algorithm first uses inverse reinforcement learning to estimate the parameters $u_b, p_t, \mu$ in the predictive model (2), (3) for a user. Denoting $n$ measurements of the user's step counts at times $t_i$ as $\tilde{u}_{t_i}$, for $i = 1, \ldots, n$, our measurement model $\tilde{u}_{t_i} = u_{t_i} + \varepsilon_i$ is that the observed step counts $\tilde{u}_{t_i}$ deviate from the step counts chosen in the predictive model $u_{t_i}$ by an additive zero mean random variable $\varepsilon_i$. The study [5] found that assuming $\varepsilon_i$ has a Laplacian distribution led to an easily computable formulation and generated accurate predictions.

Under the above setup, the inverse reinforcement learning problem [6, 24, 35, 42] is equivalent to estimating the model parameters $u_b, p_t, \mu$. This problem can be formulated as a log-likelihood maximization [5, 40]. If we define $H$ to be the duration of the intervention, then we can write this estimation problem as a bilevel optimization problem

\[
\begin{aligned}
\min\ & \sum_{i=1}^{n} \left| u_{t_i} - \tilde{u}_{t_i} \right| \\
\text{s.t.}\ & u_t = \arg\max_{u \ge 0} \; -(u - u_b)^2 + p_t \cdot (u - g_t)^- \\
& p_{t+1} = \gamma \cdot p_t + \mu \cdot \mathbf{1}(u_t \ge g_t) \\
& 0 \le p_t, \mu \le \mathrm{UB}_p \\
& 0 \le u_t, u_b \le \mathrm{UB}_u
\end{aligned} \tag{4}
\]

where the constraints hold for $t = 1, \ldots, H$, and $\mathrm{UB}_p, \mathrm{UB}_u$ are constants that are upper bounds on the possible values. Existing numerical optimization software is not able to solve the above problem, but we can rewrite it as a mixed-integer linear program (MILP) [5, 40]. Let $\delta$ be a small positive constant, and $M$ be a large positive constant. The above optimization problem can be rewritten as the following MILP:

\[
\begin{aligned}
\min\ & \sum_{i=1}^{n} a_{t_i} \\
\text{s.t.}\ & -a_{t_i} \le u_{t_i} - \tilde{u}_{t_i} \le a_{t_i} \\
& u_t = \tfrac{1}{2}(\lambda_{1,t} + \lambda_{3,t}) + u_b \\
& 0 \le \lambda_{3,t} \le p_t \\
& (g_t - \delta) - M x_{1,t} \le u_t \le g_t - \delta + M(1 - x_{1,t}) \\
& (g_t - \delta) - M(1 - x_{2,t}) \le u_t \le g_t + \delta + M(1 - x_{2,t}) \\
& (g_t + \delta) - M(1 - x_{3,t}) \le u_t \le g_t + \delta + M x_{3,t} \\
& p_t - M \cdot (1 - x_{1,t}) \le \lambda_{3,t} \le M \cdot (1 - x_{3,t}) \\
& u_t \le M y_{u,t} \\
& \lambda_{1,t} \le M \cdot (1 - y_{u,t}) \\
& p_{t+1} \ge \gamma \cdot p_t \\
& p_{t+1} \le \gamma \cdot p_t + M \cdot (1 - x_{1,t}) \\
& p_{t+1} \ge \gamma \cdot p_t + \mu - M \cdot x_{1,t} \\
& p_{t+1} \le \gamma \cdot p_t + \mu \\
& x_{1,t+1} \ge x_{1,t} - \mathbf{1}(g_{t+1} - g_t < 0) \\
& x_{2,t+1} \le x_{2,t} + \mathbf{1}(g_{t+1} - g_t < 0) \\
& x_{3,t+1} \le x_{3,t} + \mathbf{1}(g_{t+1} - g_t < 0) \\
& x_{1,t} + x_{2,t} + x_{3,t} = 1 \\
& y_{u,t}, x_{1,t}, x_{2,t}, x_{3,t} \in \{0, 1\} \\
& \lambda_{1,t} \ge 0 \\
& 0 \le p_t, \mu \le \mathrm{UB}_p \\
& 0 \le u_t, u_b \le \mathrm{UB}_u
\end{aligned} \tag{5}
\]

where the constraints hold for $t = 1, \ldots, H$ and $i = 1, \ldots, n$. The above MILP can be easily solved using standard optimization software [4, 22, 27].

Stage 2 – Reinforcement Learning
Under our setup, the reinforcement learning problem [40, 49, 50] for computing an optimal set of personalized goals for the user is equivalent to performing a direct policy search using the estimated model parameters $\hat{u}_b, \hat{p}_0, \hat{\mu}$ computed by solving (5). Adapting the solution in [40] to the current context of choosing an optimal sequence of step goals leads to a MILP:

\[
\begin{aligned}
\max\ & u_{\min} \\
\text{s.t.}\ & u_{\min} \le u_t, \quad \text{for } t > T \\
& -\delta \le u_t - \hat{u}_t \le \delta, \quad \text{for } t \le T \\
& -\delta \le p_t - \hat{p}_t \le \delta, \quad \text{for } t \le T \\
& u_t = \tfrac{1}{2}(\lambda_{1,t} + \lambda_{3,t}) + \hat{u}_b \\
& 0 \le \lambda_{3,t} \le p_t \\
& (g_t - \delta) - M x_{1,t} \le u_t \le g_t - \delta + M(1 - x_{1,t}) \\
& (g_t - \delta) - M(1 - x_{2,t}) \le u_t \le g_t + \delta + M(1 - x_{2,t}) \\
& (g_t + \delta) - M(1 - x_{3,t}) \le u_t \le g_t + \delta + M x_{3,t} \\
& p_t - M(1 - x_{1,t}) \le \lambda_{3,t} \le M(1 - x_{3,t}) \\
& u_t \le M y_{u,t} \\
& \lambda_{1,t} \le M(1 - y_{u,t}) \\
& p_{t+1} = \gamma p_t + \hat{\mu}(1 - x_{1,t}), \quad \text{for } t > T \\
& x_{1,t+1} \ge x_{1,t} - g_{\mathrm{ind},t}, \quad \text{for } t > T \\
& x_{2,t+1} \le x_{2,t} + g_{\mathrm{ind},t}, \quad \text{for } t > T \\
& x_{3,t+1} \le x_{3,t} + g_{\mathrm{ind},t}, \quad \text{for } t > T \\
& x_{1,t} + x_{2,t} + x_{3,t} = 1 \\
& g_{t+1} - g_t \le M(1 - g_{\mathrm{ind},t}), \quad \text{for } t > T \\
& g_{t+1} - g_t \ge -M g_{\mathrm{ind},t}, \quad \text{for } t > T \\
& y_{u,t}, x_{1,t}, x_{2,t}, x_{3,t}, g_{\mathrm{ind},t} \in \{0, 1\}, \quad \text{for } t > T \\
& \lambda_{1,t} \ge 0 \\
& 0 \le p_t \le \mathrm{UB}_p \\
& 0 \le u_t \le \mathrm{UB}_u
\end{aligned} \tag{6}
\]

where $T$ is the current time, and the remaining constraints hold for $t = 1, \ldots, H$ and $i = 1, \ldots, n$. The intuition is that the above MILP picks future goals in order to maximize the smallest number of steps on any given day in the future, and the reason for this choice is that in our simulations we found that this objective function choice led to the largest increases (as compared to other possible objective function choices) in physical activity. Moreover, the above MILP can be easily solved using standard optimization software [4, 22, 27].

Feedback via Push Notification
Using the BAA algorithm, the CalFit app is able to adaptively set personalized step goals for users. To optimize the impact of this goal-setting algorithm, we implemented feedback features via iOS push notifications. Each user receives at most two push notifications each day. The first push notification is received by every user at 8:00am, and it notifies the users about their goal for the day. The second push notification at 8:00pm is only received by users who successfully achieved their step goal for the day. Note the standard iOS push notification is used (i.e., appears in both the landing page and the recent notifications tab), and a user receives push notifications regardless of whether or not the CalFit app interface is on; and if the push notification is clicked, it will lead to the homepage of the app interface. The benefit of sending push notifications is two-fold: First of all, we want to constantly engage the users to implicitly remind them to continue using the app interface. This is particularly important for fully automated physical activity interventions since users have a lower intention to adhere due to the lack of in-person coaching sessions. Secondly, the congratulating push notifications can be seen as customized assessment/feedback to users on their daily performance.

Implementation Details
The CalFit app consists of two parts: The interface of the iOS app (including push notifications) and the BAA dynamic goal setting algorithm. The backend of the CalFit app was implemented via the Parse API [43] running on an Intel Xeon E5-2650 v3 2.3GHz Turbo server with 16GB RAM. The server was running CentOS 6.6, and the data was stored in an SQLite database on the same server. The BAA algorithm was written in Python, and the MILPs were solved using Gurobi [22]. The running time for BAA to recommend goals for a single user was less than one second on average, which is in line with the benchmarks from [5] for personalizing a weight intervention.

THE mSTAR STUDY
To experimentally evaluate the efficacy of the CalFit app and personalized goal setting using the BAA algorithm, we conducted the Mobile Student Activity Reinforcement (mSTAR) study with college students at the University of California, Berkeley (UCB). The main research question was: Does setting personalized step goals increase users' steps compared to fixed step goals? The secondary research question was: Does setting personalized step goals improve adherence? The study was approved by the Committee for Protection of Human Subjects of the University of California, Berkeley (IRB Number 2016-03-8609) in July 2016. All participants provided written informed consent prior to study enrollment.

Methodology
To evaluate the above hypotheses, we designed the app so that each user is randomly assigned to either the control group or the intervention group upon joining the study. Users in the control group received constant step goals of 10,000 steps every day during the trial, whereas users in the intervention group received personalized step goals computed by the BAA algorithm. Both groups received the morning and evening push notifications.

Participants
We recruited UCB students by sending email announcements to departments. Recruitment started in January 2017 and ended in February 2017. Interested students were directed to complete an online survey to assess eligibility, and eligible students were encouraged to sign up for an in-person session to complete enrollment in the study and install the app. The students were randomly assigned to either the control group or the intervention group upon installation of the app.

The inclusion criteria were: being a full-time UCB student, intent to become physically active, owning an iPhone 5s or newer model, and being willing to carry the iPhone during the study period. The exclusion criteria were: preexisting health conditions that may make participation unsafe, having participated recently in a physical activity or weight loss intervention, and regularly taking 20,000 steps in a day. We excluded students who took 20,000 steps per day because it is not possible to increase activity by using our app if they were at that activity level (since the BAA algorithm uses 20,000 steps as the upper bound for the goal), and the procedure was that students satisfying the other criteria were enrolled and then excluded if 20,000 steps was observed in the step data collected.

Study Procedure
Eligible users were required to attend two 15-minute in-person sessions (one at baseline and one at study conclusion). The first in-person session occurred in January 2017 and the second occurred in May 2017. During the first in-person session, a trained research staff member installed the CalFit app on users' phones and advised them to carry the phone on their person every day during their participation in the study. The users were randomly assigned to either the control group or the intervention group upon app installation. No other in-person sessions were conducted during the study period to simulate a fully smartphone application-based study environment, which is similar to the environment of most fitness apps.

The users started a 1-week run-in period after the first in-person session. All users received identical daily step goals of: 3000, 3500, 4000, 4500, 5000, 5500, and 6000 steps. This set of adaptive run-in step goals was designed to engage the users in using the application regularly. Also, the morning and evening push notifications were sent to all eligible users. Because the same step goals were provided to both the control and intervention groups, we were able to collect run-in daily steps data when both groups received identical treatment.

After the 1-week run-in period, the daily step goals for users in the control group (N=7) were set to 10,000 steps/day through the CalFit app, whereas the daily step goals for users in the intervention group (N=6) were set by the BAA algorithm. The BAA algorithm was applied every week (to mitigate the impact of large step variance), and it computes the step goals for the following 7 days. Both groups received morning and evening push notifications. The study lasted for 10 weeks, and participants could earn up to a $25 Amazon gift card for completing all parts of the study, including attending a final in-person session.

RESULTS
Table 1 shows the baseline characteristics of the participants. [...] statistical analysis of the primary outcome of daily steps using a linear mixed-effects model (LMM) [31, 33, 38] with random effects for each individual of random slope and random intercept, and fixed effects of time, intervention group, and interaction term of time and intervention group. This analysis found that the control group had a decrease in daily step count of 1,520 (SD ± 740) steps between baseline and 10 weeks, compared to an increase of 700 (SD ± 830) steps in the intervention group. The difference in daily steps between the two groups was 2,220 (p = 0.039) with a 95% confidence interval of (100, 4480), which is a statistically significant difference. The step goals computed by the BAA algorithm were on average between 6,000 steps and 8,000 steps. They varied between different users and days as a result of the algorithm's adaptive and personalized nature.

Characteristic | All Users (N=13) | Control (N=7) | Intervention (N=6) | p-value
(continuous variables) | Mean (± SD) | Mean (± SD) | Mean (± SD) |
Baseline daily average steps | 6,163 ± 1,822 | 6,829 ± 2,023 | 5,387 ± 1,309 | 0.16
Age (years) | 22.2 ± 2.9 | 21.6 ± 2.3 | 23.0 ± 3.5 | 0.40
Weight (kg) | 70.4 ± 23.9 | 73.7 ± 31.9 | 66.5 ± 20.8 | 0.61
(categorical variables) | % (N) | % (N) | % (N) |
Gender | | | | 0.88
  Male | 23.1 (3) | 14.3 (1) | 33.3 (2) |
  Female | 76.9 (10) | 85.7 (6) | 66.7 (4) |
Ethnicity | | | | 0.85
  Asian | 23.1 (3) | 28.6 (2) | 16.7 (1) |
  Hispanic/Latino | 15.4 (2) | 14.3 (1) | 16.7 (1) |
  White (non-Hispanic) | 23.1 (3) | 28.6 (2) | 16.7 (1) |
  Other | 38.5 (5) | 28.6 (2) | 50.0 (3) |
Marital Status | | | | 1.00
  Currently Married/Cohabitating | 7.7 (1) | 14.3 (1) | 0.0 (0) |
  Never Married | 92.3 (12) | 85.7 (6) | 100.0 (6) |
  Divorced/Widowed | 0.0 (0) | 0.0 (0) | 0.0 (0) |
Year in School | | | | 0.52
  Freshman | 0.0 (0) | 0.0 (0) | 0.0 (0) |
  Sophomore | 15.4 (2) | 28.6 (2) | 0.0 (0) |
  Junior | 30.8 (4) | 28.6 (2) | 33.3 (2) |
  Senior | 23.1 (3) | 14.3 (1) | 33.3 (2) |
  Graduate | 30.8 (4) | 28.6 (2) | 33.3 (2) |
Own a Dog | | | | 1.00
  Yes | 7.7 (1) | 14.3 (1) | 0.0 (0) |
  No | 92.3 (12) | 85.7 (6) | 100.0 (6) |
Transportation to Work | | | | 0.43
  Car | 23.1 (3) | 28.6 (2) | 16.7 (1) |
  Public Transportation | 7.7 (1) | 14.3 (1) | 0.0 (0) |
  Walk | 61.5 (8) | 42.9 (3) | 83.3 (5) |
  Bicycle | 7.7 (1) | 14.3 (1) | 0.0 (0) |

Table 1. Comparison of Baseline Characteristics shows that the differences between participants in the control and intervention groups were not statistically significant, which is expected since participants were randomly assigned to groups.

Figure 3 shows the change in daily steps over the 10-week study period, and for fair comparison we baseline-adjusted the plotted steps by adding the coefficient corresponding to each group (i.e., control or intervention) computed by the LMM. Despite the slightly higher steps in the intervention group, the daily steps of the two groups did not differ substantially in the first 5 weeks. However, in the last 3 weeks, the intervention group had an average increase of 1,000 steps and the control group had an average decrease of 2,000 steps. We suspect that we did not see differences in the early weeks due to the initial stimulation of participating in a fitness program. As time went by, the excitement from participation cooled down and the impact of the BAA algorithm started to dominate. We further defined adherent users to be those who used the CalFit app for 80% of the days during the study period. Under this criterion, 2 of the 7 users in the control group and 1 of the 6 users in the intervention group were identified as non-adherent. However, the difference in adherence percentage was not statistically significant (p = 0.61) between the two groups, primarily due to the small sample size.

[Figure 3: line plot of Steps (y-axis, roughly 5000 to 8000) versus Day (x-axis, 0 to 60), with one curve per Group: Control and Intervention.]
Figure 3. The objectively measured daily steps of the control group and the intervention group over the 10-week study period show the statistically significant difference in the number of daily steps at the end of the study. The plotted values are computed by averaging the raw data over each user in the corresponding group, adjusting the baseline value based on the value computed from the LMM, and then smoothing the data using a standard (nonparametric) Nadaraya-Watson estimator.

Results of Qualitative Interview
During the second in-person session at 10 weeks, a trained research staff member interviewed the users on their experience. All users agreed that the CalFit app was easy to navigate, required minimal effort on the user side, and the number of push notifications was about right. One user in the intervention group told us, "I am excited to know my step goal every morning! I know I am doing well if my goal increases, and I know I need to keep up when my goal decreases." Another user in the control group, however, stated, "The goals are always the same. It's impossible for me to get that many steps so I
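As a concrete reading of the percentile heuristic cited from [1], the following minimal Python sketch computes a next-day goal as the 60th percentile of the most recent 10 daily step counts. The function name `percentile_goal`, the linear-interpolation convention, and the sample history are illustrative assumptions; the text does not specify how the percentile is computed.

```python
def percentile_goal(recent_steps, pct=60.0, window=10):
    """Return the pct-th percentile of the last `window` daily step counts."""
    if not recent_steps:
        raise ValueError("need at least one day of step data")
    data = sorted(recent_steps[-window:])
    # Linear interpolation between the two closest ranks (an assumed convention).
    rank = (pct / 100.0) * (len(data) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(data) - 1)
    frac = rank - lo
    return data[lo] + frac * (data[hi] - data[lo])


# Made-up step history for the past 10 days.
history = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]
goal = percentile_goal(history)
print(round(goal))
```

Because the goal depends only on a rolling window of past steps, this kind of rule needs no user model, which is what distinguishes it from the model-based approach introduced in this paper.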
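To see how (2) and (3) interact, the following sketch rolls both equations forward for a hypothetical user. It uses γ = 0.85 as reported in the text; the values of u_b, p_0, µ and the two goal sequences are made up for illustration, and `best_response` is the closed-form maximizer of (2), valid for nonnegative parameters.

```python
GAMMA = 0.85  # value the text reports as giving accurate predictions in [5]


def best_response(u_b, p, g):
    """Closed-form maximizer of equation (2), assuming nonnegative inputs."""
    return max(u_b, min(g, u_b + p / 2.0))


def simulate(u_b, p0, mu, goals, gamma=GAMMA):
    """Roll equations (2) and (3) forward; returns (steps, p_values)."""
    steps, p_values, p = [], [], p0
    for g in goals:
        u = best_response(u_b, p, g)
        steps.append(u)
        p_values.append(p)
        # Equation (3): decay by gamma, add mu only when the goal was met.
        p = gamma * p + (mu if u >= g else 0.0)
    return steps, p_values


# Hypothetical user: baseline 5,000 steps, p_0 = 2,000, mu = 600.
attainable, _ = simulate(5000, 2000, 600, [6000] * 5)
fixed, p_fixed = simulate(5000, 2000, 600, [10000] * 5)
print(attainable)
print([round(u) for u in fixed])
```

Under these assumed parameters, the 6,000-step goal is met every day, while the fixed 10,000-step goal is never met, so p_t decays and the predicted steps drift back toward the baseline.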
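The estimation problem in (4) can be illustrated on a small scale without a MILP solver. The sketch below is not the MILP (5); it swaps in an exhaustive grid search that minimizes the same sum of absolute deviations over candidate values of (u_b, p_0, µ), with γ fixed at 0.85. The grids, the synthetic noise-free observations, and all function names are assumptions made for illustration.

```python
from itertools import product

GAMMA = 0.85


def best_response(u_b, p, g):
    """Closed-form maximizer of equation (2), assuming nonnegative inputs."""
    return max(u_b, min(g, u_b + p / 2.0))


def predict(u_b, p0, mu, goals, gamma=GAMMA):
    """Predicted daily steps from equations (2) and (3)."""
    steps, p = [], p0
    for g in goals:
        u = best_response(u_b, p, g)
        steps.append(u)
        p = gamma * p + (mu if u >= g else 0.0)
    return steps


def fit_by_grid(goals, observed, ub_grid, p0_grid, mu_grid):
    """Grid point (u_b, p0, mu) with the smallest sum of absolute deviations."""
    def loss(theta):
        u_b, p0, mu = theta
        return sum(abs(u - o) for u, o in zip(predict(u_b, p0, mu, goals), observed))
    return min(product(ub_grid, p0_grid, mu_grid), key=loss)


# Synthetic, noise-free data generated from known (hypothetical) parameters.
goals = [7000, 5500, 7000, 7000]
observed = predict(5000, 2000, 600, goals)
estimate = fit_by_grid(goals, observed,
                       ub_grid=range(4000, 6001, 500),
                       p0_grid=range(0, 4001, 1000),
                       mu_grid=range(0, 1001, 200))
print(estimate)
```

A grid search scales poorly with grid resolution and only returns grid points, which is one reason a solver-based formulation such as (5) is attractive for real data.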
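For readers unfamiliar with the Nadaraya-Watson estimator mentioned in the caption of Figure 3, here is a minimal sketch. The Gaussian kernel, the bandwidth of 2 days, and the step counts are assumptions chosen for illustration; the text does not state which kernel or bandwidth was used.

```python
import math


def nadaraya_watson(xs, ys, x0, bandwidth):
    """Kernel-weighted average of ys at x0, using a Gaussian kernel (assumed)."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


# Made-up group-average daily steps for one week.
days = list(range(7))
steps = [6000, 9000, 3000, 7000, 6500, 2000, 8000]
smoothed = [nadaraya_watson(days, steps, d, bandwidth=2.0) for d in days]
print([round(s) for s in smoothed])
```

Each smoothed value is a weighted average of nearby days, so day-to-day spikes are damped; a larger bandwidth gives a flatter curve.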
stopped tracking.” The overall mean age was 22.2 (SD ± 2.9) years and 77% of the participants were female. The baseline mean daily step DISCUSSION in the control group was slightly higher than that in the inter- The mSTAR study reveals the potential of using personalized vention group, but the difference is not statistically significant step goals to facilitate physical activity. Interestingly, users’ (6,829 steps versus 5,387 steps, respectively; p = 0.16). The daily steps did not increase at a constant rate over the 10-week p-values in Table 1 were computed using t-tests for continuous period. Rather, we observe that the daily steps of the two variables and χ 2 -tests for categorical variables. groups did not differ significantly in the first 5 weeks. But in the last 3 weeks, the intervention group was taking many Physical Activity Outcomes more steps than the control group. We believe that in the first The primary outcome of the study is the objectively measured several weeks, physical activity was driven by users’ initial daily steps from baseline to 10-weeks. We conducted our enthusiasm with the start of their participation in the study. However, when this enthusiasm wore out after 5 weeks, we adults managing their chronic diseases). Another limitation observed significant difference in physical activity behavior is that the study lasted for only 10-weeks, so the long-term between the two groups, suggesting the potential of using the impact of the CalFit app is unclear. CalFit app (and its underlying features such as the automated generation of personalized step goals using reinforcement In the future, we would like to extend our observations fur- learning) to deliver physical activity interventions. ther by studying hypotheses in three directions. Firstly, how do different goal setting sources (i.e, self-set, trainer-set, and machine set) impact the intervention outcome? 
Secondly, how DESIGN IMPLICATIONS do different dynamic goal setting algorithms impact the in- There are two major challenges associated with providing fully tervention outcome? In particular, it would be beneficial to automated smartphone-based physical activity interventions. unveil if the success of this study is due to the BAA algorithm The first challenge is supporting users through key behavior or due to the fact that step goals are not steady. We would change features and effective goal-setting in order to increase like to compare the BAA algorithm to simpler analytical algo- their level of physical activity. The second challenge is en- rithms, such as, for example, setting the goal to be the 60th suring sustained maintenance of any increases in physical percentile of the steps in the past week. Thirdly, we would like activity initiated by an intervention. Typical physical activ- to isolate the impact of the various design features (i.e., push ity interventions address these challenges through frequent notification, history tab, etc.) to provide recommendations on in-person coaching sessions, which are effective in initiating the most effective features to future fitness app designers. and maintaining behavior change. Since in-person coaching is expensive, mobile physical activity interventions seek to lower CONCLUSION costs by reducing the amount of coaching. As a result, meeting We developed a novel fitness app called CalFit to track and these two challenges is substantially more difficult for fully deliver physical activity interventions. The app implements automated smartphone-based physical activity interventions. a reinforcement learning algorithm adapted to the context of The mSTAR study demonstrates the potential of adopting generating personalized and adaptive daily step goals for each behavior-change features and using personalization in mobile user so that the goals are challenging but attainable. 
Further- physical activity interventions to address these challenges. In more, the app adopts many behavior change features such particular, we found that sending one or two push notifica- as self-monitoring and customized feedback. A pilot study tions serves as a useful reminder. Furthermore, users prefer with 13 university students demonstrated that setting personal- apps that do not require too much time and effort. Features ized step goals resulted in 2,200 more daily steps than setting that require regular user input, such as setting personal goals steady step goals (of 10,000 steps/day) after 10 weeks. We or keeping a diary to record steps/food intake, can create a believe the CalFit app (and its underlying features like the au- burden on app adherence. Another main design choice is per- tomated generation of personalized step goals using reinforce- sonalization. The BAA algorithm that sets personalized step ment learning) has the potential to deliver physical activity goals for users is shown to be effective in increasing daily interventions in a fully automated fashion. A large scale, ran- steps. Providing challenging but yet attainable goals can in- domized controlled trial of a fully automated physical activity duce goal-achieving incentives, and giving daily feedback on intervention is warranted. performance (i.e., reminder push notification on daily goal and congratulating push notification) further reinforces exercise ACKNOWLEDGMENTS motivation. Conversely, fixed steps goals (10,000 steps/day) The authors would like to thank Emily Ma, Smita Jain, and Jes- with no personalization can be unrealistically high or too easy sica Lin for their help during the pre-study in-person session. to achieve and hinders users from progressing to be active. The authors would also like to thank Mingyang Li for his help in developing the app. 
This study was supported in part by Future designs of mobile fitness apps should consider person- funding from the Philippine-California Advanced Research In- alized interventions, including but not limited to goals, push stitutes (PCARI), funding from the UC Center for Information notifications, and displays. In addition, algorithms for goal- Technology Research in the Interest of Society (CITRIS), the setting should take the complete history of the user as the Philippine-California Advanced Research Institutes (PCARI) basis to generate future interventions, particularly when the grant IIID-2015-07, a grant (K24NR015812) from the Na- input and target metrics have high day-to-day variations. Im- tional Institute of Nursing Research, and a grant from the plementing behavior change features, such as self-monitoring National Center for Advancing Translational Sciences of the and summary feedback on performance, can further motivate National Institutes of Health (KL2TR000143). physical activity. Overall, the app should be easy to navigate and require minimum manual inputs from users, particularly by using algorithms to automate personalization. REFERENCES 1. Marc A Adams, James F Sallis, Gregory J Norman, Melbourne F Hovell, Eric B Hekler, and Elyse Perata. LIMITATIONS AND FUTURE WORK 2013. An adaptive physical activity intervention for One limitation of our study is the relatively small sample size. overweight adults: a randomized controlled trial. PloS A larger scale study should be performed to further confirm the one 8, 12 (2013), e82901. findings. In addition, the population of the study is university students, who may not be as concerned about their physical 2. Tim Althoff, Rok Sosič, Jennifer L Hicks, Abby C King, wellness as other populations (i.e., middle aged and elderly Scott L Delp, and Jure Leskovec. 2017. Large-scale physical activity data reveal worldwide activity inequality. 15. Centers for Disease Control and Prevention. 2017. 
Nature 547, 7663 (2017), 336–339. Exercise or Physical Activity. Technical Report. 3. James J Annesi. 2002. Goal-setting protocol in adherence 16. Sunny Consolvo, Predrag Klasnja, David W McDonald, to exercise by Italian adults. Perceptual and motor skills Daniel Avrahami, Jon Froehlich, Louis LeGrand, Ryan 94, 2 (2002), 453–458. Libby, Keith Mosher, and James A Landay. 2008. 4. MOSEK ApS. 2016. The MOSEK optimization toolbox Flowers or a robot army?: encouraging awareness & for MATLAB manual. Version 7.1. activity with personal, mobile displays. In Proceedings of http://docs.mosek.com/7.1/toolbox/index.html the 10th international conference on Ubiquitous computing. ACM, 54–63. 5. Anil Aswani, Philip Kaminsky, Yonatan Mintz, Elena Flowers, and Yoshimi Fukuoka. 2016. Behavioral 17. Brianna S Fjeldsoe, Yvette D Miller, and Alison L modeling in weight loss interventions. (2016). Marshall. 2010. MobileMums: a randomized controlled https://ssrn.com/abstract=2838443 trial of an SMS-based physical activity intervention. Annals of Behavioral Medicine 39, 2 (2010), 101–111. 6. Anil Aswani, Zuo-Jun Max Shen, and Auyon Siddiq. 2018. Inverse optimization with noisy data. Operations 18. Yuichi Fujiki. 2010. iPhone as a physical activity Research (2018), To appear. measurement platform. In CHI’10 Extended Abstracts on 7. Audie A Atienza, Abby C King, Brian M Oliveira, Human Factors in Computing Systems. ACM, David K Ahn, and Christopher D Gardner. 2008. Using 4315–4320. hand-held computer technologies to improve dietary 19. Yoshimi Fukuoka, Caryl L Gay, Kevin L Joiner, and Eric intake. American journal of preventive medicine 34, 6 Vittinghoff. 2015. A novel diabetes prevention (2008), 514–518. intervention using a mobile app: a randomized controlled 8. Kristen MJ Azar, Lenard I Lesser, Brian Y Laing, Janna trial with overweight adults at risk. American journal of Stephens, Magi S Aurora, Lora E Burke, and Latha P preventive medicine 49, 2 (2015), 223–237. Palaniappan. 2013. 
Mobile applications for weight 20. Yoshimi Fukuoka, Eric Vittinghoff, So Son Jong, and management: theory-based content analysis. American William Haskell. 2010. Innovation to motivation – pilot journal of preventive medicine 45, 5 (2013), 583–589. study of a mobile phone intervention to increase physical 9. Albert Bandura. 1991. Social cognitive theory of moral activity among sedentary women. Preventive medicine 51, thought and action. Handbook of moral behavior and 3 (2010), 287–289. development 1 (1991), 45–103. 21. Dominik Gall, Jean-Luc Lugrin, Dennis Wiebusch, and 10. Marco Bardus, Samantha B van Beurden, Jane R Smith, Marc Erich Latoschik. 2016. Remind Me: An Adaptive and Charles Abraham. 2016. A review and content Recommendation-Based Simulation of Biographic analysis of engagement, functionality, aesthetics, Associations. In Proceedings of the 21st International information quality, and change techniques in the most Conference on Intelligent User Interfaces. ACM, popular commercial apps for weight management. 191–195. International Journal of Behavioral Nutrition and 22. Gurobi Optimization, Inc. 2016. Gurobi Optimizer Physical Activity 13, 1 (2016), 35. Reference Manual. (2016). http://www.gurobi.com 11. Stephanie Bauer, Judith de Niet, Reinier Timman, and Hans Kordy. 2010. Enhancement of care through 23. Irja Haapala, Noël C Barengo, Simon Biggs, Leena self-monitoring and tailored feedback via text messaging Surakka, and Pirjo Manninen. 2009. Weight loss by and their use in the treatment of childhood overweight. mobile phone: a 1-year effectiveness study. Public health Patient education and counseling 79, 3 (2010), 315–319. nutrition 12, 12 (2009), 2382–2391. 12. Jeannette M Beasley, William T Riley, Amanda Davis, 24. Dylan Hadfield-Menell, Stuart J Russell, Pieter Abbeel, and Jatinder Singh. 2008. Evaluation of a PDA-based and Anca Dragan. 2016. Cooperative inverse dietary assessment and intervention program: a reinforcement learning. 
In Advances in neural randomized controlled trial. Journal of the American information processing systems. 3909–3917. College of Nutrition 27, 2 (2008), 280–286. 25. Eric B Hekler, Matthew P Buman, Lauren Grieco, Mary 13. Lora E Burke, Molly B Conroy, Susan M Sereika, Rosenberger, Sandra J Winter, William Haskell, and Okan U Elci, Mindi A Styn, Sushama D Acharya, Abby C King. 2015. Validation of physical activity Mary A Sevick, Linda J Ewing, and Karen Glanz. 2011. tracking via android smartphones compared to ActiGraph The effect of electronic self-monitoring on weight loss accelerometer: laboratory-based and free-living and dietary intake: a randomized behavioral weight loss validation studies. JMIR mHealth and uHealth 3, 2 trial. Obesity 19, 2 (2011), 338–344. (2015). 14. Meredith A Case, Holland A Burwick, Kevin G Volpp, 26. Robert Hurling, Michael Catt, Marco De Boni, and Mitesh S Patel. 2015. Accuracy of smartphone Bruce William Fairley, Tina Hurst, Peter Murray, applications and wearable devices for tracking physical Alannah Richardson, and Jaspreet Singh Sodhi. 2007. activity data. JAMA 313, 6 (2015), 625–626. Using internet and mobile phone technology to deliver an automated physical activity program: randomized 38. Edward McAuley and Bryan Blissmer. 2000. controlled trial. Journal of medical Internet research 9, 2 Self-efficacy determinants and consequences of physical (2007). activity. Exerc Sport Sci Rev 28, 2 (2000), 85–88. 27. IBM. 2016. IBM ILOG CPLEX Optimization Studio. 39. Kathryn Mercer, Melissa Li, Lora Giangregorio, (2016). Catherine Burns, and Kelly Grindrod. 2016. Behavior 28. John M Jakicic, Kelliann K Davis, Renee J Rogers, change techniques present in wearable activity trackers: a Wendy C King, Marsha D Marcus, Diane Helsel, Amy D critical analysis. JMIR mHealth and uHealth 4, 2 (2016). Rickman, Abdus S Wahed, and Steven H Belle. 2016. 40. 
Yonatan Mintz, Anil Aswani, Philip Kaminsky, Elena Effect of wearable technology combined with a lifestyle Flowers, and Yoshimi Fukuoka. 2017. Behavioral intervention on long-term weight loss: the IDEA Analytics for Myopic Agents. (2017). randomized clinical trial. Jama 316, 11 (2016), http://arxiv.org/abs/1702.05496 1161–1171. 41. Sean A Munson and Sunny Consolvo. 2012. Exploring 29. Nam-Seok Joo and Bom-Taeck Kim. 2007. Mobile phone goal-setting, rewards, self-monitoring, and sharing to short message service messaging for behaviour motivate physical activity. In Pervasive computing modification in a community-based weight control technologies for healthcare (PervasiveHealth), 2012 6th programme in Korea. Journal of Telemedicine and international conference on. IEEE, 25–32. Telecare 13, 8 (2007), 416–420. 42. Andrew Y Ng, Stuart J Russell, and others. 2000. 30. Rosemary Josekutty Thomas, Judith Masthoff, and Nir Algorithms for inverse reinforcement learning.. In Icml. Oren. 2017. Personalising Healthy Eating Messages to 663–670. Age, Gender and Personality: Using Cialdini’s Principles and Framing. In Proceedings of the 22nd International 43. Parse. 2016. Parse Documentation. (2016). Conference on Intelligent User Interfaces Companion. http://docs.parseplatform.org ACM, 81–84. 44. Kevin Patrick, Fred Raab, Marc A Adams, Lindsay 31. Julie J Keysor. 2003. Does late-life physical activity or Dillon, Marian Zabinski, Cheryl L Rock, William G exercise prevent or minimize disablement?: a critical Griswold, and Gregory J Norman. 2009. A text review of the scientific evidence. American journal of message–based intervention for weight loss: randomized preventive medicine 25, 3 (2003), 129–136. controlled trial. Journal of medical Internet research 11, 1 32. Abby C King, David K Ahn, Brian M Oliveira, Audie A (2009). Atienza, Cynthia M Castro, and Christopher D Gardner. 45. Pew Research Center. 2016. Smartphone Ownership and 2008. 
Promoting physical activity through hand-held Internet Usage Continues to Climb in Emerging computer technology. American journal of preventive Economies. Technical Report. medicine 34, 2 (2008), 138–142. 46. William T Riley, Daniel E Rivera, Audie A Atienza, 33. Joseph A Knight. 2012. Physical inactivity: associated Wendy Nilsen, Susannah M Allison, and Robin diseases and disorders. Annals of Clinical & Laboratory Mermelstein. 2011. Health behavior models in the age of Science 42, 3 (2012), 320–337. mobile interventions: are our theories up to the task? 34. Artie Konrad, Victoria Bellotti, Nicole Crenshaw, Simon Translational behavioral medicine 1, 1 (2011), 53–71. Tucker, Les Nelson, Honglu Du, Peter Pirolli, and Steve Whittaker. 2015. Finding the adaptive sweet spot: 47. Sarah Knapton. 2017. The 10,000 Steps a Day Myth: Balancing compliance and achievement in automated How Fitness Apps Can Do More Harm Than Good. stress reduction. In Proceedings of the 33rd Annual ACM (2017). http://www.telegraph.co.uk/news/2017/02/21/ 10000-steps-day-myth-fitness-apps-can-do-harm-good/ Conference on Human Factors in Computing Systems. ACM, 3829–3838. 48. Oliver S Schneider, Karon E MacLean, Kerem Altun, 35. Sanjay Krishnan, Animesh Garg, Richard Liaw, Lauren Idin Karuei, and Michael Wu. 2013. Real-time gait Miller, Florian T Pokorny, and Ken Goldberg. 2016. Hirl: classification for persuasive smartphone apps: structuring Hierarchical inverse reinforcement learning for the literature and pushing the limits. In Proceedings of long-horizon tasks with delayed rewards. (2016). the 2013 international conference on Intelligent user http://arxiv.org/abs/1604.06508 interfaces. ACM, 161–172. 36. Hyunho Lee and Youngseok Lee. 2017. A Look at 49. Richard S Sutton and Andrew G Barto. 1998. Wearable Abandonment. In Mobile Data Management Reinforcement learning: An introduction. MIT press (MDM), 2017 18th IEEE International Conference on. Cambridge. IEEE, 392–393. 50. 
Richard S Sutton, David A McAllester, Satinder P Singh, 37. Edwin A Locke and Gary P Latham. 2002. Building a and Yishay Mansour. 2000. Policy gradient methods for practically useful theory of goal setting and task reinforcement learning with function approximation. In motivation: A 35-year odyssey. American psychologist Advances in neural information processing systems. 57, 9 (2002), 705. 1057–1063. 51. U.S. Department of Health and Human Services. 2008. Physical Activity Guidelines for Americans. (2008). https://health.gov/paguidelines/pdf/paguide.pdf 52. Youri van Pinxteren, Gijs Geleijnse, and Paul Kamsteeg. 2011. Deriving a recipe similarity measure for recommending healthful meals. In Proceedings of the 16th international conference on Intelligent user interfaces. ACM, 105–114. 53. Ted Vickey, John Breslin, and Antonio Williams. 2012. Fitness–There’s an App for That: Review of Mobile Fitness Apps. International Journal of Sport & Society 3, 4 (2012). 54. World Health Organization. 2017a. Health topics: physical activity. (2017). http://www.who.int/topics/physical_activity/en/ 55. World Health Organization. 2017b. Physical activity fact sheet. (2017). http://www.who.int/mediacentre/factsheets/fs385/en/ 56. World Heart Federation. 2017. Physical inactivity. (2017). http://www.world-heart-federation.org/ cardiovascular-health/ cardiovascular-disease-risk-factors/ physical-inactivity/
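
As an illustration of the simpler comparison rule mentioned under Limitations and Future Work (setting the goal to the 60th percentile of the steps in the past week), the following Python sketch shows one way such a baseline could be computed. It is not the BAA algorithm and is not taken from the CalFit implementation. The function name `percentile_goal`, the linear-interpolation definition of the percentile, the reuse of the 20,000-step upper bound as a clamp, and the sample step counts are all assumptions made for illustration.

```python
import math


def percentile_goal(recent_steps, q=60.0, lower=0, upper=20000):
    """Return a step goal at the q-th percentile of recent daily steps.

    Hypothetical baseline heuristic, not the BAA algorithm.
    The clamp to `upper` borrows the 20,000-step bound as an assumption.
    """
    if not recent_steps:
        raise ValueError("need at least one day of step data")
    if not 0.0 <= q <= 100.0:
        raise ValueError("q must be between 0 and 100")

    ordered = sorted(recent_steps)
    # Percentile by linear interpolation between the two closest ranks
    # (one common definition; other definitions give slightly different values).
    pos = (len(ordered) - 1) * q / 100.0
    below = math.floor(pos)
    above = math.ceil(pos)
    value = ordered[below] + (ordered[above] - ordered[below]) * (pos - below)

    # Round to whole steps and keep the goal inside the allowed range.
    return int(min(max(round(value), lower), upper))


if __name__ == "__main__":
    # Made-up step counts for one user over the past 7 days.
    past_week = [4200, 5100, 6300, 3900, 7000, 5600, 4800]
    print(percentile_goal(past_week))
```

Because the rule looks only at the most recent week, it ignores the longer user history that the Design Implications section recommends using, which is one reason it is a natural baseline to compare against.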