=Paper=
{{Paper
|id=Vol-2148/paper3
|storemode=property
|title=Detecting Behaviour Changes in Accelerometer Data
|pdfUrl=https://ceur-ws.org/Vol-2148/paper03.pdf
|volume=Vol-2148
|authors=Claudio Diaz,Kalina Yacef
|dblpUrl=https://dblp.org/rec/conf/ijcai/DiazandY18
}}
==Detecting Behaviour Changes in Accelerometer Data==
Claudio Diaz, Kalina Yacef

School of Information Technologies, The University of Sydney

cdia0348@uni.sydney.edu.au, kalina.yacef@sydney.edu.au

===Abstract===
How can the impact of Health Education programs promoting physical activity be analysed? One common way with learning programs is to conduct pre- and post-tests and measure whether and how target knowledge has evolved. In the case of physical activity, unobtrusive accelerometers can capture detailed data about people's movements, but the challenge is to extract information from these raw data to investigate whether and how physical activity behaviours have evolved. This paper presents a methodology to do so, by extracting bouts of physical activity of specific intensity levels and of various lengths, and by using these as features to cluster students' daily behaviours before and after intervention. This approach enables a more insightful analysis of the physical activity behaviours of the participants, and points to the nature of behaviour changes, if present. We illustrate this methodology with pre- and post-test data collected in the context of an e-learning program aimed at educating school children about healthy behaviours, with a focus on reaching recommended levels of daily physical activity: the pre- and post-tests were carried out by measuring their physical activity unobtrusively and continuously for five consecutive school days using research-grade accelerometers (GENEActiv).

===1 Introduction and Related Work===
Obesity and sedentary behaviour in children have increased in the last three decades [Ng et al., 2014]. In order to reverse this trend, countries and organisations worldwide implement health education programs for seniors, adults and children, in order to promote behaviour changes and raise awareness with regards to diet and physical activity, two major factors linked to obesity and non-communicable diseases. In particular, studies suggest that physical activity is positively associated with many health benefits, and that children should accumulate at least 60 minutes per day of moderate to vigorous physical activity [Janssen and LeBlanc, 2010].

The use of technology in health promotion interventions has shown great potential to improve health behaviours and provide insights on how to improve their effectiveness [Krebs et al., 2010]. With increasingly available wearable technologies, researchers more routinely use sensors for measuring physical activity unobtrusively and continuously [Plasqui et al., 2013]. Accelerometers provide objective, continuous data of real daily life physical activity, replacing or complementing self-reported data (often inaccurate and coarse). This is especially important when studying children because their self-reported data and/or parent reports can be very inaccurate [Kelly et al., 2007].

Whilst the most frequent use of accelerometers in Health Education is to quantify physical activity, much deeper information can be captured from their data, such as activity recognition [Ravi et al., 2005] and changes in everyday physical activity [Sprint et al., 2016]. Detecting changes in learning behaviour is not new: specialised data science fields such as Educational Data Mining (EDM) [Baker and Yacef, 2009] and Learning Analytics and Knowledge (LAK) [Siemens, 2013] have developed techniques to extract learning behaviour changes, which can certainly be explored for Health Education contexts using accelerometer data. There is indeed an emerging interest in using sensors to better understand complex behaviours in education: for example, in learning kinaesthetic skills like martial arts, dancing or use of clinical equipment [Martinez-Maldonado et al., 2017]. Some studies use several sensors, for example in the analysis of hand movements for engineering building activities [Worsley, 2014], which adds the complexity of dealing with multimodal data sources [Ochoa, 2017] and requires the creation of different analytics and data mining techniques to extract meaningful information from multi-sensor data [Blikstein and Worsley, 2016]. However, the techniques for extracting learning-useful information from sensor data are still in their infancy.

In this paper we are concerned with modelling and comparing physical activity behaviours between two sets of accelerometer data, captured before and after a learning intervention, in order to understand its impact. The contribution of this paper is a clustering-based approach for a more insightful analysis of the physical activity behaviour of the participants, and of the nature of physical activity behaviour changes, if present. The paper is structured as follows. Section 2 presents our data and its context. Section 3 describes the methodology, and Section 4 presents the results of this approach on our dataset. Section 5 concludes the paper and suggests avenues of future work.

===2 Data and Overall Analysis===
The data was collected from the iEngage project [Yacef et al., 2018]: iEngage aims to educate 10-13 year old school children about healthy behaviours, with a focus on reaching recommended levels of daily Physical Activity (PA). PA can fall into one of four different categories: sedentary time (therefore absence of physical activity), light, moderate and vigorous PA. The recommendations are that children should do at least 60 minutes of moderate to vigorous PA (shortened to MVPA). The e-learning program also raises awareness about sedentary time, and encourages children to limit it and to break it up on a regular basis with at least some light activity.

As shown in Figure 1, we conducted a controlled study with two groups of children. The experimental group followed the iEngage learning sessions over 5 weeks, whilst the control group did not. Pre- and post-tests were carried out on both groups, measuring their physical activity unobtrusively and continuously with GENEActiv [Activinsights Ltd., 2017] activity trackers for five consecutive school days.

Figure 1: High-level protocol of the intervention. Control group (N=26): pre-test, post-test. Experimental group (N=35): pre-test, iEngage, post-test.

The GENEActiv accelerometers were worn on the wrist of the children's non-skilled hand and captured acceleration in three axes (x, y, z) with a sample frequency of 60 Hz. At the end of each 5 day period (pre and post, for each group), the GENEActiv trackers were collected and their data downloaded to a computer, hence generating two five-day datasets per child, for a total of 61 children.

Overall analysis of the sum of minutes spent in PA showed that pre-intervention, the control and experimental groups spent similar time doing PA at each intensity (p-values of 0.63, 0.62, 0.76, 0.29 for Sedentary, Light, Moderate and Vigorous intensities respectively). However, the experimental group did significantly more PA post-intervention, especially at MVPA levels (p-values of 0.12, 0.003, 0.017 respectively for L, M and V). While this is consistent with the intervention reaching the desired effect (at least short term) on this population, we are seeking more insight into how this activity is distributed throughout the day, and how it evolved. For instance, an important question is whether the additional MVPA occurred in longer bouts of activity (which would suggest more sustained, intentional activity), or was scattered in minuscule amounts throughout the day (which is more likely to be incidental). This led us to explore bouts of PA in terms of intensity level, length and frequency.

===3 Methodology for Extracting Daily Physical Activity Behaviours===
We devised a methodology for characterising daily behaviours of PA at a coarse level, yet capturing essential elements of how the PA is distributed throughout the day. Indeed, two days (for 2 different children, or 2 days for the same child) can show the same total quantity of MVPA (e.g. 40 minutes), but one will contain a lot of sedentary time and long sessions of MVPA, whilst another can show more broken down MVPA but less sedentary time (hence more light activity). The idea is to be able to identify the types of distributions of activity that are present in the cohort data, and to distinguish these distributions.

Figure 2: Methodology. Accelerometer data → SVMgs → daily sequences of PA intensities → PA bout features → PA daily behaviour clustering.

Our methodology, shown in Figure 2, can be summarised as follows. First, we processed and categorised students' raw GENEActiv accelerometer data into sequences of PA intensity levels for both datasets (pre- and post-intervention). We then extracted the bouts of PA, and used their characteristics as features for clustering all the data, to identify types of daily PA behaviours. As we will show in Section 4, we were then able to follow students' movements across these clusters before and after the learning intervention. The next sub-sections detail these steps.

====3.1 Data Pre-processing====
The data pre-processing was done using R [Ihaka and Gentleman, 1996], which has a specific library to manipulate GENEActiv tracker data [Fang and Langford, 2013]. From this point onward, as we are interested in analysing the changes in the experimental population, we worked with the data from the experimental group (N=35). First, we converted the accelerometer binary files to data frames. Next, as we focus here on daily PA behaviours, we filtered out the sleeping times, thus extracting 12-hour daytime records (from 8:00 to 20:00 hrs). To ensure that the daily records were all comparable, we excluded days where the tracker was not worn the whole day, thus excluding the Monday and Friday, which were incomplete. Due to absence or sickness, not all children wore the trackers both before and after the intervention. Therefore, from the initial 35 children in the experimental group, we ended up with 30 pre-intervention children with three daytime records and 24 post-intervention children with three daytime records, thus 54 (30+24) three-day PA records in total.

====3.2 From Accelerometer to SVMgs====
The next step translated the three-dimensional 60 Hz acceleration data into quantities of physical activity within a 1 second epoch. We took the data frames from the binaries and extracted the triaxial acceleration records with timestamps of every child to calculate gravity-subtracted Signal Vector Magnitudes (SVMgs) [Esliger et al., 2011], with gravity approximated to 1, for each 1 second epoch (see Formula 1). This process produced a long vector of physical activity SVMgs per second for each child over the 3 days, thus 54 vectors in total, each being 129,600 seconds long (3 days x 12 hours x 60 minutes x 60 seconds).

<math>\mathrm{SVMgs} = \sum_{i=1}^{60} \left| \sqrt{x_i^2 + y_i^2 + z_i^2} - 1 \right| \quad (1)</math>

====3.3 From SVMgs to PA Intensity Levels====
We then categorised the SVMgs at each second of data into a PA intensity level, using cutoffs scientifically validated for assessment of physical activity intensity in children [Phillips et al., 2013]. These cutoffs are shown in Table 1.

{| class="wikitable"
|+ Table 1: SVMgs Cut Off Levels
! Physical Activity Intensity Level !! SVMgs Cut Off
|-
| Sedentary || [0, 4.5[
|-
| Light || [4.5, 16.5[
|-
| Moderate || [16.5, 42[
|-
| Vigorous || ≥ 42
|}

Figure 3 displays an example of SVMgs time series over one day for one student. The red horizontal line represents the cutoff from sedentary to light, the blue line the cutoff from light to moderate, and the green line the cutoff from moderate to vigorous.

Figure 3: SVMgs time series of one student over one day (the figure is truncated between 80-200 SVMgs for better presentation clarity)

Using the cutoffs above, each second was coded as follows: S for sedentary time, L for light PA, M for moderate PA and V for vigorous PA. As an example, a 5 second piece of this string can be LLLVV, which can be read as 3 seconds of light activity followed by 2 seconds of vigorous activity. This step therefore produced 54 strings of 129,600 characters, where each character represents the PA intensity level for one second of PA.

====3.4 Bouts of PA====
As mentioned earlier, we are interested in assessing daily PA behaviours by looking at how MVPA and sedentary times are distributed throughout the day. Therefore we chose to explore the intensity level, length and frequency of each bout of MVPA and sedentary time. Let us introduce some definitions.
* A bout is a continuous episode of physical activity at a specific range of intensity level.
* The length of a bout is the number of seconds spent during that bout.
* The bout frequency is the number of occurrences of all bouts of a certain length during a day.

We focused on bouts in the range of Moderate to Vigorous Physical Activity (MVPA) and Sedentary Activity (SED), as the aim of the health program is to increase MVPA and decrease SED. Therefore we merged M and V into one category "MVPA". For instance, a sequence of 11 seconds spent in M, 8 seconds in V, and 12 seconds in M, preceded and followed by L's, would generate one bout of MVPA that would be 31 seconds long.

One of the first questions we explored was: was the increased MVPA that was observed overall after the intervention done in longer bouts? As a first step, we analysed the total time spent in MVPA done in bouts of at least x seconds. Formula 2 shows the reverse cumulative sequence, where t is the bout threshold, b_i is the number of seconds spent in bouts of length exactly i, and n is the length of the longest bout, so that the sum is the number of seconds spent in bouts of length at least t. For t=1, this is equivalent to the total number of seconds spent in MVPA. For t=2, it is the total number of seconds spent in bouts of at least 2 seconds (therefore excluding the 1 second-long bouts), and so on.

<math>\mathrm{BoutsCumSum}_t = \sum_{i=t}^{n} b_i \quad (2)</math>

Figure 4 shows a sample of the result of these calculations, where every line shows the average daily MVPA cumulative bout length for a particular student. Beyond bout lengths of about 10 seconds, the lines start to flatten as bout length increases.

Figure 4: Reverse Cumulative Bout Lengths

A paired t-test on the before and after cumulative series reveals that overall, students increased MVPA bout length (p-value=6.883e-10), increased MVPA bout frequency (p-value=0.007814), decreased SED bout length (p-value=2.2e-16) and decreased SED bout frequency (p-value=2.2e-16). This therefore suggests an overall positive effect of the learning program.

====3.5 Clustering of PA Behaviours====
To explore how students changed their PA patterns before and after, we averaged the daily behaviours of the children pre- and post-intervention and clustered these average daily behaviours using bout characteristics as features: the average time per day spent in bouts of at least a specific length and the average frequency of bouts per day. We selected the daily thresholds of MVPA and SED bouts not only based on our exploration above but also following the established literature [Schaefer et al., 2014]. In particular, meaningful MVPA detected by GENEActiv devices starts at 3 seconds, as any shorter activity is likely to be noise. The thresholds are shown in Table 2.

{| class="wikitable"
|+ Table 2: Clustering Features
! Physical Activity Intensity !! Bout Thresholds (seconds)
|-
| MVPA || 3, 10, 30
|-
| SED || 60, 120, 300
|}

Using these features, we generated daily PA behaviour clusters with all the 54 three-day long records (30 pre-intervention + 24 post-intervention). This means children can be present in up to 2 clusters: one from their daily PA behaviour before the intervention, and the other from their PA behaviour after the intervention. Of course, both their PA behaviours could fall into the same cluster. The features were standardised and a k-means unsupervised algorithm [Macqueen, 1967] with k=6 was applied. This number of clusters was determined by identifying the point at which adding another cluster no longer sufficiently reduces the total within-cluster sum of squares (see Figure 5).

Figure 5: Total within-cluster sum of squares by number of clusters

The cluster centroids are shown in Table 3. We can see that, from an MVPA point of view, the centroids of clusters C4, C5 and C6 fulfil the minimum recommendation of 60 minutes daily of MVPA [Janssen and LeBlanc, 2010], but those of C1, C2 and C3 do not. Also, from a SED point of view, we can see that C2, C3 and C1 have the longest and most frequent SED. In detail, C1 shows the lowest medium/long MVPA and the third highest short SED, C2 shows the lowest short bouts of MVPA and the longest short SED, C3 shows the third lowest short MVPA and the second highest short SED, C4 shows the third highest short MVPA and the lowest short SED, C5 shows the second highest MVPA and the third lowest SED, and finally C6 shows the highest MVPA and the second lowest short SED.

{| class="wikitable"
|+ Table 3: Cluster Centroids
! Intensity !! Bout length !! Measure !! C1 (N=8) !! C2 (N=12) !! C3 (N=5) !! C4 (N=14) !! C5 (N=9) !! C6 (N=6)
|-
| MVPA || >= 3 secs || Total time (min) || 32.6 || 31.4 || 49.2 || 61.8 || 65.8 || 92.3
|-
| MVPA || >= 3 secs || Number of bouts || 352.6 || 308.7 || 472.4 || 605.3 || 594.7 || 698.4
|-
| MVPA || >= 10 secs || Total time (min) || 9.5 || 11.5 || 18.7 || 22.1 || 27.9 || 49.3
|-
| MVPA || >= 10 secs || Number of bouts || 35.7 || 40.7 || 66.1 || 81.6 || 97.2 || 143.6
|-
| MVPA || >= 30 secs || Total time (min) || 1.3 || 2.1 || 4.0 || 3.7 || 5.8 || 18.2
|-
| MVPA || >= 30 secs || Number of bouts || 1.8 || 3.2 || 5.5 || 5.4 || 8.5 || 22.4
|-
| SED || >= 60 secs || Total time (min) || 136 || 226.3 || 217.9 || 70.6 || 118.4 || 96.6
|-
| SED || >= 60 secs || Number of bouts || 63.9 || 76.8 || 39.7 || 39.6 || 61.8 || 41.9
|-
| SED || >= 120 secs || Total time (min) || 73.2 || 156.1 || 179.8 || 29.8 || 59.8 || 57.5
|-
| SED || >= 120 secs || Number of bouts || 17.8 || 24.9 || 11.2 || 9.1 || 18.3 || 12.6
|-
| SED || >= 300 secs || Total time (min) || 29.8 || 101.1 || 158.2 || 7.0 || 14.3 || 26.8
|-
| SED || >= 300 secs || Number of bouts || 2.2 || 6 || 3.7 || 0.8 || 1.9 || 1.8
|}

Given these observations, we ordered the clusters in increasing level of PA behaviour, from the lowest activity student cluster (C1) to the highest activity one (C6), and characterised them as seen in Table 4.

{| class="wikitable"
|+ Table 4: Cluster Descriptions (those meeting the daily recommendation of MVPA are flagged with *)
! Cluster !! Summary Description
|-
| 1 || Not very active cluster (half of MVPA recommended amounts) but average amount of sedentary time
|-
| 2 || Not very active cluster (a little over half of MVPA recommended amounts) combined with high amount of sedentary time but broken down in many bouts
|-
| 3 || Fairly low MVPA (11 mins short of recommended levels) and very high amount of long sedentary bouts
|-
| 4* || Active cluster (meeting the recommended amounts of MVPA) combined with little sedentary time, and even fewer long sedentary bouts
|-
| 5* || Active cluster, slightly more MVPA than cluster 4 but contrasted with higher amounts of short sedentary bouts, and reasonable long bouts of sedentary time
|-
| 6* || Active cluster, with highest amount of MVPA and low sedentary bouts, but more long sedentary bouts than the 2 other active clusters
|}

===4 Behaviour Change===
The clusters above capture the daily behaviours for all children, before and after, with regards to MVPA and sedentary times. We can now look at whether and how the children from the experimental population moved from one cluster to another, or stayed in the same cluster, as this can be a sign of behaviour change. We can do so only for those children who wore the GENEActiv trackers in both periods (N=22).

Table 5 shows the movement matrix between daily PA behaviour clusters before and after the intervention. The green area shows the top desirable moves (from a low PA cluster to a higher PA cluster), light green shows acceptable moves (from any PA cluster that already meets the daily recommendations to any cluster that also meets them). Yellow shows unimproved moves (from a low PA cluster to a similar PA cluster), and the red area shows undesirable moves (from a high PA cluster to a low PA one, or from a low PA one to an even lower PA one).

{| class="wikitable"
|+ Table 5: Cluster movement matrix (rows: from cluster; columns: to cluster)
! From \ To !! 1 !! 2 !! 3 !! 4 !! 5 !! 6
|-
! 1
| 0 || 2 || 0 || 1 || 1 || 0
|-
! 2
| 1 || 2 || 1 || 0 || 1 || 0
|-
! 3
| 0 || 0 || 0 || 1 || 1 || 1
|-
! 4
| 0 || 1 || 0 || 4 || 0 || 0
|-
! 5
| 0 || 0 || 0 || 1 || 2 || 0
|-
! 6
| 0 || 0 || 0 || 0 || 0 || 2
|}

We observe four different behaviour changes:
* Children that moved to clusters with higher MVPA (9 children).
* Children that moved to clusters with lower MVPA (3 children).
* Children who were already in a cluster with MVPA above the daily recommended guidelines, and remained in the same high MVPA cluster (8 children).
* Children who were in a cluster that did not meet the recommended guidelines of MVPA and remained in the same one (2 children).

In particular we can see that over half of the students who were in the cluster with the least MVPA (C1) have moved up to more active clusters, and that all the students who were in the average/fairly low MVPA cluster (C3) have moved to more active clusters. Students who were already active (in C4, C5 and C6) remained active, except for one student who became more sedentary (moved to C2).

===5 Conclusion===
We presented a methodology to extract aspects of children's PA behaviour and how these changed before and after an intervention. First we calculated the SVMgs from the accelerometer data, then used them to derive PA intensity levels and the length and frequency of bouts, which were in turn used as features to cluster the children's behaviour and monitor changes before and after the intervention.

This methodology helps understand the impact of the intervention at both a general and an individual level. Whilst we focused here on MVPA and SED intensity levels, a similar approach could also include sleep, for instance. The advantage of this methodology is that it provides an aggregated analysis (via the clusters) while still capturing essential aspects of the activity (the length and frequency of bouts).

With our small sample data, clusters revealed six groups. The first three (C1, C2 and C3) were under the daily recommendations and the other three (C4, C5 and C6) were above these, but each had different characteristics with regards to the occurrence of the MVPA and sedentary times. Cluster movement analysis makes it possible to see the different ways in which students' behaviour changed.

Future work will include applying this methodology to larger datasets, exploring variations of some of the thresholds used, and combining it with more refined sequential pattern analysis.

===Acknowledgements===
This project was funded by Diabetes Australia Research Trust. We acknowledge all the iEngage team. C. Diaz thanks Universidad Adolfo Ibáñez for their support.

===References===
* [Activinsights Ltd., 2017] Activinsights Ltd. GENEActiv Original - Wrist-Worn Actigraphy Device - GENEActiv Accelerometers, 2017.
* [Baker and Yacef, 2009] Ryan S.J.D. Baker and Kalina Yacef. The State of Educational Data Mining in 2009: A Review and Future Visions. Journal of Educational Data Mining, 1(1):3–16, 2009.
* [Blikstein and Worsley, 2016] Paulo Blikstein and Marcelo Worsley. Multimodal learning analytics and education data mining: using computational technologies to measure complex learning tasks. Journal of Learning Analytics, 3(2):220–238, 2016.
* [Esliger et al., 2011] Dale W. Esliger, Ann V. Rowlands, Tina L. Hurst, Michael Catt, Peter Murray, and Roger G. Eston. Validation of the GENEA accelerometer. Medicine and Science in Sports and Exercise, 43(6):1085–1093, 2011.
* [Fang and Langford, 2013] Zhou Fang and Joss Langford. Package 'GENEAread', 2013.
* [Ihaka and Gentleman, 1996] Ross Ihaka and Robert Gentleman. R: A Language for Data Analysis and Graphics. Journal of Computational and Graphical Statistics, 5(3):299–314, 1996.
* [Janssen and LeBlanc, 2010] Ian Janssen and Allana G. LeBlanc. Systematic review of the health benefits of physical activity and fitness in school-aged children and youth. International Journal of Behavioral Nutrition and Physical Activity, 7(1):40, 2010.
* [Kelly et al., 2007] Louise A. Kelly, John J. Reilly, Diane M. Jackson, Colette Montgomery, Stanley Grant, and James Y. Paton. Tracking physical activity and sedentary behavior in young children. Pediatric Exercise Science, 19(1):51–60, 2007.
* [Krebs et al., 2010] Paul Krebs, James O. Prochaska, and Joseph S. Rossi. Defining what Works in Tailoring: A Meta-Analysis of Computer Tailored Interventions for Health Behavior Change. Prev Med, 51(3-4):214–221, 2010.
* [Macqueen, 1967] J.B. Macqueen. Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1(233):281–297, 1967.
* [Martinez-Maldonado et al., 2017] Roberto Martinez-Maldonado, Kalina Yacef, Augusto Dias Pereira Dos Santos, Simon Buckingham Shum, Vanessa Echeverria, Olga C. Santos, and Mykola Pechenizkiy. Towards Proximity Tracking and Sensemaking for Supporting Teamwork and Learning. In Proceedings - IEEE 17th International Conference on Advanced Learning Technologies, ICALT 2017, pages 89–91. IEEE, 2017.
* [Ng et al., 2014] Marie Ng, Tom Fleming, Margaret Robinson, Blake Thomson, Nicholas Graetz, Christopher Margono, ..., and Emmanuela Gakidou. Global, regional, and national prevalence of overweight and obesity in children and adults during 1980-2013: A systematic analysis for the Global Burden of Disease Study 2013. The Lancet, 384(9945):766–781, 2014.
* [Ochoa, 2017] Xavier Ochoa. Multimodal Learning Analytics. Handbook of Learning Analytics, pages 129–141, 2017.
* [Phillips et al., 2013] Lisa R. S. Phillips, Gaynor Parfitt, and Alex V. Rowlands. Calibration of the GENEA accelerometer for assessment of physical activity intensity in children. Journal of Science and Medicine in Sport, 16(2):124–128, 2013.
* [Plasqui et al., 2013] G. Plasqui, A. G. Bonomi, and K. R. Westerterp. Daily physical activity assessment with accelerometers: New insights and validation studies. Obesity Reviews, 14(6):451–462, 2013.
* [Ravi et al., 2005] Nishkam Ravi, Nikhil Dandekar, Preetham Mysore, and Michael L. Littman. Activity Recognition from Accelerometer Data. In Proceedings of the Seventeenth Conference on Innovative Applications of Artificial Intelligence (IAAI), volume 5518 LNCS, pages 1541–1546. 2005.
* [Schaefer et al., 2014] Christine A. Schaefer, Claudio R. Nigg, James O. Hill, Lois A. Brink, and Raymond C. Browning. Establishing and evaluating wrist cutpoints for the GENEActiv accelerometer in youth. Medicine and Science in Sports and Exercise, 46(4):826–833, 2014.
* [Siemens, 2013] George Siemens. Learning Analytics: The Emergence of a Discipline. American Behavioral Scientist, 57(10):1380–1400, 2013.
* [Sprint et al., 2016] Gina Sprint, Diane J. Cook, and Maureen Schmitter-Edgecombe. Unsupervised detection and analysis of changes in everyday physical activity data. Journal of Biomedical Informatics, 63:54–65, 2016.
* [Worsley, 2014] Marcelo Worsley. Multimodal learning analytics as a tool for bridging learning theory and complex learning behaviors. 3rd Multimodal Learning Analytics Workshop and Grand Challenges, MLA 2014, pages 1–4, 2014.
* [Yacef et al., 2018] Kalina Yacef, Corinne Caillaud, and Olivier Galy. Supporting Learning Activities with Wearable Devices to Develop Life-Long Skills in a Health Education App. In Artificial Intelligence in Education Conference, 2018.
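As an illustration of Sections 3.2 and 3.3, the step from raw triaxial samples to per-second intensity codes can be sketched as follows. This is a minimal Python sketch, not the authors' R code: it assumes the samples are already available as (x, y, z) tuples in units of g at 60 Hz, that Formula 1 sums the absolute gravity-subtracted vector magnitude over the 60 samples of each second, and that the Table 1 cutoffs apply with inclusive lower bounds. The function names are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 60  # sampling frequency reported in Section 2

# Table 1 cutoffs as (inclusive lower bound, code), checked from highest to lowest.
CUTOFFS = [(42.0, "V"), (16.5, "M"), (4.5, "L"), (0.0, "S")]


def svmgs_per_second(samples, rate=SAMPLE_RATE_HZ):
    """Return one SVMgs value per full one-second epoch.

    `samples` is a sequence of (x, y, z) accelerations in g. For each epoch
    the value is the sum over its samples of |sqrt(x^2 + y^2 + z^2) - 1|,
    with gravity approximated to 1 as in Formula 1.
    """
    epochs = []
    for start in range(0, len(samples) - rate + 1, rate):
        window = samples[start:start + rate]
        epochs.append(
            sum(abs(math.sqrt(x * x + y * y + z * z) - 1.0) for x, y, z in window)
        )
    return epochs


def intensity_code(svmgs):
    """Map one SVMgs value to S, L, M or V using the Table 1 cutoffs."""
    for lower, code in CUTOFFS:
        if svmgs >= lower:
            return code
    return "S"  # only reachable for negative input, which SVMgs cannot produce


def code_sequence(epoch_values):
    """Build the per-second intensity string described in Section 3.3."""
    return "".join(intensity_code(value) for value in epoch_values)
```

Applied to a full three-day daytime record, `code_sequence(svmgs_per_second(samples))` would yield a string of 129,600 characters, one per second, matching the description in Section 3.3.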
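The bout definitions and the reverse cumulative sequence of Section 3.4 (Formula 2) can be sketched in a few lines. This is an illustrative Python sketch under stated assumptions: the input is a per-second intensity string over S, L, M and V, the merged MVPA category is written with the made-up symbol "A", and all function names are hypothetical. It works in seconds on a single sequence, whereas the paper reports daily averages in minutes.

```python
from itertools import groupby


def merge_mvpa(codes):
    """Merge M and V into one MVPA symbol 'A'; S and L are kept as they are."""
    return codes.replace("M", "A").replace("V", "A")


def bout_lengths(codes, symbol):
    """Lengths in seconds of every uninterrupted run of `symbol`."""
    return [len(list(run)) for key, run in groupby(codes) if key == symbol]


def time_in_bouts_at_least(lengths, t):
    """Seconds spent in bouts lasting at least t seconds (Formula 2 for one t)."""
    return sum(n for n in lengths if n >= t)


def reverse_cumulative(lengths):
    """Formula 2 evaluated for t = 1 .. longest bout length."""
    longest = max(lengths, default=0)
    return [time_in_bouts_at_least(lengths, t) for t in range(1, longest + 1)]


def bout_features(lengths, thresholds):
    """For each threshold, return (total seconds, number of bouts) at or above it."""
    return {
        t: (time_in_bouts_at_least(lengths, t), sum(1 for n in lengths if n >= t))
        for t in thresholds
    }
```

For the worked example in Section 3.4, `bout_lengths(merge_mvpa("L" + "M" * 11 + "V" * 8 + "M" * 12 + "L"), "A")` gives a single bout of 31 seconds. Calling `bout_features` with the Table 2 thresholds (3, 10, 30 for MVPA; 60, 120, 300 for SED) gives the kind of time and frequency features used for clustering.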
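The clustering step of Section 3.5 (standardised features, k-means, and the total within-cluster sum of squares used to choose k) can be sketched as below. This is a self-contained Python illustration, not the authors' R implementation. To keep the result deterministic it swaps the usual random initialisation for a farthest-point initialisation, which is an assumption of this sketch and not something the paper describes. All function names are hypothetical.

```python
import math


def standardise(rows):
    """Z-score every feature column; constant columns become all zeros."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    # Population standard deviation, with 1.0 as fallback to avoid dividing by zero.
    sds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iterations=100):
    """Lloyd's algorithm; returns (labels, total within-cluster sum of squares)."""
    # Farthest-point initialisation (assumption of this sketch, deterministic).
    centroids = [list(rows[0])]
    while len(centroids) < k:
        farthest = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(row, centroids[j])) for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are final
        labels = new_labels
        for j in range(k):
            members = [row for row, label in zip(rows, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(sq_dist(row, centroids[label]) for row, label in zip(rows, labels))
    return labels, wss


def elbow_curve(rows, max_k):
    """Total within-cluster sum of squares for k = 1 .. max_k."""
    return [kmeans(rows, k)[1] for k in range(1, max_k + 1)]
```

With one feature row per three-day record, `elbow_curve(standardise(rows), 10)` would produce the kind of curve summarised in Figure 5, from which the number of clusters is read off where the decrease levels out.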
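The four groups of behaviour change listed in Section 4 can be recomputed directly from the Table 5 movement matrix. The Python sketch below copies the matrix values from Table 5 and assumes, as the paper does, that a higher cluster index means a more active cluster and that clusters 4 to 6 meet the daily MVPA recommendation. The function and key names are hypothetical.

```python
# Table 5: rows are the pre-intervention cluster, columns the post-intervention cluster.
MOVES = [
    [0, 2, 0, 1, 1, 0],
    [1, 2, 1, 0, 1, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 1, 0, 4, 0, 0],
    [0, 0, 0, 1, 2, 0],
    [0, 0, 0, 0, 0, 2],
]

# Clusters whose centroid meets the 60 minute MVPA recommendation (Table 4).
MEETS_RECOMMENDATION = {4, 5, 6}


def summarise_moves(moves, active=MEETS_RECOMMENDATION):
    """Count children by direction of movement between clusters."""
    tally = {"moved_up": 0, "moved_down": 0, "stayed_high": 0, "stayed_low": 0}
    for origin, row in enumerate(moves, start=1):
        for target, count in enumerate(row, start=1):
            if target > origin:
                tally["moved_up"] += count
            elif target < origin:
                tally["moved_down"] += count
            elif origin in active:
                tally["stayed_high"] += count
            else:
                tally["stayed_low"] += count
    return tally
```

Calling `summarise_moves(MOVES)` returns 9 children who moved up, 3 who moved down, 8 who stayed in the same high MVPA cluster and 2 who stayed in the same low MVPA cluster, which sums to the 22 children who wore the trackers in both periods.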