<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Activity Profiling and Phenotypes of Physical Activity and Sleep</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Deepika Verma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Norwegian University of Science and Technology</institution>
          ,
          <addr-line>Trondheim</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Physical inactivity and obesity contribute significantly to several health problems and have been a topic of research in the public health community. Objective measurement of physical behaviour using body-worn sensors has radically improved the collection of physical behaviour data from large populations and has opened the possibility of analysing these objective recordings with machine learning methods to gain useful insights, such as identifying different physical activities and the time spent in each. We aim to address the theme of physical activity from both the public health and the computer science perspective by first determining the best strategy for increasing the accuracy of human activity classification from body-worn sensor data collected in cohort studies that use the same data collection protocol as HUNT4, and then using objectively measured physical activity data to identify clusters of different physical behaviour phenotypes such that intra-cluster similarity is maximized. Identifying physical behaviour phenotypes can in the future allow us to design tailored interventions that help users self-manage their daily physical activity routines. Additionally, we take into account the compositional nature of our physical behaviour data and plan to employ methodologies that model the compositional co-dependency between physical behaviours and sleep.</p>
      </abstract>
      <kwd-group>
        <kwd>Case-Based Reasoning</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Activity Phenotypes</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Maintaining an active lifestyle has been shown to be essential to leading a healthy
life and has several positive effects, such as improved mental health [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], sleep
quality [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and reduced risk of mortality [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Physical inactivity has been associated
with several health problems in addition to negative impacts on mental health
and mortality. Objective measurements of physical behaviour have opened new
possibilities of research for public health and computer science researchers alike.
Furthermore, they make it possible to provide activity recommendations to users of
smart activity trackers (such as FitBit) by keeping track of their total amount
of physical activity. However, these recommendations remain nearly
unchanged across the user base and may or may not be challenging enough for
certain users, since they are not tailored to each user's daily physical activity
routine. To provide more tailored and personalized activity recommendations,
accurate identification of different physical behaviours from body-worn sensors
is a prerequisite.
      </p>
      <p>
        In addition to regular physical activity, sleep duration and sleep quality are
important but highly volatile factors for sound health and have a strong
compositional co-dependency with daily physical behaviour. Although there is no
doubt that physical behaviour and sleep have a profound impact on health, it
has recently become evident that our current understanding of the health effects
of these behaviours is flawed by serious methodological shortcomings. In particular,
this pertains to the use of self-reported data, which is prone to bias and
misclassification [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and more importantly, the failure to recognise the compositional
nature of various physical behaviours and sleep [
        <xref ref-type="bibr" rid="ref2 ref3">2,3</xref>
        ]. The latter refers to the
fact that the durations of various physical behaviours and sleep are inherently
co-dependent: increasing the duration of one behaviour (for example,
sitting) will necessarily reduce the duration of at least one other behaviour, since
the total time in any given day is fixed at 24 hours. Therefore, it is imperative to
determine whether a change in one type of behaviour, together with the compensatory
shifts it causes in other behaviours, is beneficial or harmful for health.
      </p>
      <p>This research builds on the need for classification models that
are general enough to accurately classify distinct physical
behaviours from sensor data collected in cohort studies with the same data
collection protocol as HUNT4 and, furthermore, to effectively use the objective data
to identify different physical behaviour phenotypes. HUNT4 is the fourth round
of the HUNT cohort study, which collects a large amount of both subjective and
objective health data, including objective physical activity data collected through
body-worn accelerometers (which form the target dataset in this research).</p>
    </sec>
    <sec id="sec-2">
      <title>Research Aim</title>
      <p>The overall goal is to achieve high accuracy in classifying basic
human activities from sensor data streams collected during HUNT4 and to use
this information to correctly estimate the time spent by an individual in each of
the six different lower-level physical activities: lying, sitting, standing, walking,
running and cycling. Furthermore, these objectively measured durations of the
participants' physical activities are intended to be used to identify phenotypes of physical
activity, which is important for designing personalised and sustainable
digital interventions that help users self-manage their daily physical activity routines.</p>
      <p>HUNT study website: https://www.ntnu.no/hunt/</p>
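      <p>As a minimal sketch of how classifier output could be turned into the per-activity time estimates described in the research aim, the following Python fragment counts one predicted label per fixed-length window and converts the counts to minutes. The function name, the 5-second window length and the toy label sequence are illustrative assumptions, not details of the HUNT4 pipeline.</p>
      <preformat>
```python
from collections import Counter

# The six lower-level activities named in the research aim.
ACTIVITIES = ("lying", "sitting", "standing", "walking", "running", "cycling")


def time_per_activity(window_labels, window_seconds=5.0):
    """Return minutes spent in each activity, given one predicted label per window.

    window_seconds is an assumed window length, not a HUNT4 parameter.
    """
    unknown = set(window_labels) - set(ACTIVITIES)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(window_labels)  # missing activities count as zero
    return {a: counts[a] * window_seconds / 60.0 for a in ACTIVITIES}


# Hypothetical classifier output for 42 consecutive windows.
predicted = ["sitting"] * 24 + ["walking"] * 12 + ["standing"] * 6
minutes = time_per_activity(predicted)
print(minutes)
```
      </preformat>
      <p>Because every window contributes the same duration, any classification error translates directly into an error in the estimated time, which is why classification accuracy comes first in the research aim.</p>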
      <p>Foreseen challenges</p>
      <list list-type="bullet">
        <list-item><p>How to improve the performance of machine learning classifiers for identifying human behaviour accurately using sensor data collected during HUNT4?</p></list-item>
        <list-item><p>How to identify sleep duration from multiple sensor data streams using machine learning methods?</p></list-item>
        <list-item><p>How to analyse the quality of clusters (physical behaviour phenotypes) obtained using different clustering methods?</p></list-item>
        <list-item><p>How to design tailored interventions of daily physical activity routines for users using existing physical behaviour profiles?</p></list-item>
      </list>
      <p>To address the first challenge, we will use different training datasets,
including one collected in out-of-lab settings, and evaluate
the resulting models by comparing them with model(s) trained on dataset(s)
collected in in-lab settings. To address the second challenge, we will implement
methodologies to identify sleep patterns and duration from polysomnographic
(sleep) recordings. Finally, we plan to implement a similarity-based method for
clustering the physical activity profiles of the HUNT4 participants. To analyse
the quality of the clusters obtained, we plan to implement state-of-the-art
clustering methods and compare the performance of the similarity-based method with
them, using clustering evaluation measures to produce a quality
assessment of each method. The resulting clusters are expected to narrow our search for the
physical behaviour phenotypes in our dataset.</p>
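      <p>The text does not name the clustering evaluation measures to be used. As one hedged illustration, the sketch below computes the mean silhouette coefficient from a precomputed distance matrix, which suits a similarity-based method if distance is taken as one minus similarity (an assumption). The function name and the toy data are hypothetical.</p>
      <preformat>
```python
def silhouette(dist, labels):
    """Mean silhouette coefficient for a precomputed, symmetric distance matrix."""
    n = len(labels)
    clusters = sorted(set(labels))
    if len(clusters) in (0, 1):
        raise ValueError("need at least two clusters")
    total = 0.0
    for i in range(n):
        # Mean distance from item i to the members of each cluster (excluding i).
        mean_to = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                mean_to[c] = sum(dist[i][j] for j in members) / len(members)
        own = labels[i]
        if own not in mean_to:  # singleton cluster: silhouette is defined as 0
            continue
        a = mean_to[own]  # cohesion
        b = min(v for c, v in mean_to.items() if c != own)  # separation
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n


# Toy example: four profiles placed on a line, compared by absolute difference.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
good = silhouette(dist, [0, 0, 1, 1])  # compact, well-separated clusters
bad = silhouette(dist, [0, 1, 0, 1])   # clusters that mix near and far items
print(good, bad)
```
      </preformat>
      <p>A higher value indicates tighter and better separated clusters, so the same score can be computed for each clustering method under comparison.</p>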
      <sec id="sec-2-1">
        <title>Expected Outcomes</title>
        <p>Overall, the main aim of this thesis is to explore machine learning
methods for the raw HUNT4 physical behaviour recordings. We expect to build
a toolbox that allows:</p>
        <list list-type="bullet">
          <list-item><p>Analysis of the body-worn accelerometer data using machine learning methods.</p></list-item>
          <list-item><p>Analysis of the high-resolution physical behaviour data using case-based reasoning (CBR) and compositional data analysis.</p></list-item>
        </list>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Proposed Research Plan</title>
      <sec id="sec-3-1">
        <title>Literature Review</title>
        <p>A thorough review of the state of the art and the literature on human
activity recognition systems forms the basis of this research. Identifying gaps in the
existing work enables us to make improvements on previously untouched topics.
Additionally, an overview of the literature and methodologies for compositional
data analysis is required for understanding the co-dependency between various
physical behaviours and their influence on one another, since the data we are
using is compositional in nature.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Machine Learning Approach</title>
        <p>
          Several state-of-the-art methods, such as Random Forest, k-NN, SVM and
Hidden Markov Models, have been successfully used in
human activity recognition systems and have been shown to be quite effective in
identifying different physical behaviours. Neural network architectures such
as Recurrent Neural Networks [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], Long Short-Term Memory [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and Convolutional
Neural Networks have also shown promising results, with the added
advantage of not requiring a feature engineering step before the data is fed
into the algorithms. We plan to implement a combination of these machine
learning methods using both in-lab and out-of-lab training datasets and to produce a
comparative assessment with respect to the data collection protocol as well as
the subject-dependency of the methods, in order to determine the method(s) best
suited to the application.
        </p>
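        <p>One common way to assess subject-dependency is leave-one-subject-out evaluation, in which all windows of one participant are held out for testing so that no subject appears in both the training and the test set. The sketch below illustrates the idea with a plain 1-nearest-neighbour classifier; the evaluation scheme, the feature names and the toy data are assumptions made for illustration and are not taken from the planned experiments.</p>
        <preformat>
```python
def leave_one_subject_out(samples):
    """Yield (held_out_subject, train, test) so no subject is in both sets."""
    subjects = sorted({s["subject"] for s in samples})
    for held_out in subjects:
        train = [s for s in samples if s["subject"] != held_out]
        test = [s for s in samples if s["subject"] == held_out]
        yield held_out, train, test


def predict_1nn(train, features):
    """Label of the closest training sample (squared Euclidean distance)."""
    def sq_dist(sample):
        return sum((a - b) ** 2 for a, b in zip(sample["features"], features))
    return min(train, key=sq_dist)["label"]


def loso_accuracy(samples):
    """Per-subject accuracy of 1-NN under leave-one-subject-out evaluation."""
    result = {}
    for subject, train, test in leave_one_subject_out(samples):
        hits = sum(predict_1nn(train, s["features"]) == s["label"] for s in test)
        result[subject] = hits / len(test)
    return result


# Hypothetical windows: (mean acceleration magnitude, variance) per window.
samples = [
    {"subject": "A", "features": (1.0, 0.01), "label": "sitting"},
    {"subject": "A", "features": (1.3, 0.40), "label": "walking"},
    {"subject": "B", "features": (1.0, 0.02), "label": "sitting"},
    {"subject": "B", "features": (1.4, 0.45), "label": "walking"},
    {"subject": "C", "features": (1.1, 0.03), "label": "sitting"},
    {"subject": "C", "features": (1.2, 0.35), "label": "walking"},
]
accuracy = loso_accuracy(samples)
print(accuracy)
```
        </preformat>
        <p>Comparing these per-subject scores with scores from a random split that mixes subjects gives a rough indication of how strongly a method depends on having seen the same person during training.</p>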
      </sec>
      <sec id="sec-3-3">
        <title>Public Health Insights with CBR and Compositional Data analysis</title>
        <p>To gain insights from our public health dataset, we plan to use
different methodologies, including CBR and compositional data analysis. In our work, CBR
has been used to develop similarity measures for assessing the
similarity among, and finding similar, physical behaviour profiles of the
HUNT4 participants. Furthermore, we will use the knowledge-intensive similarity
employed in CBR to generate clusters of semantically similar behaviour profiles in
order to narrow our search for the physical behaviour phenotypes. Additionally,
we will employ methodologies for compositional data analysis to estimate the
effect of an increase or decrease in sleep and/or sedentary time on the composition
of daily activities, since an increase in the duration of one behaviour, say sitting,
will necessarily reduce the duration of at least one other behaviour because the total
time in any given day is fixed at 24 hours. Thus, whether a change in one type of
behaviour is beneficial or harmful for health depends on the compensatory shifts
in other behaviours. This data analysis methodology will be studied extensively,
along with related research articles, to gain a better understanding
of it and of how it can be applied to gain insights into the public health data and
thereby to design tailored and sustainable digital interventions for
users to self-manage their daily physical activity routines.</p>
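        <p>The 24-hour constraint can be made concrete with a small sketch. The fragment below rescales a day of behaviours so that it sums to 1440 minutes (closure), moves time from one behaviour to another while keeping the total fixed, and applies the centred log-ratio transform, a standard transform in compositional data analysis. The text does not specify which transform will be used, so the choice of the centred log-ratio, the function names and the example durations are all illustrative assumptions.</p>
        <preformat>
```python
import math

DAY_MINUTES = 24 * 60


def closure(parts, total=DAY_MINUTES):
    """Rescale strictly positive parts so they sum to `total` (one day by default)."""
    if any(0 >= v for v in parts.values()):
        raise ValueError("log-ratio methods need strictly positive parts")
    scale = total / sum(parts.values())
    return {k: v * scale for k, v in parts.items()}


def clr(parts):
    """Centred log-ratio: log of each part relative to the geometric mean."""
    logs = {k: math.log(v) for k, v in parts.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {k: v - mean_log for k, v in logs.items()}


def reallocate(parts, source, target, minutes):
    """Move `minutes` from one behaviour to another; the daily total is unchanged."""
    if minutes >= parts[source]:
        raise ValueError("cannot remove all time from a behaviour")
    moved = dict(parts)
    moved[source] -= minutes
    moved[target] += minutes
    return moved


# Hypothetical day, in minutes.
day = closure({"sleep": 450, "sitting": 600, "standing": 240, "walking": 150})
more_sleep = reallocate(day, "sitting", "sleep", 30)
print(sum(more_sleep.values()))  # still a full day
print(clr(day))
print(clr(more_sleep))
```
        </preformat>
        <p>Note that the reallocation changes the log-ratio coordinates of every behaviour, not only the two that were edited, which is the co-dependency that a compositional analysis is meant to capture.</p>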
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Current Progress and Future Work</title>
      <p>
        We have presented our methodology for developing similarity measures in
myCBR for assessing the similarity among physical behaviour profiles at
ICCBR 2018 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and NAIS 2019 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Additionally, we have now implemented
a knowledge-intensive similarity-based method for clustering the physical
behaviour profiles in order to group the most similar profiles into a single cluster.
We plan to perform bout analysis on fine-grained sequential physical activity
data of the participants to generate their unique profiles and subsequently
apply similarity-based clustering to obtain more refined clusters. We expect these
clusters to give us a representation of the physical behaviour phenotypes in our
dataset. On the human activity classification side, we plan to implement several
state-of-the-art machine learning algorithms using both in-lab and out-of-lab
datasets and produce a comparative assessment in order to determine the
method(s) best suited to our dataset.
      </p>
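      <p>The planned bout analysis is not specified in detail. As a minimal sketch, a bout can be taken to be a maximal run of consecutive windows carrying the same activity label, so that a label sequence collapses into a list of activities with durations. This definition, the 5-second window length and the function names are assumptions for illustration only.</p>
      <preformat>
```python
from itertools import groupby


def bouts(labels, window_seconds=5.0):
    """Collapse a label sequence into (activity, duration_seconds) bouts."""
    return [(activity, sum(1 for _ in run) * window_seconds)
            for activity, run in groupby(labels)]


def longest_bout(labels, activity, window_seconds=5.0):
    """Longest uninterrupted duration of `activity`, or 0.0 if it never occurs."""
    durations = [d for a, d in bouts(labels, window_seconds) if a == activity]
    return max(durations, default=0.0)


# Hypothetical sequence of per-window labels for one participant.
sequence = ["sitting"] * 4 + ["walking"] * 2 + ["sitting"] * 6 + ["standing"]
found = bouts(sequence)
print(found)
print(longest_bout(sequence, "sitting"))
```
      </preformat>
      <p>Summaries such as the number of bouts or the longest sitting bout per day could then serve as attributes of a participant's profile for the similarity-based clustering.</p>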
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Arem</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>S.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hartge</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berrington de Gonzalez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Visvanathan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Campbell</surname>
            ,
            <given-names>P.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freedman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiderpass</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adami</surname>
            ,
            <given-names>H.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Linet</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>I.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matthews</surname>
            ,
            <given-names>C.E.</given-names>
          </string-name>
          :
          <article-title>Leisure Time Physical Activity and Mortality: A Detailed Pooled Analysis of the Dose-Response Relationship</article-title>
          .
          <source>JAMA Internal Medicine</source>
          <volume>175</volume>
          (
          <issue>6</issue>
          ),
          <fpage>959</fpage>
          –
          <lpage>967</lpage>
          (06
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Dumuid</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pedisic</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stanford</surname>
            ,
            <given-names>T.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin-Fernandez</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hron</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maher</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>L.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Olds</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>The compositional isotemporal substitution model: A method for estimating changes in a health outcome for reallocation of time between sleep, physical activity and sedentary behaviour</article-title>
          .
          <source>Statistical Methods in Medical Research</source>
          <volume>28</volume>
          (
          <issue>3</issue>
          ),
          <fpage>846</fpage>
          –
          <lpage>857</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Fairclough</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumuid</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taylor</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Curry</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGrane</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stratton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maher</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Olds</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Fitness, fatness and the reallocation of time between children's daily movement behaviours: an analysis of compositional data</article-title>
          .
          <source>International Journal of Behavioral Nutrition and Physical Activity</source>
          <volume>14</volume>
          (
          <issue>1</issue>
          ) (May
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Harris</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>The relationship between physical inactivity and mental wellbeing: Findings from a gamification-based community-wide physical activity intervention</article-title>
          .
          <source>Health Psychology Open</source>
          <volume>5</volume>
          (
          <issue>1</issue>
          ),
          <volume>2055102917753853</volume>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural Computation</source>
          <volume>9</volume>
          (
          <issue>8</issue>
          ),
          <fpage>1735</fpage>
          –
          <lpage>1780</lpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kredlow</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Capozzoli</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hearon</surname>
            ,
            <given-names>B.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calkins</surname>
            ,
            <given-names>A.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Otto</surname>
            ,
            <given-names>M.W.:</given-names>
          </string-name>
          <article-title>The effects of physical activity on sleep: a meta-analytic review</article-title>
          .
          <source>Journal of Behavioral Medicine</source>
          <volume>38</volume>
          (
          <issue>3</issue>
          ),
          <fpage>427</fpage>
          –
          <lpage>449</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Prince</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adamo</surname>
            ,
            <given-names>K.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hamel</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hardt</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gorber</surname>
            ,
            <given-names>S.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tremblay</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>A comparison of direct versus self-report measures for assessing physical activity in adults: a systematic review</article-title>
          .
          <source>International Journal of Behavioral Nutrition and Physical Activity</source>
          <volume>5</volume>
          (
          <issue>1</issue>
          ),
          <volume>56</volume>
          (Nov
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Twomey</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diethe</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fafoutis</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elsts</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McConville</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flach</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Craddock</surname>
            ,
            <given-names>I.:</given-names>
          </string-name>
          <article-title>A comprehensive study of activity recognition using accelerometers</article-title>
          .
          <source>Informatics</source>
          <volume>5</volume>
          (
          <issue>2</issue>
          ) (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Verma</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bach</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mork</surname>
            ,
            <given-names>P.J.</given-names>
          </string-name>
          :
          <article-title>Similarity measure development for case-based reasoning - a data-driven approach</article-title>
          . arXiv.org (May
          <year>2019</year>
          ), https://arxiv.org/abs/1905.08581
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Verma</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bach</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mork</surname>
            ,
            <given-names>P.J.:</given-names>
          </string-name>
          <article-title>Modelling similarity for comparing physical activity profiles - a data-driven approach</article-title>
          . In:
          <string-name>
            <surname>Cox</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Funk</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Begum</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (eds.)
          <source>Case-Based Reasoning Research and Development</source>
          . pp.
          <fpage>415</fpage>
          –
          <lpage>430</lpage>
          . Springer International Publishing, Cham
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>