<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evaluation and Deployment of Models for Activity Recognition</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rita Pucci</string-name>
          <email>pucci@di.unipi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Advisors: Alessio Micheli and Stefano Chessa at University of Pisa, Department of Computer Science</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Pisa, Department of Computer Science</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>There is a growing need to monitor humans and animals in order to observe their behaviour. To understand animals in their natural environment, and to monitor children and the elderly, scientists rely on remote assessments. Automatic monitoring systems make direct observation of a subject possible: such a system can support biological and medical studies by identifying the activity performed, without the intrusiveness of a human observer. Activity Recognition (AR) is an emerging field that will soon provide innovative solutions to many problems. In the literature, many models have been presented as the core of an automatic monitoring system that recognises the activity of a subject. During my PhD, Machine Learning models were developed to detect the physical activities of subjects using AR techniques. The models were trained to autonomously identify activity patterns in accelerometer data. The versatility of Machine Learning models makes them useful for monitoring activity where direct observation would not otherwise be possible.</p>
      </abstract>
      <kwd-group>
        <kwd>Artificial Neural Network</kwd>
        <kwd>Machine Learning algorithm</kwd>
        <kwd>Human activity recognition</kwd>
        <kwd>Biologging</kwd>
        <kwd>Sensors</kwd>
        <kwd>Accelerometers</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>AR deals with a broad range of applications in many areas connected to
computer science. I focused on three areas in which AR is needed for
monitoring: Ambient Assisted Living (AAL), HealthCare (HC), and
Biologging, all of which have recently shown growing interest in automatic monitoring systems. I
started my PhD by familiarising myself with the field of AR. I then developed
and applied different Machine Learning (ML) models to different case studies,
which provided a comparison between models over the same datasets and vice
versa. Using knowledge gained in the research phase, I chose libraries and tools
useful for developing and analysing AR systems: Theano (a Python library), Shark
(a C++ library), and NNTool (a MatLab library). These libraries were used to
verify results obtained over the same dataset with the same model. My research
then moved towards managing the lack of common formats in raw accelerometer
datasets, which has prevented comparison among ML models. The evaluation of
models over uniform datasets supports the analysis of aspects such as accuracy,
latency, and required processing resources. The results obtained over the same
dataset allow us to analyse the trade-off between performance and obtrusiveness.
This trade-off plays a crucial role in AR systems, shifting the focus of the research
from the performance to the applicability of the system. In particular, I
investigated and developed models providing a functional application for animal AR
within the Tortoise@ project, and compared the results of different models against the
objective of the Tortoise@ project. Lastly, I used the datasets to obtain a comparison
among models and case studies.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>
        The literature shows several models designed to recognise
activity in raw sensor data. The development of microelectronics described in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
(hereafter a device) and of computer systems made it possible for sensors and
mobile devices to interact with people in their daily activities. As a result, AR
became a more attainable way of addressing problems in these fields. Research
in AAL is grounded in Ambient Intelligence. Ambient Intelligence technologies
allow people to overcome physical limits and monitor daily personal and
self-care activities ([
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]). This technology aims to improve a person's capabilities
through a digital environment that is sensitive, adaptive, and responsive. A
similar development is evident in the HC area, where research has been driven
by the continuous demand from the medical field; HC is in fact a particular
medical area of AAL. Human AR for the health care of the elderly is a very
active research area, which has recently produced studies aimed at preventing
the physical and psychological deterioration of people. A prior lack of
research in Biologging left a vacuum filled by AR projects addressing biodiversity
and conservation [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. These Biologging projects provide autonomous
systems for monitoring animals in the wild, in particular endangered species.
The versatility of AR is growing, but we currently lack a unifying model. Authors
compare published works in surveys and provide a uniform qualitative
comparison. Unfortunately, the comparison of AR approaches is hindered by the fact
that each model uses a different dataset. The specialisation of software for a
specific dataset does nothing to advance the versatility of AR methods.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
      <p>
        In this context the AR problem, as described in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], is defined as the temporal
partitioning of a time sequence into intervals, hereafter called patterns, which are
labelled according to the activity performed. The patterns must be consecutive and
non-empty, under the assumption that they have fixed length. The AR problem is thus
to find a mapping function that can be evaluated for each pattern to obtain a
classification as close as possible to the actual activity performed. To deal with the
AR problem, I used an automatic classifier structure consisting of a filter stage and
a classifier stage. The filter stage pre-processes data to increase classification
performance. Although the filter stage is a thorny one, the classifier stage is the
main and most important part of the system. The classifier stage is implemented
by a ML model that labels each time sequence with an activity.
The set of ML models presented in the literature consists mainly of IDNN, SVM,
ESN, and, more recently, CNN. The IDNN is a subclass of the Time Delay
Neural Network, introduced for speech recognition and specifically designed to treat
sequential data. The inputs consist of the outputs of earlier nodes (as in a
multilayer perceptron), taken not only from the current time slot but also from a number
of previous time slots (the time delay). The basic units of an IDNN have a
delay introduced on the inputs which allows the model to relate the current input
to the past history of events. The IDNN scans the input window over time, so
that the units implement the property of translation invariance [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. This
property is recommended for the dynamic nature of the sensors' sequences:
the activity has to be recognised independently of a precise moment in time.
I evaluated different structures of the model and three training algorithms: the
Backpropagation algorithm [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the Resilient Propagation algorithm (RP) [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ],
and the Levenberg-Marquardt algorithm (LM) [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
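      <p>The time-delay input described above can be sketched in a few lines. This is a minimal illustrative example, not the implementation used in this work; the function name and window sizes are assumptions.

```python
import numpy as np

def time_delay_windows(seq, delay):
    """Stack each sample together with its `delay` predecessors,
    sliding one step at a time, so a feed-forward layer sees a
    short history of the signal at every position."""
    seq = np.asarray(seq)
    width = delay + 1                      # current slot plus delayed slots
    n = len(seq) - width + 1
    return np.stack([seq[i:i + width] for i in range(n)])

acc = np.arange(10.0)                      # toy one-axis accelerometer trace
windows = time_delay_windows(acc, delay=3)
print(windows.shape)                       # (7, 4): 7 windows of 4 time slots
```

Each row is one pattern presented to the network, which is how the delayed inputs let the model relate the current sample to its recent history.</p>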
      <p>
        The main idea of the SVM is to construct a hyperplane as the decision
surface so as to obtain the maximum margin between positive and negative
patterns. Through the concept of margin maximisation, the SVM represents an
approximate implementation of the method of structural risk minimisation [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
Our research emphasises the fact that the SVM learning algorithm is constructed
directly from the kernel, i.e. the inner product (between support
vectors). The kernel function allows the SVM to be applied to both linearly and
nonlinearly separable patterns. Specifically, we consider the Radial Basis
Function (RBF), polynomial, and linear kernels [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
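      <p>As an illustration, the three kernels mentioned can be computed directly as below; the function names and parameter defaults are assumptions for the example, not values used in the experiments.

```python
import numpy as np

def linear_kernel(x, y):
    return float(np.dot(x, y))

def poly_kernel(x, y, degree=2, c=1.0):
    # polynomial kernel: (x . y + c)^degree
    return float((np.dot(x, y) + c) ** degree)

def rbf_kernel(x, y, gamma=0.5):
    # RBF kernel: exp(-gamma * ||x - y||^2)
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
print(linear_kernel(x, y))    # 0.0
print(poly_kernel(x, y))      # (0 + 1)^2 = 1.0
print(rbf_kernel(x, y))       # exp(-0.5 * 2) = exp(-1)
```
</p>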
      <p>
        The ESN model was investigated by Jaeger in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The ESN is a
Recurrent Neural Network (RNN) based on the Reservoir Computing (RC)
paradigm. This paradigm separates the recurrent
dynamical part, the reservoir, from the non-recurrent output part, the readout. Hence
the ESN approach differs from standard RNN training, in which all weights
are adapted. Fixed recurrent connections among hidden units are the
key feature of the ESN: the encoding function implemented by the network
is not adaptive. Because there are no cyclic dependencies between the trained
readout connections, training an ESN is a simple linear task. The effectiveness of
an Activity Recognition system based on the ESN has been validated in the EvAAL
international competition [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
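      <p>The reservoir/readout separation can be sketched as follows; the reservoir size, scaling factors, and ridge parameter are illustrative assumptions, not the settings used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed (untrained) reservoir: random input and recurrent weights,
# with the recurrent matrix rescaled so its spectral radius stays below 1.
n_in, n_res = 3, 50
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

def run_reservoir(inputs):
    x = np.zeros(n_res)
    states = []
    for u in inputs:                       # drive the reservoir over time
        x = np.tanh(W_in @ u + W @ x)
        states.append(x)
    return np.array(states)

# Only the linear readout is trained, via ridge regression.
U = rng.normal(size=(200, n_in))           # toy input sequence
y = U[:, 0]                                # toy target signal
S = run_reservoir(U)
W_out = np.linalg.solve(S.T @ S + 1e-6 * np.eye(n_res), S.T @ y)
pred = S @ W_out
print(pred.shape)                          # one readout value per time step
```

Because the reservoir weights stay fixed, the only trained part is the linear solve at the end, which is what makes ESN training a simple linear task.</p>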
      <p>
        The CNN is a biologically inspired architecture that can learn invariant features.
The model was introduced by LeCun in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The basic idea of the CNN is to ensure
a degree of shift and distortion invariance. These models benefit from good
performance and low computational and memory requirements. Recently, the
CNN has been applied to accelerometer data.
      </p>
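      <p>The weight sharing behind this shift invariance can be shown with a one-dimensional convolution, the operation a CNN applies along an accelerometer axis. This is a toy sketch; the filter values are assumptions.

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """Slide one shared weight vector over the signal ('valid' mode):
    the same feature detector is applied at every position, which is
    the source of the CNN's shift invariance."""
    k = len(kernel)
    return np.array([np.dot(signal[i:i + k], kernel)
                     for i in range(len(signal) - k + 1)])

sig = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])   # a step-like feature
edge = np.array([-1.0, 1.0])                      # tiny edge detector
print(conv1d_valid(sig, edge))   # responds wherever the step occurs
```
</p>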
      <p>The dataset is split at random into training and test subsets. The training subset
is used to train each model and to select the values of the
hyperparameters, hereafter Model Selection (MS). Each model is also evaluated for the
amount of memory resources needed, hereafter called Model Assessment (MA).
MA focuses on applicability, i.e. the trade-off between
the performance obtained and the resources required. The automatic classifier
structure is evaluated both for performance and for applicability. The
identification of a trade-off among required hardware resources,
classification accuracy, and embedded design is expected. The evaluation of settings
for the two stages of the autonomous classifier shows that it is possible to
choose the setting according to the interest of the research: higher accuracy or
greater generality of the model.</p>
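      <p>A minimal sketch of the MS/MA protocol above, with a toy dataset and a single toy hyperparameter (a decision threshold); all names and sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy labelled patterns: the label depends on the first feature.
X = rng.normal(size=(100, 4))
y = (X[:, 0] > 0).astype(int)

# Random split: a training subset (used for Model Selection)
# and a held-out test subset (used once, for assessment).
idx = rng.permutation(len(X))
train, test = idx[:70], idx[70:]

def acc(threshold, rows):
    return float(np.mean((X[rows, 0] > threshold).astype(int) == y[rows]))

# Model Selection: choose the hyperparameter on the training subset only.
grid = [-0.5, 0.0, 0.5]
best = max(grid, key=lambda t: acc(t, train))

# Model Assessment: report performance once, on the untouched test subset.
print(best, acc(best, test))
```
</p>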
      <p>Concerning the deployment of an activity recognition system, Tortoise@ is an
autonomous system that identifies the nest-digging activity of tortoises using a
device mounted atop the tortoise's shell. An accelerometer, as well as temperature
and light sensors, are embedded on a device called a MicaZ module.
Accelerometer data was collected from devices worn by different tortoises during their
two-month nesting period. The device discriminates between (nest) digging and
non-digging activity (specifically walking and eating) from specific tortoise
movements by using an automatic system. The automatic system is modularly
structured using an artificial neural network and an output filter. For the purpose of
experiment and comparison, and with the aim of minimising the computational
cost, the artificial neural network has been modelled according to three different
architectures based on the input delay neural network (IDNN). All of them were
developed in C: the standard IDNN, the IDNN with Local Receptive Fields
(IDNN LRF), and the IDNN with Local Receptive Fields and Weight Sharing
(IDNN LRF WS). The IDNN LRF takes the local receptive fields from the CNN:
each hidden neuron scans the input through its local receptive field, and units in
a layer are connected so as to receive input from a set of units in a small
neighbourhood of the previous layer. The IDNN LRF WS is further inspired by the
CNN, with weight sharing (WS) among the LRF hidden units. This further reduces
the number of free parameters, since a large number of units share the same
weight vector, and obtains a certain level of shift invariance (the detection of
features regardless of their position).</p>
    </sec>
    <sec id="sec-4">
      <title>Discussion</title>
      <p>
        During my Master's degree and my PhD, I took part in the Tortoise@ project.
Results for the three architectures of the IDNN were initially presented in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and
extended in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The evaluation of the proposed models, IDNN, IDNN LRF, and
IDNN LRF WS, was performed to find a good trade-off between performance
and applicability on a low-power device. The performance measurements of
the three models take into account the average errors computed over five
different initialisations of the ANN weights. Applicability is evaluated
by considering the memory required to store the weights of the IDNN.
The highest performance on the Tortoise@ task is reached by the IDNN,
which provides an accuracy of 96.24% with a memory occupation of 1844 bytes.
The IDNN LRF requires less than 30% of the memory of the IDNN: 398 bytes,
with an accuracy of 95.51%. It is worth bearing a loss of about 1% in accuracy
for a reduction of over 70% in memory space. With the IDNN LRF WS the
advantage is less pronounced: the accuracy obtained is 94.34%, with 196 bytes of
memory. It is worth noting that the space required for the filter stage must be
added to this memory footprint. Tortoise@ is a starting point for many different
animal AR projects. In fact, a similar automatic classifier system was developed
to be embedded on a mobile phone as a rudimentary prototype. Tortoise@ is one
of the first projects on an automatic classifier system for the classification
of tortoise movements. The advantage of the ESN, SVM, and CNN lies in
the possibility of analysing long sequences without any pre-processing phase. In
relation to the project on saving biodiversity, in the last year of my PhD I spent
six months at the University of Queensland in Brisbane, Australia. During this
period I started an (ongoing) collaboration with the University of Queensland and
with Macquarie University. We analyse ML models to identify prey-capture
activity in little penguins and in seals. Results obtained with the architectures
based on the IDNN were compared to results obtained with SVM, ESN, and
CNN over the same dataset. The results are still under analysis and will be
published in a paper this year. Concerning human AR, I participated
in Palumbo et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We compared the results obtained with ESN and IDNN
over a dataset of Received Signal Strength (RSS) and accelerometer data to
recognise seven different activities. We selected daily activities: Bending,
Cycling, Lying, Sitting, Standing, and Walking. The performance of the activity
recognition system was assessed on a purpose-specific, real-world dataset.
Our results show that the proposed system reaches a very high level of accuracy
while maintaining a low deployment cost. The average test accuracy of 0.944
with the ESN is comparable with the 0.923 obtained with the IDNN.
Single activities are well recognised, with accuracy scores ranging from 0.825 (Standing)
to 0.999 (Cycling). However, it is worth noting that the Sitting and Standing
activities are hard to distinguish, due to the nature of the RSS input signals
used. Specifically, if we include the phases of sitting down and standing up, the
ESN system obtains good results, as reported in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ],[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. For each case study
it is possible to identify the best compromise between performance and
applicability: the IDNN is a good compromise between the two, while
ESN and CNN are mainly suited for higher performance and are flexible enough
to be further customised into architectures that respect the applicability limits.
      </p>
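      <p>The memory figures above come down to counting weights and multiplying by the storage width. The sketch below is hypothetical: the layer sizes and the two-byte weight width are assumptions chosen only to show the computation, and they do not reproduce the reported 1844/398/196 bytes.

```python
def idnn_weight_bytes(n_in, delay, n_hidden, n_out, bytes_per_w=2):
    # A fully connected IDNN input layer sees delay+1 time slots of
    # n_in signals; the +1 terms count one bias per unit.
    n_weights = ((delay + 1) * n_in + 1) * n_hidden \
                + (n_hidden + 1) * n_out
    return n_weights * bytes_per_w

# Hypothetical configuration: 3 axes, delay 4, 10 hidden units, 1 output.
print(idnn_weight_bytes(n_in=3, delay=4, n_hidden=10, n_out=1))  # 342 bytes
```
</p>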
    </sec>
    <sec id="sec-5">
      <title>Future Work</title>
      <p>The problem of AR for humans and animals is still open and continues
to demand new solutions, mainly in AAL and HC. The Biologging area is
emerging, and it is all the more necessary now that the biodiversity crisis of
recent decades is leading to the decline and extinction of many animal species
worldwide. Future work will continue developing complete systems with the
potential to be applied to many different case studies, providing a serious
possibility for the monitoring and protection of endangered species and for the
support of the elderly and children.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Francis</surname>
            ,
            <given-names>L A</given-names>
          </string-name>
          and
          <string-name>
            <surname>Iniewski</surname>
            ,
            <given-names>K</given-names>
          </string-name>
          (
          <year>2013</year>
          )
          <article-title>Novel Advances in Microsystems Technologies and Their Applications, CRC Press</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Haykin</surname>
            ,
            <given-names>S</given-names>
          </string-name>
          (
          <year>2009</year>
          )
          <article-title>Neural Networks and Learning Machines, Pearson, Upper Saddle River</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Palumbo</surname>
            ,
            <given-names>F</given-names>
          </string-name>
          and Gallicchio, C and Pucci, R and Micheli,
          <string-name>
            <surname>A</surname>
          </string-name>
          (
          <year>2016</year>
          )
          <article-title>Human activity recognition using multisensor data fusion based Reservoir Computing</article-title>
          .
          <source>IOS Press 8:87-107</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Barbuti</surname>
            ,
            <given-names>R</given-names>
          </string-name>
          and Chessa, S and Micheli,
          <string-name>
            <surname>A</surname>
          </string-name>
          and Pucci,
          <string-name>
            <surname>R</surname>
          </string-name>
          (
          <year>2013</year>
          )
          <article-title>Identi cation of nesting phase in tortoise populations by neural networks</article-title>
          .
          <source>Extended Abstract, The 50th Anniversary Convention of the AISB, selected papers 62-65</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Barbuti</surname>
            ,
            <given-names>R</given-names>
          </string-name>
          and Chessa, S and Micheli,
          <string-name>
            <surname>A</surname>
          </string-name>
          and Pucci,
          <string-name>
            <surname>R</surname>
          </string-name>
          (
          <year>2016</year>
          )
          <article-title>Localizing Tortoise Nests by Neural Network</article-title>
          . PlosOne
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Lara</surname>
            ,
            <given-names>O D</given-names>
          </string-name>
          and
          <string-name>
            <surname>Labrador</surname>
            ,
            <given-names>M A</given-names>
          </string-name>
          (
          <year>2013</year>
          )
          <article-title>A survey on human activity recognition using wearable sensors</article-title>
          .
          <source>IEEE 15:1192-1209</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Bocca</surname>
            ,
            <given-names>M</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kaltiokallio</surname>
            ,
            <given-names>O</given-names>
          </string-name>
          and Patwari,
          <string-name>
            <surname>N</surname>
          </string-name>
          (
          <year>2012</year>
          )
          <article-title>Radio tomographic imaging for ambient assisted living</article-title>
          . Springer 108-130
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Block</surname>
            ,
            <given-names>B A</given-names>
          </string-name>
          (
          <year>2005</year>
          )
          <article-title>Physiological Ecology in the 21st Century: Advancements in Biologging</article-title>
          .
          <source>Science Integrative and Comparative Biology</source>
          <volume>45</volume>
          :
          <fpage>305</fpage>
          -
          <lpage>320</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kooyman</surname>
            ,
            <given-names>G L</given-names>
          </string-name>
          (
          <year>2004</year>
          )
          <article-title>Genesis and evolution of biologging devices: 1963-</article-title>
          <year>2002</year>
          58:
          <fpage>522</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Vapnik</surname>
          </string-name>
          , V and
          <string-name>
            <surname>Levin</surname>
          </string-name>
          , E and
          <string-name>
            <surname>Le Cun</surname>
            ,
            <given-names>Y</given-names>
          </string-name>
          (
          <year>1994</year>
          )
          <article-title>Measuring the VC-dimension of a learning machine</article-title>
          .
          <source>Neural Computation</source>
          <volume>6</volume>
          :
          <fpage>851</fpage>
          -
          <lpage>876</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Waibel</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          (
          <year>1989</year>
          )
          <article-title>Modular construction of time-delay neural networks for speech recognition</article-title>
          .
          <source>Neural computation 1:39-46</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Jaeger</surname>
            ,
            <given-names>H</given-names>
          </string-name>
          (
          <year>2002</year>
          )
          <article-title>Adaptive nonlinear system identi cation with echo state networks</article-title>
          .
          <source>In Advances in neural information processing systems 593-600</source>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Ozturk</surname>
          </string-name>
          , M and
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>D</given-names>
          </string-name>
          and Príncipe,
          <string-name>
            <surname>J C.</surname>
          </string-name>
          (
          <year>2007</year>
          )
          <article-title>Analysis and design of echo state networks</article-title>
          .
          <source>Neural Computation</source>
          ,
          <volume>19</volume>
          :
          <fpage>111</fpage>
          -
          <lpage>138</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <article-title>Botía, J A and García, J A A and Fujinami, K and Barsocchi</article-title>
          , P and Riedel,
          <string-name>
            <surname>T</surname>
          </string-name>
          (
          <year>2013</year>
          )
          <article-title>Evaluating aal systems through competitive benchmarking</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y</given-names>
            and Bengio, Y
          </string-name>
          (
          <year>1995</year>
          )
          <article-title>Convolutional networks for images, speech, and time series</article-title>
          .
          <source>The handbook of brain theory and neural networks 3361:255-258</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Alvarez-García</surname>
            ,
            <given-names>Juan Antonio</given-names>
          </string-name>
          and Barsocchi, Paolo and Chessa, Stefano and Salvi,
          <string-name>
            <surname>Dario</surname>
          </string-name>
          (
          <year>2013</year>
          )
          <article-title>Evaluation of localization and activity recognition systems for ambient assisted living: The experience of the 2012 EvAAL competition</article-title>
          .
          <source>Journal of Ambient Intelligence and Smart Environments</source>
          <volume>5</volume>
          :
          <fpage>119</fpage>
          -
          <lpage>132</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Palumbo</surname>
          </string-name>
          , Filippo and Barsocchi, Paolo and Gallicchio, Claudio and Chessa, Stefano and Micheli,
          <string-name>
            <surname>Alessio</surname>
          </string-name>
          (
          <year>2013</year>
          )
          <article-title>Multisensor data fusion for activity recognition based on reservoir computing</article-title>
          .
          <source>International Competition on Evaluating AAL Systems through Competitive Benchmarking 24-35</source>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>More</surname>
            ,
            <given-names>Jorge J</given-names>
          </string-name>
          (
          <year>1978</year>
          )
          <article-title>The Levenberg-Marquardt algorithm: implementation and theory</article-title>
          .
          <source>Numerical analysis 105-116</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Riedmiller</surname>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
          </string-name>
          (
          <year>1994</year>
          )
          <article-title>Advanced supervised learning in multi-layer perceptrons: from backpropagation to adaptive learning algorithms</article-title>
          .
          <source>Computer Standards &amp; Interfaces</source>
          <volume>16</volume>
          :3:
          <fpage>265</fpage>
          -
          <lpage>278</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>