Evaluation and Deployment of Models for Activity Recognition

                                         Rita Pucci

         University of Pisa, Department of Computer Science, pucci@di.unipi.it

Advisors: Alessio Micheli and Stefano Chessa, University of Pisa, Department of Computer Science.


         Abstract. There is a growing need to monitor humans and animals in order to observe their behaviour. To understand animals in their natural environment, and to monitor children and the elderly, scientists rely on remote assessments. Automatic monitoring systems make it possible to observe a subject directly: they can support biological and medical studies by identifying the activity performed, without the intrusiveness of a human observer. Activity Recognition (AR) is an emerging field that will soon provide innovative solutions to many problems, and the literature presents many models as the core of an automatic monitoring system that recognises the activity of a subject. During my PhD, I developed Machine Learning models that detect the physical activities of subjects using AR techniques, autonomously identifying activity patterns in accelerometer data. The versatility of Machine Learning models makes them useful for monitoring activity where direct observation would not otherwise be possible.


Keywords: Artificial Neural Network, Machine Learning algorithm, Human
activity recognition, Biologging, Sensors, Accelerometers


1       Introduction

AR covers a broad range of applications in many areas connected to computer science. I focused on three areas in which AR is needed to meet monitoring requirements: Ambient Assisted Living (AAL), HealthCare (HC), and Biologging, all of which have recently shown increasing interest in automatic monitoring systems. I started my PhD by familiarising myself with the field of AR. I then developed different Machine Learning (ML) models and applied them to different case studies. This made it possible to compare different models on the same dataset, and the same model across different datasets. Using the knowledge gained in this phase, I chose libraries and tools suitable for developing and analysing AR systems: Theano (Python library), Shark (C++ library), and NNTool (MATLAB toolbox). These libraries were used to cross-check the results obtained with the same model on the same dataset. My research then moved towards the lack of common formats for raw accelerometer datasets, which has so far prevented the comparison of ML models. Evaluating models on uniform datasets makes it possible to analyse aspects such as accuracy, latency, and the processing resources required, and results obtained on the same dataset allow the trade-off between performance and obtrusiveness to be analysed. This trade-off plays a crucial role in AR systems and shifts the focus of research from the performance of the system to its applicability. In particular, I investigated and developed models that provide a functional application of animal AR within the Tortoise@ project, and compared the results of different models against the objectives of that project. Lastly, I used the datasets to compare models across case studies.


2   Background
The literature presents several models designed to recognise activities in raw sensor data. Advances in microelectronics [1] and computer systems have made it possible for sensors and mobile devices (hereafter, devices) to interact with people during their daily activities. As a result, AR has become a more attainable way of addressing problems in AAL, HC, and Biologging. Research in AAL is grounded in Ambient Intelligence, whose technologies allow people to overcome physical limits and to monitor daily personal and self-care activities ([3], [7]). This technology aims to improve a person's capabilities through a digital environment that is sensitive, adaptive, and responsive. A similar development occurred in HC, where research has been driven by continuing demand from the medical field; indeed, HC is a specific medical area of AAL. Human AR for the health care of the elderly is a very active research area, in which recent work aims to prevent physical and psychological deterioration. In Biologging, a prior lack of research left a gap that has been filled by AR projects addressing biodiversity and conservation [8], [9]. These projects provide autonomous systems for monitoring animals in the wild, in particular endangered species. The versatility of AR is growing, but a unifying model is still lacking. Surveys compare published works and provide a uniform qualitative comparison; unfortunately, the comparison of AR approaches is hindered by the fact that each model is evaluated on a different dataset. Tailoring software to a specific dataset does nothing to improve the versatility of AR methods.


3   Methodology
In this context, following [6], the AR problem is defined through a temporal partition of a time sequence into intervals, hereafter called patterns, each labelled with the activity performed. The patterns are consecutive and non-empty, and they are assumed to have a fixed length. The AR problem is then to find a mapping function that, evaluated on each pattern, yields a classification as close as possible to the activity actually performed. To address the AR problem, I used an automatic classifier structure consisting of a filter stage and a classifier stage. The filter stage pre-processes the data to increase classification performance. Although the filter stage is delicate to design, the classifier stage is the core of the system: it is implemented by an ML model that assigns an activity label to each pattern.
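The partition into patterns and the evaluation of a mapping function can be sketched in Python as follows. This is a minimal illustration under stated assumptions: tri-axial accelerometer samples, non-overlapping fixed-length patterns labelled by majority vote, and a hypothetical `toy_classifier` standing in for the ML model of the classifier stage; the filter stage is omitted.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float, float]  # one tri-axial accelerometer reading (x, y, z)
Pattern = Tuple[List[Sample], str]   # a fixed-length window and its activity label


def segment(samples: Sequence[Sample], labels: Sequence[str], length: int) -> List[Pattern]:
    """Partition a time sequence into consecutive, non-empty patterns of fixed length.

    Each pattern is labelled with the most frequent activity label inside it;
    trailing samples that do not fill a whole pattern are discarded.
    """
    if length <= 0:
        raise ValueError("pattern length must be positive")
    patterns = []
    for start in range(0, len(samples) - length + 1, length):
        window = list(samples[start:start + length])
        label = Counter(labels[start:start + length]).most_common(1)[0][0]
        patterns.append((window, label))
    return patterns


def accuracy(patterns: List[Pattern], f: Callable[[List[Sample]], str]) -> float:
    """Fraction of patterns for which the mapping function f returns the actual activity."""
    hits = sum(1 for window, label in patterns if f(window) == label)
    return hits / len(patterns)


def toy_classifier(window: List[Sample]) -> str:
    """Hypothetical stand-in for the classifier stage: thresholds mean horizontal motion."""
    motion = sum(abs(x) + abs(y) for x, y, _ in window) / len(window)
    return "dig" if motion > 0.5 else "rest"


samples = [(0.0, 0.0, 1.0)] * 4 + [(0.5, 0.5, 1.5)] * 4
labels = ["rest"] * 4 + ["dig"] * 4
patterns = segment(samples, labels, length=4)
print([label for _, label in patterns], accuracy(patterns, toy_classifier))
```

In a real system `toy_classifier` would be replaced by one of the trained models discussed below; only the pattern interface stays the same.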
The ML models presented in the literature are mainly the Input Delay Neural Network (IDNN), the Support Vector Machine (SVM), the Echo State Network (ESN), and, more recently, the Convolutional Neural Network (CNN). The IDNN is a subclass of the Time Delay Neural Network, which was introduced for speech recognition and is specifically designed to process sequential data. As in a multilayer perceptron, the inputs of a unit are the outputs of earlier nodes, but they are taken not only at the current time slot: a number of previous time slots are also included (the time delay). The delay introduced on the inputs of the basic IDNN units allows the model to relate the current input to the past history of events. The IDNN scans the input window over time, so that its units implement translation invariance [11]. This property suits the dynamic nature of sensor sequences, since an activity has to be recognised independently of the precise moment at which it occurs. I evaluated different structures of the model and three training algorithms: Backpropagation [2], Resilient Propagation (RP) [19], and Levenberg-Marquardt (LM) [18].
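A forward pass of an IDNN can be sketched with NumPy as a tapped delay line followed by an ordinary feed-forward layer. The sketch assumes one hidden layer of tanh units and a softmax output, and uses random placeholder weights; training with Backpropagation, RP, or LM is omitted. Because the same weights are applied at every time step, the network scans the input window over time as described above.

```python
import numpy as np


def delay_embed(x: np.ndarray, delay: int) -> np.ndarray:
    """Tapped delay line: row i holds x[t], x[t-1], ..., x[t-delay] for t = i + delay.

    x has shape (T, n_features); the first `delay` steps lack a full history and
    are dropped, so the result has shape (T - delay, (delay + 1) * n_features).
    """
    T = x.shape[0]
    taps = [x[delay - k:T - k] for k in range(delay + 1)]  # k = 0 is the current step
    return np.concatenate(taps, axis=1)


def idnn_forward(x, delay, W_h, b_h, W_o, b_o):
    """Class probabilities for every time step that has a full delay window."""
    h = np.tanh(delay_embed(x, delay) @ W_h + b_h)       # hidden layer, shared over time
    logits = h @ W_o + b_o
    logits = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
T, n_in, delay, n_hidden, n_classes = 50, 3, 4, 8, 2
x = rng.normal(size=(T, n_in))                            # e.g. 50 tri-axial samples
W_h = rng.normal(scale=0.1, size=((delay + 1) * n_in, n_hidden))
b_h = np.zeros(n_hidden)
W_o = rng.normal(scale=0.1, size=(n_hidden, n_classes))
b_o = np.zeros(n_classes)
probs = idnn_forward(x, delay, W_h, b_h, W_o, b_o)
print(probs.shape)
```

The delay length, hidden size, and activations are exactly the structural choices that Model Selection would tune.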
The main idea of the SVM is to construct a hyperplane as the decision surface so that the margin between the positive and negative patterns is maximised. Through margin maximisation, the SVM is an approximate implementation of the method of structural risk minimisation [10]. Our research emphasises that the SVM learning algorithm is built directly on the kernel, that is, on the inner-product kernel evaluated between the support vectors and the input patterns. The kernel function allows the SVM to be applied to both linearly and nonlinearly separable patterns. Specifically, we consider the Radial Basis Function (RBF), polynomial, and linear kernels [2].
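The three kernels and the kernel-based decision function of a trained SVM can be written as below. This is a minimal sketch assuming the usual dual form f(x) = Σ α_i y_i K(s_i, x) + b; the support vectors, coefficients, and the `gamma`, `degree`, and `coef0` defaults are hypothetical values chosen for illustration, since in practice they come from training and Model Selection.

```python
import numpy as np


def linear_kernel(a, b):
    """K(a, b) = <a, b>."""
    return float(np.dot(a, b))


def polynomial_kernel(a, b, degree=3, coef0=1.0):
    """K(a, b) = (<a, b> + coef0) ** degree."""
    return float((np.dot(a, b) + coef0) ** degree)


def rbf_kernel(a, b, gamma=0.5):
    """K(a, b) = exp(-gamma * ||a - b||^2)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def svm_decision(x, support_vectors, dual_coefs, bias, kernel):
    """Sign of sum_i c_i * K(s_i, x) + b, where each c_i already equals alpha_i * y_i."""
    score = sum(c * kernel(s, x) for s, c in zip(support_vectors, dual_coefs)) + bias
    return 1 if score >= 0 else -1


# Hypothetical trained model: two support vectors with opposite labels.
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
dual_coefs = [1.0, -1.0]
for kernel in (linear_kernel, polynomial_kernel, rbf_kernel):
    print(kernel.__name__, svm_decision([0.9, 1.1], support_vectors, dual_coefs, 0.0, kernel))
```

Only the kernel changes between the three variants; the decision function never needs the explicit feature mapping.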
The ESN model was investigated by Jaeger [12] and by Ozturk et al. [13]. The ESN is a Recurrent Neural Network (RNN) based on the Reservoir Computing (RC) paradigm, which separates the recurrent dynamical part, the reservoir, from the non-recurrent output part, the readout. The ESN therefore differs from a conventional RNN, in which all weights are adapted during training. The key feature of the ESN is that the recurrent connections among hidden units are fixed, so the encoding function implemented by the network is not adaptive. Because only the readout connections are trained, and they have no cyclic dependencies, training an ESN is a simple linear task. The effectiveness of an AR system based on the ESN has been validated in the EvAAL international competition [14].
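The separation between a fixed reservoir and a linearly trained readout can be sketched with NumPy as below. This is a minimal sketch, not the configuration used in the works cited: it assumes a dense uniform random reservoir rescaled to a spectral radius of 0.9, tanh units, no washout period, and a ridge-regression readout. For testability it uses a toy regression task (recalling the previous input); for AR the readout would have one output per activity and the predicted label would be the arg-max.

```python
import numpy as np


class EchoStateNetwork:
    """Minimal ESN: fixed random reservoir, linear readout trained by ridge regression."""

    def __init__(self, n_in, n_res=50, spectral_radius=0.9, input_scale=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-input_scale, input_scale, size=(n_res, n_in))
        W = rng.uniform(-1.0, 1.0, size=(n_res, n_res))
        # Rescale so the largest absolute eigenvalue equals spectral_radius, a common
        # heuristic for obtaining the echo state property (not a guarantee).
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W            # recurrent weights: fixed, never trained
        self.W_out = None     # readout weights: the only trained part

    def states(self, u):
        """Run the reservoir x(t) = tanh(W_in u(t) + W x(t-1)) over u of shape (T, n_in)."""
        x = np.zeros(self.W.shape[0])
        collected = []
        for u_t in u:
            x = np.tanh(self.W_in @ u_t + self.W @ x)
            collected.append(x)
        states = np.array(collected)
        return np.hstack([states, np.ones((len(states), 1))])  # append a bias column

    def fit(self, u, y, ridge=1e-6):
        """Linear training of the readout only: solve (X^T X + ridge I) W_out = X^T y."""
        X = self.states(u)
        self.W_out = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)
        return self

    def predict(self, u):
        return self.states(u) @ self.W_out


# Toy task (not an AR dataset): reproduce the input delayed by one step.
rng = np.random.default_rng(1)
u = rng.uniform(-1.0, 1.0, size=(300, 1))
y = np.concatenate([[0.0], u[:-1, 0]])
W_before = EchoStateNetwork(n_in=1).W.copy()
esn = EchoStateNetwork(n_in=1).fit(u, y)
mse = float(np.mean((esn.predict(u) - y) ** 2))
print(f"training MSE {mse:.5f} vs target variance {np.var(y):.5f}")
```

Training reduces to one linear solve, which is why the readout can be refitted cheaply for each case study.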
The CNN is a biologically inspired architecture that can learn invariant features; the model was introduced by LeCun (see [15]). The basic idea of the CNN is to ensure a degree of invariance to shifts and distortions. These models offer good performance with low computational and memory requirements. Recently, CNNs have been proposed for accelerometer data.
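The shift invariance mentioned above can be illustrated with a single 1-D convolutional layer over an accelerometer sequence. In this sketch (an assumption for illustration, not LeCun's exact architecture) the same kernels slide along the time axis, and a global max pooling over time makes the resulting features independent of where a short event occurs; the kernels are random placeholders rather than learned weights.

```python
import numpy as np


def conv1d(x, kernels, bias):
    """'Valid' 1-D convolution (cross-correlation, as in most CNN libraries).

    x: (T, C_in) sequence; kernels: (K, C_in, C_out); bias: (C_out,).
    Returns (T - K + 1, C_out): the same kernels are applied at every time step.
    """
    K, T = kernels.shape[0], x.shape[0]
    out = np.empty((T - K + 1, kernels.shape[2]))
    for t in range(T - K + 1):
        out[t] = np.tensordot(x[t:t + K], kernels, axes=([0, 1], [0, 1])) + bias
    return out


def features(x, kernels, bias):
    """ReLU feature maps followed by global max pooling over time."""
    return np.maximum(conv1d(x, kernels, bias), 0.0).max(axis=0)


rng = np.random.default_rng(0)
kernels = rng.normal(size=(4, 3, 5))   # kernel width 4, 3 input axes, 5 feature maps
bias = np.zeros(5)
early = np.zeros((20, 3))
early[5] = 1.0                         # a short "event" at t = 5 ...
late = np.zeros((20, 3))
late[12] = 1.0                         # ... and the same event at t = 12
print(np.allclose(features(early, kernels, bias), features(late, kernels, bias)))
```

The feature maps themselves shift with the input; it is the pooling step that discards the position.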
The dataset is split at random into training and test subsets. The training subset is used both to train each model and to select the values of its hyperparameters, a process hereafter called Model Selection (MS). Each model is also evaluated for the amount of memory resources it needs, hereafter called Model Assessment (MA). MA focuses on applicability, that is, on the trade-off between the performance obtained and the resources required. The automatic classifier structure is therefore evaluated both for its performance and for its applicability, with the aim of identifying a trade-off among the hardware resources required, classification accuracy, and embedded design. Evaluating different settings for the two stages of the automatic classifier shows that the setting can be chosen according to the aim of the research: higher accuracy or greater generality of the model.
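The random split and Model Selection described above can be sketched as follows. This is a minimal sketch in plain Python: it assumes a hold-out validation subset carved out of the training subset (k-fold cross-validation would be a common alternative), and `fit`, `score`, and the threshold "model" are hypothetical placeholders for a real classifier and its hyperparameters. The test subset is kept aside for the final evaluation only.

```python
import random


def split(items, test_fraction, seed):
    """Shuffle items at random and split them into (train, test) subsets."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(test_fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


def model_selection(train, candidates, fit, score, val_fraction=0.25, seed=1):
    """Pick the hyperparameter with the best validation score, using the training subset only."""
    inner_train, val = split(train, val_fraction, seed)
    best = max(candidates, key=lambda h: score(fit(inner_train, h), val))
    return best, fit(train, best)  # retrain the chosen setting on the whole training subset


def fit(patterns, threshold):
    """Hypothetical placeholder 'training': returns a threshold classifier (ignores the data)."""
    return lambda v: v >= threshold


def score(model, patterns):
    """Accuracy of a model on a list of (value, label) patterns."""
    return sum(model(v) == label for v, label in patterns) / len(patterns)


# Hypothetical toy problem: 1-D "patterns" labelled True when the value is >= 50.
data = [(v, v >= 50) for v in range(100)]
train, test = split(data, test_fraction=0.3, seed=0)
best, model = model_selection(train, candidates=[50, 10, 90], fit=fit, score=score)
print(best, score(model, test))
```

Keeping the test subset out of `model_selection` is what prevents the hyperparameter choice from leaking into the reported performance.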
Regarding the deployment of AR systems, Tortoise@ is an autonomous system that identifies the nest-digging activity of tortoises using a device mounted atop the tortoise's shell. An accelerometer, together with temperature and light sensors, is embedded on a device MME called a MicaZ module. Accelerometer data were collected from the devices of different tortoises during their two-month nesting period. Using an automatic system, the device discriminates between (nest) digging and non-digging activities (specifically, walking and eating) from the tortoise's movements. The automatic system has a modular structure consisting of an artificial neural network and an output filter. For the purpose of experimentation and comparison, and with the aim of minimising the computational cost, the artificial neural network was modelled according to three different architectures based on the IDNN, all developed in C: the standard IDNN, the IDNN with Local Receptive Fields (IDNN LRF), and the IDNN with Local Receptive Fields and Weight Sharing (IDNN LRF WS). The IDNN LRF borrows local receptive fields from the CNN: each hidden neuron scans the input through its own local receptive field, so that units in a layer receive input only from a set of units in a small neighbourhood of the previous layer. The IDNN LRF WS is also inspired by the CNN and adds weight sharing (WS) among the LRF hidden units. Because many units share the same weight vector, WS further reduces the number of free parameters and provides a degree of shift invariance (the detection of features regardless of their position).
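To see why local receptive fields and weight sharing reduce the memory needed to store the weights, the free parameters of the hidden layer can be counted for each variant. The sketch assumes an input window of `n_steps` time steps by `n_channels` accelerometer axes, receptive fields of `field` steps moved by `stride`, `n_maps` groups of units, 4-byte weights, and a hidden layer of the same size in all three variants. All sizes are hypothetical and are not those of the Tortoise@ networks, and the output layer is not counted.

```python
def dense_params(n_steps: int, n_channels: int, n_hidden: int) -> int:
    """Standard IDNN hidden layer: every unit sees the whole window (+1 bias)."""
    return n_hidden * (n_steps * n_channels + 1)


def lrf_params(n_steps: int, n_channels: int, field: int, stride: int, n_maps: int) -> int:
    """Local receptive fields: each unit sees only `field` consecutive time steps."""
    positions = (n_steps - field) // stride + 1
    return n_maps * positions * (field * n_channels + 1)


def lrf_ws_params(n_channels: int, field: int, n_maps: int) -> int:
    """LRF + weight sharing: all units of a map reuse one weight vector (+1 bias)."""
    return n_maps * (field * n_channels + 1)


n_steps, n_channels, field, stride, n_maps = 20, 3, 5, 5, 2
positions = (n_steps - field) // stride + 1   # 4 receptive-field positions
bytes_per_weight = 4                           # assuming 32-bit floats
counts = {
    "IDNN": dense_params(n_steps, n_channels, n_maps * positions),
    "IDNN LRF": lrf_params(n_steps, n_channels, field, stride, n_maps),
    "IDNN LRF WS": lrf_ws_params(n_channels, field, n_maps),
}
for name, n in counts.items():
    print(f"{name:12s} {n:4d} weights, {n * bytes_per_weight:5d} bytes")
```

With these hypothetical sizes the counts fall from 488 to 128 to 32 weights: LRF removes the connections outside each field, and WS removes the duplicated copies of each field's weights.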


4   Discussion

During my Master's degree and my PhD, I took part in the Tortoise@ project. Results for the three IDNN architectures were initially presented in [4] and extended in [5]. The proposed models, IDNN, IDNN LRF, and IDNN LRF WS, were evaluated to find a good trade-off between performance and applicability on a low-power device. The performance measurements of the three models are averages of the errors computed over five different initialisations of the ANN weights, and applicability is evaluated in terms of the memory required to store the IDNN weights.
For Tortoise@, the highest performance is reached by the IDNN, with an accuracy of 96.24% and a memory occupation of 1844 bytes. The IDNN LRF requires less than 30% of that memory (398 bytes) with an accuracy of 95.51%: giving up less than one percentage point of accuracy is worthwhile for a memory reduction of more than 70%. With the IDNN LRF WS the advantage is less marked: the accuracy drops to 94.34% with 196 bytes of memory. Note that the space required by the filter stage must be added to these figures.

Tortoise@ is a starting point for many different AR projects for animals. Indeed, a similar automatic classifier system was developed, as a rudimentary prototype, to be embedded on a mobile phone. Tortoise@ is one of the first projects to use an automatic classifier system for the classification of tortoise movements. The advantage of the ESN, SVM, and CNN is that they can analyse long sequences without any pre-processing phase. As part of the project to save biodiversity, in the last year of my PhD I spent six months at the University of Queensland in Brisbane, Australia. During this period I started an ongoing collaboration with the University of Queensland and Macquarie University, in which we analyse ML models to identify prey-capture activity in Little Penguins and seals. The results obtained with the IDNN-based architectures were compared with those obtained with SVM, ESN, and CNN on the same dataset. These results are still under analysis and will be published this year.

Concerning human AR, I contributed to Palumbo et al. [3]. We compared the results obtained with ESN and IDNN on a dataset of Received Signal Strength (RSS) and accelerometer data to recognise seven daily activities: Bending (two variants), Cycling, Lying, Sitting, Standing, and Walking. The performance of the AR system is assessed on a real-world dataset collected specifically for this purpose.
Our results show that the proposed system reaches a very high level of accuracy while keeping the deployment cost low. The average test accuracy of the ESN (0.944) is comparable with that of the IDNN (0.923). Individual activities are recognised well, with accuracy scores ranging from 0.825 (Standing) to 0.999 (Cycling). However, Sitting and Standing are hard to distinguish because of the nature of the RSS input signals used; when the phases of sitting down and standing up are included, the ESN system obtains good results, as reported in [16], [17]. For each case study it is possible to identify the best compromise between performance and applicability: the IDNN offers a good balance between the two, whereas the ESN and CNN are mainly suited to achieving higher performance and are flexible enough to be customised into architectures that respect the applicability limits.
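The accuracy/memory trade-off reported above for the three Tortoise@ networks can be checked with a few lines of arithmetic. The sketch only recomputes ratios from the accuracies and memory sizes given in the text; the function name `trade_off` is illustrative.

```python
# Accuracy (%) and weight memory (bytes) reported above for the Tortoise@ networks;
# the memory of the filter stage is not included.
RESULTS = {
    "IDNN": (96.24, 1844),
    "IDNN LRF": (95.51, 398),
    "IDNN LRF WS": (94.34, 196),
}


def trade_off(results, baseline="IDNN"):
    """For each model: memory as % of the baseline and accuracy loss in percentage points."""
    base_acc, base_mem = results[baseline]
    return {
        name: (round(100 * mem / base_mem, 1), round(base_acc - acc, 2))
        for name, (acc, mem) in results.items()
    }


for name, (mem_pct, acc_loss) in trade_off(RESULTS).items():
    print(f"{name:12s} memory {mem_pct:5.1f}% of IDNN, accuracy -{acc_loss:.2f} points")
```

The IDNN LRF keeps about 21.6% of the IDNN memory (a reduction of roughly 78%) for a loss of 0.73 points, whereas moving on to the IDNN LRF WS saves a further 202 bytes at the cost of about 1.2 additional points.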


5   Future Work

AR for humans and animals is still an open problem that calls for new solutions, mainly in AAL and HC. The Biologging area is emerging, and it is all the more necessary now that the biodiversity crisis that has characterised recent decades is leading to the decline and extinction of many animal species worldwide. Future work will continue to develop complete systems that can be applied to many different case studies, offering a serious opportunity to monitor and protect endangered species and to help and support the elderly and children.

References
 1. Francis, L A and Iniewski, K (2013) Novel Advances in Microsystems Technologies and Their Applications. CRC Press
 2. Haykin, S (2009) Neural Networks and Learning Machines. Pearson, Upper Saddle River
 3. Palumbo, F and Gallicchio, C and Pucci, R and Micheli, A (2016) Human activity recognition using multisensor data fusion based on Reservoir Computing. IOS Press 8:87-107
 4. Barbuti, R and Chessa, S and Micheli, A and Pucci, R (2013) Identification of nesting phase in tortoise populations by neural networks. Extended Abstract, The 50th Anniversary Convention of the AISB, selected papers 62-65
 5. Barbuti, R and Chessa, S and Micheli, A and Pucci, R (2016) Localizing Tortoise Nests by Neural Network. PLoS ONE
 6. Lara, O D and Labrador, M A (2013) A survey on human activity recognition using wearable sensors. IEEE 15:1192-1209
 7. Bocca, M and Kaltiokallio, O and Patwari, N (2012) Radio tomographic imaging for ambient assisted living. Springer 108-130
 8. Block, B A (2005) Physiological Ecology in the 21st Century: Advancements in Biologging Science. Integrative and Comparative Biology 45:305-320
 9. Kooyman, G L (2004) Genesis and evolution of biologging devices: 1963-2002. 58:522
10. Vapnik, V and Levin, E and Le Cun, Y (1994) Measuring the VC-dimension of a learning machine. Neural Computation 6:851-876
11. Waibel, A (1989) Modular construction of time-delay neural networks for speech recognition. Neural Computation 1:39-46
12. Jaeger, H (2002) Adaptive nonlinear system identification with echo state networks. In: Advances in Neural Information Processing Systems 593-600
13. Ozturk, M and Xu, D and Príncipe, J C (2007) Analysis and design of echo state networks. Neural Computation 19:111-138
14. Botía, J A and Álvarez-García, J A and Fujinami, K and Barsocchi, P and Riedel, T (2013) Evaluating AAL systems through competitive benchmarking.
15. LeCun, Y and Bengio, Y (1995) Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks 3361:255-258
16. Álvarez-García, Juan Antonio and Barsocchi, Paolo and Chessa, Stefano and Salvi, Dario (2013) Evaluation of localization and activity recognition systems for ambient assisted living: The experience of the 2012 EvAAL competition. Journal of Ambient Intelligence and Smart Environments 5(1):119-132
17. Palumbo, Filippo and Barsocchi, Paolo and Gallicchio, Claudio and Chessa, Stefano and Micheli, Alessio (2013) Multisensor data fusion for activity recognition based on reservoir computing. International Competition on Evaluating AAL Systems through Competitive Benchmarking 24-35
18. Moré, Jorge J (1978) The Levenberg-Marquardt algorithm: implementation and theory. Numerical Analysis 105-116
19. Riedmiller, Martin (1994) Advanced supervised learning in multi-layer perceptrons: from backpropagation to adaptive learning algorithms. Computer Standards & Interfaces 16(3):265-278