Population Age classification based on subject’s
physiological responses
Francesca Gasparini1 , Alessandra Grossi1 and Stefania Bandini1,2

1 Department of Computer Science, Systems and Communications, University of Milano - Bicocca, Italy
2 RCAST - Research Center for Advanced Science & Technology, The University of Tokyo


                                         Abstract
                                         In this work we rely on physiological signals as honest indicators of people’s instinctive behavior or
                                         emotions. Benefiting from the fact that these signals can be easily acquired from wearable devices, we
                                         here analyze the ability of these data to classify not only different human activities but also individuals’
                                         age. We consider Photoplethysmography (PPG) and Galvanic Skin Response (GSR) belonging to a dataset
                                         collected in a real laboratory environment at the University of Tokyo. In the experiment a population of
                                         Japanese young adults and a population of Japanese elderly people were involved and performed four
                                         different tasks: Reading, Comprehension, Audio Listening and Math Calculation. Four binary classifiers
                                         have been considered, one for each of the experiment tasks, to classify the population age. Different
                                         classification models have been tested (SVM, CART and XgBoost) with a LOSO validation strategy,
                                         obtaining classification accuracy between 69% and 78%. The task in which the two groups are most easily
                                         distinguishable is that of mathematical calculation. Finally, we also perform a multi-class classification
                                         considering age and tasks, for a total of six classes: Math Calculation, Reading and Audio Listening for
                                         each subject group (young adult and elderly), obtaining an overall good performance.


                                         Keywords
                                         Physiological Signals, Active Ageing, Photoplethysmography, Galvanic Skin Response, Electrodermal
                                         Activity, Machine Learning


1. Introduction
In daily life, the concept of the Internet of Things plays an increasingly important role [1]. The
advance in communication and computing technologies, as well as the reduction in sensor and
electronic component cost, led to the creation of interconnected systems where also everyday
objects, such as mobile phones, actuators, appliances or smart devices, are capable of sending
and receiving data over the network [2]. In this Smart Environment, the connected devices must
be able to acquire knowledge, understand how to apply it and work collaboratively to make

Italian Workshop on Artificial Intelligence for an Ageing Society (AIxAS 2021), November 29th, 2021
Email: francesca.gasparini@unimib.it (F. Gasparini); a.grossi6@campus.unimib.it (A. Grossi);
stefania.bandini@unimib.it (S. Bandini)
ORCID: 0000-0002-6279-6660 (F. Gasparini); 0000-0003-1308-8497 (A. Grossi); 0000-0002-7056-0543 (S. Bandini)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
human life more comfortable [3]. In future scenarios, urban and home environments are destined
to become increasingly linked to technological aspects, for example with the introduction of
self-driving vehicles. In this context, the analysis of people’s behaviors and emotions
during their daily activities, while interacting with the environment, may lead to the definition
of systems able to receive feedback from users and adapt accordingly. In particular, these
systems may help to define environments more friendly to vulnerable citizens, such as the
elderly and people with disabilities [4]. In urban environments, for example, technologies
capable of receiving feedback from users could be involved in the definition of self-driving
cars able to adapt their dynamic behaviours to the safety perception of different categories of
pedestrians [5] or in the realization of traffic lights able to adapt their waiting time according to
the presence or absence of people with impaired mobility.
Implementing such systems therefore requires profiling and recognizing the
categories of individuals with which these systems could interact. In this respect, the aging
of the population is a relevant factor that should be taken into account. It has been observed
that subjects of different ages react differently to particular stimuli, from both an emotional [6]
and a behavioral point of view [7] [8]. For instance, the elderly appear less reactive than young
adults in response to audio and visual stimuli [9], and slower in carrying out
cognitive tasks such as mouse pointing [10] or in driving [11]. The two populations also differ
with respect to walkability. In particular, [7] reports that the elderly
tend to behave more cautiously when they have to face an obstacle, passing it only when they
feel safe.
For this reason, in a world where the average age of the population is destined to increase
over the years [12, 13], the definition of systems capable of automatically recognizing the
age of their users and behaving accordingly is becoming a primary issue. In this context, a
fruitful research area concerns the use of a person’s physiological responses such as heartbeat
(Photoplethysmography or Electrocardiography), sweat gland activity (Electrodermal Activity)
or electrical activity on the scalp (Electroencephalography). These signals are involuntary,
uncontrolled responses of our Autonomic Nervous System (ANS) and can thus be considered
reliable and honest indicators of the subject’s instinctive behavior or emotions. Besides,
several studies have underlined how these signals change with increasing age. For example,
[14] shows how the shape of the Photoplethysmography (PPG) signal is affected by the subject’s
age. In particular, the PPG signals of the elderly appear more rounded and are characterized by the
disappearance of the dicrotic notch and the inflection point. Age-related differences are also
reported for Electrodermal Activity (EDA) and Electroencephalography (EEG). For
example, the analysis of the EDA signal shows that, in general, some signal characteristics
are affected by the subject’s age, with a weaker skin response in the elderly than in young
adults and middle-aged adults [15].
Finally, physiological signals have also been used in analyses comparing young
and elderly people during specific tasks. In this context, the different behavior of the young and
elderly during driving tasks has been studied through EEG signals [8]. This research shows
how high driving performance is associated, in the elderly, with increased mental effort and
fatigue and, in the young, with a higher focus on the task and less attention to distractors.
In [16], instead, an emotion classification task involving two populations of different ages is
reported. In this analysis, two physiological signals are taken into account: Galvanic Skin
Figure 1: Example of signal acquisition: (Left) Math Calculation, (Right) relaxing audio listening
Figure 2: Sensors adopted


Response (GSR) and Heart Rate Variability (HRV). In most of the studies presented, however,
physiological signals have been used to analyze an individual’s general behaviour in a
specific task (such as driving) or to distinguish the emotions of people of different ages. Physiological
signals have rarely been used in classification tasks to recognize people of different
ages. Our study is developed in this context to automatically discriminate between young adults
and the elderly while performing different tasks. To this end, starting from the dataset described
in section 2, two types of analysis have been performed. In the first part of our study, we tested
several binary classifiers in order to recognize the two populations while performing a specific
cognitive activity. In the second part, instead, a multi-class classification task has been carried out
to define a classifier able to recognize both the age of the subject involved and the activity
performed. In this work, different classifiers and feature sets have been tested. In particular, we
focus our attention on the analysis of PPG and GSR signals. The preprocessing of these signals
is reported in section 3 while the features extracted are described in section 4. In section 5, the
classification settings and the results of the two analyses are presented and discussed. Finally,
conclusions are drawn in section 6.


2. Experimental Protocol and Data Collection
All the analyses were performed on a dataset collected in a real laboratory environment at the
University of Tokyo and already partially described in [17]. In the experiment, two different
groups of subjects were involved: a population of young adults, composed of 16 Japanese
master's and PhD students (average age = 24.7 years, standard deviation = 3.3, 4 women), and
a population of 20 retired Japanese elderly people (average age = 65.15 years, standard
deviation = 2.7, 10 women). All the subjects were healthy and no mental or heart diseases were
reported.
All the participants performed the same tasks defined by the same experimental protocol, lasting
about 30 minutes, and described below:
    • 3 minutes of subject emotional profiling, carried out by filling in the STAI questionnaires
      [18].
    • 6 minutes of Reading and Comprehension tasks. Two different texts were proposed: a
      fairy tale (“The Wolf and the Seven Little Kids”) and a philosophy text (“Kant’s Critique
      of Pure Reason”). The subjects had 2 minutes to read each text and 1 minute to answer
      self-assessment and Reading Comprehension questions.
    • 15 minutes of Audio Listening and Math Calculation tasks, composed of six repetitions
      of a two-step sequence consisting of:
         1. 2 minutes of audio listening, in which relaxation was induced by natural and
            real-life sounds (Figure 1 right).
         2. 30 seconds of mental arithmetic calculations such as sums, subtractions and
            multiplications (Figure 1 left).
      The audio tracks and the calculations proposed were the same for each subject but changed
      from one iteration to the next.
Between each pair of tasks, a resting period (Baseline acquisition) of about 1 minute
was acquired.
During the whole experiment, the PPG and the GSR data of each participant were collected
using the Shimmer3 GSR+ Unit [19], with a sampling frequency of 128 Hz. The
adopted sensors are shown in Figure 2.


3. Signal pre-processing
To remove acquisition artifacts and noise, both PPG and GSR signals have been
pre-processed using a wavelet multiresolution denoising method similar to the one described in
[20]. In particular, the raw PPG signal of each subject has been divided into frequency sub-bands
using a Stationary Wavelet Transform (SWT) [21] with the Fejer-Korovkin 22 mother wavelet [22]
and four levels of decomposition. Soft thresholding has then been applied to the detail
coefficients of each sub-band. The threshold adopted for this purpose was the Universal Threshold,
calculated by the formula $t_j = \sqrt{2\log(N_j)}$, where $N_j$ is the length of the $j$-th wavelet
coefficient vector [23].
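The thresholding step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the detail coefficients of one sub-band are already available as a plain list, the function names are hypothetical, and the threshold follows the formula exactly as written in the text (many formulations additionally scale it by a noise estimate, which is not mentioned here).

```python
import math


def universal_threshold(n_coeffs):
    """Universal threshold sqrt(2 * log(N)) for a sub-band with N coefficients.

    Assumption: no noise-level scaling factor is applied, since the text
    does not mention one.
    """
    return math.sqrt(2.0 * math.log(n_coeffs))


def soft_threshold(coeffs, t):
    """Shrink every coefficient toward zero by t; values with |c| <= t become 0."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


# Hypothetical detail coefficients of one sub-band (illustrative values only).
detail = [0.2, -3.5, 1.0, 4.0, -0.7, 0.05, 2.5, -1.9]
t = universal_threshold(len(detail))
denoised = soft_threshold(detail, t)
```

Small coefficients, which are assumed to carry mostly noise, are set to zero, while larger ones are kept but reduced in magnitude by the threshold.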
Likewise, wavelet-based denoising strategies, in particular those based
on the SWT, have proved in the literature to be very efficient in removing noise from the GSR signal [24]. Therefore, a
multiresolution denoising strategy based on the SWT has also been used to pre-process the GSR
signals. In this second case, however, a Coiflet3 mother wavelet with 7 levels of decomposition
has been employed. Besides, the threshold used in the soft thresholding
was fixed and chosen to minimize the maximum mean square error over a
given set of functions (Minimax thresholding method [25]).
Since in our study the SWT is implemented with the à trous algorithm [26], a preliminary
replicate padding operation has been applied to both signals in order to obtain
a length divisible by $2^{level}$ [21]. In our study the value of “level” differs according to the
signal considered: 4 in the case of PPG and 7 for GSR.
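The padding requirement can be illustrated with a short sketch. It assumes that the padding is added only at the end of the signal by repeating the last sample; the text does not say on which side the padding is applied, and the function name and the 1000-sample signal are hypothetical.

```python
def replicate_pad(signal, level):
    """Repeat the last sample until len(signal) is a multiple of 2**level.

    Assumption: padding is appended at the end of the signal only.
    """
    block = 2 ** level
    remainder = len(signal) % block
    if remainder == 0:
        return list(signal)
    return list(signal) + [signal[-1]] * (block - remainder)


raw = [0.1 * i for i in range(1000)]     # hypothetical 1000-sample segment
padded_ppg = replicate_pad(raw, level=4)  # multiple of 2**4 = 16
padded_gsr = replicate_pad(raw, level=7)  # multiple of 2**7 = 128
```

The padded samples would normally be discarded again after the inverse transform, so that the denoised signal keeps its original length.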
Table 1
Number of instances for each task before (first 4 columns) and after data augmentation (last 4 columns).
First row young adults, second row elderly.

             N° signals acquired                               N° signals after data augmentation
             Math         Audio      Reading  Comprehension    Math         Audio      Reading  Comprehension
             Calculation  Listening                            Calculation  Listening
 Young       96           96         32       32               96           96         96       64
 Elderly     120          120        40       40               120          120        120      80


In order to reduce both inter- and intra-subject variability, the denoising step has been followed
by a normalization phase. In the case of PPG, a two-step normalization has been applied. First,
the amplitude of each signal has been normalized by z-scoring. Then, a per-subject
heartbeat normalization has been applied, based on the heart rate measured during the baseline. In the
case of GSR signals, instead, only an amplitude normalization has been performed. In particular,
a z-score normalization has been applied to the whole signal before splitting it into the different
experimental trials.
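The amplitude normalization mentioned above is a standard z-score. A minimal sketch follows, assuming the population standard deviation is used (the text does not specify population versus sample) and using a hypothetical function name. The heartbeat-based step is not sketched because its details are not given here.

```python
import statistics


def z_score(signal):
    """Rescale a signal to zero mean and unit standard deviation.

    Assumption: population standard deviation; a constant signal maps to zeros.
    """
    mean = statistics.fmean(signal)
    std = statistics.pstdev(signal)
    if std == 0:
        return [0.0 for _ in signal]
    return [(x - mean) / std for x in signal]


normalized = z_score([1.0, 2.0, 3.0, 4.0, 5.0])  # illustrative values only
```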
Once segmented, the number of instances for each task appeared unbalanced, as shown in Table 1.
In particular, there are fewer instances for the Reading and Comprehension tasks. Therefore,
a data augmentation strategy has been applied in order to create more balanced groups. The
signals related to the Reading task were divided into non-overlapping segments of 40 seconds, while
the Reading Comprehension signals were split into two parts of equal length. The last
four columns of Table 1 show the new cardinality after the data augmentation procedure.
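The segmentation behind this augmentation can be sketched as follows. The signals below are placeholders of the durations given in the protocol (2 minutes of reading, 1 minute of comprehension, sampled at 128 Hz); the function names are hypothetical, and dropping a trailing partial window is an assumption.

```python
FS = 128  # sampling frequency in Hz, as reported for the acquisition device


def split_fixed(signal, seconds, fs=FS):
    """Non-overlapping windows of `seconds` length; a trailing partial window is dropped."""
    size = int(seconds * fs)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def split_halves(signal):
    """Two consecutive parts of equal length (the second gets the extra sample if odd)."""
    mid = len(signal) // 2
    return [signal[:mid], signal[mid:]]


reading = [0.0] * (120 * FS)       # placeholder for a 2-minute reading signal
comprehension = [0.0] * (60 * FS)  # placeholder for a 1-minute comprehension signal
reading_segments = split_fixed(reading, seconds=40)
comprehension_segments = split_halves(comprehension)
```

Each reading signal yields three segments and each comprehension signal yields two, which is consistent with the counts in Table 1 (for the young group, 32 reading signals become 96 and 32 comprehension signals become 64).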


4. Features Extracted
After the pre-processing, each of the resulting segments was analyzed in order to identify
characteristics that can be significant in discriminating young subjects from elderly ones. For
this purpose, seven time-domain features have been extracted from the PPG signal:
    • Four statistical features (Minimum, Maximum, Mean and Standard Deviation of the
      signal);
    • Peak Rate, representing the mean number of peaks per second;
    • Inter Beat Interval (IBI), representing the mean distance between two consecutive peaks;
    • Root Mean Square of Successive Differences (RMSSD), representing the variability of the
      distance between consecutive peaks [27].
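The three peak-based PPG features listed above can be computed from the positions of the detected peaks. The sketch below assumes that peak detection has already been done and that peaks are given as sample indices; the function name and the example peak positions are hypothetical. RMSSD is computed with its usual definition, the root mean square of the differences between successive inter-beat intervals.

```python
import math


def ppg_peak_features(peak_indices, n_samples, fs=128):
    """Peak Rate, mean IBI and RMSSD from peak positions (requires at least 3 peaks)."""
    duration_s = n_samples / fs
    peak_rate = len(peak_indices) / duration_s  # peaks per second
    # Inter-beat intervals in seconds between consecutive peaks.
    intervals = [(b - a) / fs for a, b in zip(peak_indices, peak_indices[1:])]
    ibi = sum(intervals) / len(intervals)
    # Differences between successive intervals, then their root mean square.
    diffs = [b - a for a, b in zip(intervals, intervals[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"peak_rate": peak_rate, "ibi": ibi, "rmssd": rmssd}


peaks = [64, 192, 320, 456, 584]  # hypothetical peak positions, in samples
features = ppg_peak_features(peaks, n_samples=640)
```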
In the GSR, two types of signal components are usually taken into account during the feature
extraction procedure: the Phasic component, related to rapid changes in skin conduction (Skin
Conductance Responses) due to external stimuli or spontaneous responses, and the Tonic
component related, instead, to the slow change in the signal and representative of the general
arousal or stress level. In our analysis, features from both phasic and tonic components were
considered. To this aim, the GSR signals were first decomposed into these two components
applying the Cvx algorithm [28]. Different time domain features have been extracted, according
to the signal component, as listed below:
    • Signal not decomposed: four statistical features (Maximum, Minimum, Mean and
      Standard Deviation of the signal)
    • Phasic Component: seven statistical and peak-related features:
         – Maximum and Minimum
         – Peak Rate, representing the mean number of peaks per second
         – Peak Area and Peak Area per Second, representing respectively the mean area
            under the peaks and the mean area under the peaks evaluated per second
         – Peak Height, representing the mean height of the peaks detected on the phasic
            component
         – Rise Time (also called Onset-to-Peak Time), defined as the mean number of samples
            from the onset of the skin conductance response to the top of the peak [29, 30]
    • Tonic Component: the Regression Coefficient has been considered as representative of
      the signal slope.
Finally, the features have been normalized by z-scoring before being used as input to the
classifiers.
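Two of the GSR features above lend themselves to a short sketch: the regression coefficient of the tonic component and the rise time of the phasic responses. The code assumes that the decomposition and the onset and peak detection have already been performed; the function names, the choice of seconds as the time unit for the slope, and the example values are all hypothetical.

```python
def regression_slope(signal, fs=128):
    """Least-squares slope of the signal against time.

    Assumption: time is expressed in seconds (the text does not state the unit).
    """
    n = len(signal)
    times = [i / fs for i in range(n)]
    mean_t = sum(times) / n
    mean_y = sum(signal) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, signal))
    var = sum((t - mean_t) ** 2 for t in times)
    return cov / var


def mean_rise_time(onsets, peaks):
    """Mean number of samples from each response onset to its peak."""
    return sum(p - o for o, p in zip(onsets, peaks)) / len(peaks)


tonic = [2.0 + 0.5 * (i / 128) for i in range(256)]  # hypothetical linear tonic trend
slope = regression_slope(tonic)
rise = mean_rise_time(onsets=[10, 100], peaks=[40, 150])
```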


5. Results and discussion
5.1. Classification Setting
In order to determine whether signals acquired from young adults can be distinguished
from signals acquired from the elderly, four binary classification tasks have been performed,
one for each activity in the dataset (reading, comprehension, math calculation and audio
listening). For each task, the signals collected from young adults define the instances of the first
class, while the signals collected from elderly subjects form the second
class. In all the experiments performed, three well-known classifiers have been
used: Classification and Regression Tree (CART) [31], Support Vector Machine (SVM) [32] and
the gradient boosted decision trees algorithm implemented as XgBoost [33]. In the case of SVM, three
different kernels have been tested: linear (SVM-Linear), Gaussian (SVM-Gauss) and cubic
polynomial (SVM-Cubic). It is important to underline that all the selected classifiers generally perform
well even in the case of moderately unbalanced classes [34, 35]. This makes them
suitable for our dataset.
Finally, a Leave-One-Subject-Out procedure has been applied to evaluate the performance of
the trained classifiers. In this method, during each iteration, all the signals of one subject
were used as test set while the signals of the remaining subjects were used to train the model.
Several evaluation metrics including accuracy, F1-score and the weighted F1-score [36] have
been computed to evaluate the performance of the different classification tasks. In particular,
the weighted F1-score (W-F1) is computed as the weighted mean of the per-class F1-scores according
to the following formula:

$$\text{W-F1} = \sum_{j=1}^{m} \frac{N_j}{N_{tot}} \, F1_j \qquad (1)$$

where $m$ is the number of classes considered (here 2), $N_j$ is the number of elements in class $j$,
$N_{tot}$ is the total number of elements analyzed and $F1_j$ is the F1-score for the $j$-th class.
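Equation (1) can be checked on a small example. The sketch below is a plain-Python illustration with hypothetical function names and made-up labels; each per-class F1-score is weighted by the share of true instances belonging to that class.

```python
def f1_per_class(y_true, y_pred, label):
    """F1-score of one class, treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred):
    """Sum of per-class F1-scores, each weighted by the class share of true labels."""
    n_tot = len(y_true)
    return sum(
        y_true.count(c) / n_tot * f1_per_class(y_true, y_pred, c)
        for c in sorted(set(y_true))
    )


# Made-up labels: 4 young and 6 elderly instances, with 3 errors in total.
y_true = ["young"] * 4 + ["elderly"] * 6
y_pred = ["young", "young", "young", "elderly",
          "elderly", "elderly", "elderly", "elderly", "young", "young"]
score = weighted_f1(y_true, y_pred)
```

In this example the per-class F1-scores are 6/9 for the young class and 8/11 for the elderly class, weighted by 0.4 and 0.6 respectively.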


5.2. Classification Results
Four different binary classifications, one for each activity, have been performed to distinguish
young adults’ signals from those of the elderly. The classification performance obtained is summarized
in Tables 2a (Reading), 2b (Comprehension), 2c (Math Calculation) and 2d (Audio Listening).
The performance metrics (accuracy, F1-score and weighted F1-score) are reported for
five classification models and for each set of features used (PPG, GSR, or PPG and
GSR combined). To reduce bias in the results, a Leave One Subject Out (LOSO) strategy has been adopted
for all the classifiers.
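The LOSO strategy amounts to building one train/test split per subject, so that no subject contributes to both sets. A minimal sketch follows, with a hypothetical function name and made-up subject identifiers; the classifier training itself is left out.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Made-up example: one subject identifier per instance (signal segment).
subjects = ["Y01", "Y01", "Y02", "Y02", "E01", "E01", "E01"]
folds = list(loso_splits(subjects))
```

Grouping by subject matters here because each subject contributes several segments per task; splitting at random could place segments of the same person in both the training and the test set.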
The best performance for each task is obtained using both GSR and PPG features. In particular,
the best result overall has been observed in the Math Calculation task, where an accuracy of 78%
has been reached using an SVM with a linear kernel. In this case, the two classes (young and
elderly) are well discriminated, with F1-score values greater than 70%. On the other hand, the
two populations are less distinguishable in the case of Audio Listening, where XgBoost with
features concatenated from both signals reached an accuracy of 69%. Considering the
classification performance that can be reached using only one of the two physiological signals,
PPG seems in general to be more useful in discriminating the two populations. In fact, in
almost all the experiments carried out, features related to the subject’s heartbeat outperformed
features related to skin conductance. Moreover, we recall that all the PPG
signals were normalized not only with respect to amplitude but also with respect to the subject’s
heartbeat during the baseline. This procedure reduces inter-subject heterogeneity, making the
signals subject-independent. Finally, another general consideration regards the classifiers that
reached the best performance. In all the experiments conducted, the highest accuracy
has been achieved using XgBoost or SVM with a linear kernel, while the worst performance
has been obtained using CART, with accuracies around 55%.
The results described so far have
shown, in general, positive performance in distinguishing young adults from the elderly when a given
task is analyzed. To discriminate not only the population group with respect to age but also
the task performed, a multi-class classification analysis has been carried out. In this analysis, six
classes have been considered: three activities (Math Calculation, Reading and Audio Listening)
for the two population groups. The Comprehension task has been excluded from the analysis due
to its limited number of instances compared to the others. To train the different classifiers, the
features extracted from both types of physiological signals have been employed, as suggested
by the analyses of the binary classifiers.
In Table 3 the results of this multi-class analysis are summarized. As in binary classification,
the best performance has been reached using the SVM with a linear kernel. This model
reached an accuracy of 62%, outperforming the other classifiers.
Furthermore, in order to better understand the misclassification errors, an in-depth analysis of
the confusion matrix has been performed. From this matrix, shown in Table 4, it emerges that
the best-recognized classes are those related to Math Calculation, while
the lowest performance is obtained in the Audio Listening tasks. In case of misclassification,
the algorithm tends to classify the task performed correctly but to confuse the population
group.

Table 2
Performance of the binary classifiers in discriminating Young adults (Yng) from Elderly (Eld) in the
different tasks analyzed, varying the feature set and adopting a LOSO validation strategy. Three
performance metrics are reported: accuracy, F1-score (F1) and Weighted F1-score (W-F1). In each table,
the accuracies in bold represent the best performance achieved for each feature set considered, while
the best accuracy overall is highlighted in red.
                            (a) Binary classifiers performance for the Reading Task
                           PPG Features                        GSR Features                  PPG and GSR Features
  Classifier    Accuracy   Yng F1   Eld F1   W-F1   Accuracy   Yng F1   Eld F1   W-F1   Accuracy   Yng F1   Eld F1   W-F1

 SVM - Linear     63%       0.53     0.69    62%      63%       0.54     0.70    63%      75%       0.71     0.77    74%
 SVM - Cubic      64%       0.59     0.68    64%      58%       0.53     0.62    58%      64%       0.62     0.66    64%
 SVM - Gauss      66%       0.61     0.70    66%      62%       0.54     0.68    62%      72%       0.67     0.76    72%
 CART             56%       0.54     0.59    57%      59%       0.52     0.64    59%      65%       0.60     0.69    65%
 XgBoost          70%       0.68     0.71    70%      65%       0.59     0.69    65%      71%       0.68     0.74    71%

                        (b) Binary classifiers performance for the Comprehension Task
                           PPG Features                        GSR Features                  PPG and GSR Features
  Classifier    Accuracy   Yng F1   Eld F1   W-F1   Accuracy   Yng F1   Eld F1   W-F1   Accuracy   Yng F1   Eld F1   W-F1

 SVM - Linear     69%       0.55     0.76    67%      66%       0.63     0.69    66%      69%       0.67     0.72    70%
 SVM - Cubic      62%       0.60     0.64    62%      58%       0.53     0.61    58%      61%       0.56     0.65    61%
 SVM - Gauss      65%       0.41     0.75    60%      65%       0.59     0.70    65%      61%       0.56     0.65    61%
 CART             59%       0.56     0.62    59%      63%       0.57     0.67    62%      58%       0.52     0.63    58%
 XgBoost          67%       0.64     0.70    67%      61%       0.55     0.66    61%      74%       0.69     0.77    74%

                        (c) Binary classifiers performance for the Math Calculation Task
                           PPG Features                        GSR Features                  PPG and GSR Features
  Classifier    Accuracy   Yng F1   Eld F1   W-F1   Accuracy   Yng F1   Eld F1   W-F1   Accuracy   Yng F1   Eld F1   W-F1

 SVM - Linear     74%       0.68     0.78    73%      64%       0.59     0.68    64%      78%       0.75     0.81    78%
 SVM - Cubic      70%       0.68     0.72    70%      66%       0.61     0.70    66%      69%       0.64     0.72    68%
 SVM - Gauss      71%       0.65     0.75    71%      62%       0.55     0.67    62%      71%       0.68     0.74    71%
 CART             73%       0.68     0.76    73%      54%       0.49     0.59    54%      67%       0.63     0.70    67%
 XgBoost          72%       0.69     0.75    72%      59%       0.50     0.65    58%      70%       0.67     0.73    70%

                             (d) Binary classifiers performance for the Audio Task
                           PPG Features                        GSR Features                  PPG and GSR Features
  Classifier    Accuracy   Yng F1   Eld F1   W-F1   Accuracy   Yng F1   Eld F1   W-F1   Accuracy   Yng F1   Eld F1   W-F1

 SVM - Linear     60%       0.44     0.69    58%      67%       0.60     0.72    67%      65%       0.59     0.69    65%
 SVM - Cubic      64%       0.62     0.65    64%      58%       0.51     0.63    58%      64%       0.58     0.69    64%
 SVM - Gauss      56%       0.39     0.66    54%      59%       0.49     0.66    58%      65%       0.59     0.69    65%
 CART             50%       0.46     0.52    50%      57%       0.54     0.61    58%      62%       0.55     0.66    61%
 XgBoost          67%       0.64     0.69    67%      61%       0.53     0.67    61%      69%       0.63     0.74    69%
Table 3
Performance of the multi-class recognition task obtained concatenating PPG and GSR features and
adopting a LOSO validation strategy. Six classes are considered, corresponding to three tasks (Math
Calculation (MC), Reading (READ) and Audio Listening (AUDIO)) and two population groups (Young
Adult (Y) and Elderly (E)). The classifiers involved in the different analyses are reported on the rows. The
metrics used to evaluate the performance are accuracy, F1-score (F1) and Weighted F1-score (W-F1).
                                                 PPG and GSR Features
      Classifier         Accuracy    MC_Y F1   READ_Y F1   AUDIO_Y F1   MC_E F1   READ_E F1   AUDIO_E F1   W-F1

      SVM - Linear         62%        0.62       0.46        0.55        0.73       0.72        0.59       62%
      SVM - Cubic          58%        0.56       0.52        0.56        0.67       0.63        0.55       59%
      SVM - Gauss          57%        0.50       0.50        0.51        0.63       0.68        0.58       57%
      CART                 49%        0.46       0.35        0.37        0.61       0.65        0.47       49%
      XgBoost              58%        0.60       0.43        0.49        0.67       0.70        0.57       58%


Table 4
Confusion Matrix of the SVM classifier with linear kernel for the multi-class recognition task. Six classes are
considered, one for each pair of cognitive load task (Math Calculation (MC), Reading (READ) and
Audio Listening (AUDIO)) and age group (Young Adult (Y) and Elderly (E)). The values in bold are
the main diagonal elements and represent the cases where the class predicted by the classifier and the true
class agree.
                                   Predicted class
          True class    MC_Y    READ_Y    AUDIO_Y    MC_E    READ_E    AUDIO_E
          MC_Y          56%      7%         3%       19%      10%        4%
          READ_Y         4%     46%         2%        5%      26%       17%
          AUDIO_Y        0%      6%        50%        0%       4%       40%
          MC_E          14%      7%         1%       71%       1%        7%
          READ_E         2%     15%         2%        3%      76%        3%
          AUDIO_E        2%     12%        18%        2%       2%       66%
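The values in Table 4 can be cross-checked against the reported accuracy and against the observation that errors mostly concern the population group. The sketch below rests on two assumptions that are not stated explicitly: each row of Table 4 is normalized by the number of instances of the true class, and the class sizes are those reported in Table 1 after data augmentation. The function names are hypothetical.

```python
LABELS = ["MC_Y", "READ_Y", "AUDIO_Y", "MC_E", "READ_E", "AUDIO_E"]

# Rows of Table 4 (true class -> percentages over predicted classes, in LABELS order).
ROW_PERCENT = {
    "MC_Y": [56, 7, 3, 19, 10, 4],
    "READ_Y": [4, 46, 2, 5, 26, 17],
    "AUDIO_Y": [0, 6, 50, 0, 4, 40],
    "MC_E": [14, 7, 1, 71, 1, 7],
    "READ_E": [2, 15, 2, 3, 76, 3],
    "AUDIO_E": [2, 12, 18, 2, 2, 66],
}

# Assumed class sizes, taken from Table 1 after data augmentation.
CLASS_SIZE = {"MC_Y": 96, "READ_Y": 96, "AUDIO_Y": 96,
              "MC_E": 120, "READ_E": 120, "AUDIO_E": 120}


def overall_accuracy(rows, sizes):
    """Accuracy as the size-weighted mean of the diagonal (per-class recall) values."""
    correct = sum(rows[label][i] / 100 * sizes[label] for i, label in enumerate(LABELS))
    return correct / sum(sizes.values())


def main_confusion(rows):
    """For each true class, the wrong class that receives the largest share."""
    result = {}
    for i, label in enumerate(LABELS):
        off_diagonal = [(pct, LABELS[j]) for j, pct in enumerate(rows[label]) if j != i]
        result[label] = max(off_diagonal)[1]
    return result


accuracy = overall_accuracy(ROW_PERCENT, CLASS_SIZE)
confusions = main_confusion(ROW_PERCENT)
```

Under these assumptions the accuracy rounds to 0.62, in line with the value reported for the linear SVM, and for every class the largest confusion is with the same task performed by the other age group.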


6. Conclusion
In their daily life, people are subjected to different stimuli that can affect their behavior
and emotions. In particular, the age of a person seems to be a relevant factor in determining
how an individual responds to specific stimuli. In this paper, different binary and multi-class
classification tasks have shown that physiological signals make it possible to discriminate between
young adults and the elderly while they perform different actions. PPG seems in general to be
more useful in all the classification tasks; however, the best results are achieved considering
both PPG and GSR. These results, together with the increasing availability and reliability of
wearable devices, are promising for the definition of systems that, interacting
with subjects, can recognize their emotions and behaviors as well as their age group, and
adapt accordingly. Concerning this topic, several factors such as cultural aspects or daily
habits could be taken into account in future analyses to create systems able to interact with the
largest possible number of heterogeneous users. Furthermore, the age of the individuals could
also be used as an additional input variable, together with other parameters such as the subject’s
health, lifestyle or nutritional habits, in the definition of accurate measures of “physiological
age” that could be used by industrial designers and product developers to guide their work in
the development of appropriate technology able to provide efficient and personalized assistance
to individuals of different ages and needs.


Acknowledgments
This research is partially supported by the FONDAZIONE CARIPLO “LONGEVICITY-Social
Inclusion for the Elderly through Walkability” (Ref. 2017-0938) and by the Japan Society for the
Promotion of Science (Ref. L19513). We want to give our thanks to Prof. Katsuhiro Nishinari
and his staff, in particular Kenichiro Shimura and Daichi Yanagisawa for their indispensable
support during the experiment held at RCAST - The University of Tokyo.


References
 [1] L. Atzori, A. Iera, G. Morabito, The internet of things: A survey, Computer networks 54
     (2010) 2787–2805.
 [2] S. Majumder, E. Aghayi, M. Noferesti, H. Memarzadeh-Tehran, T. Mondal, Z. Pang, M. J.
     Deen, Smart homes for elderly healthcare—recent advances and research challenges,
     Sensors 17 (2017) 2496.
 [3] E. Ahmed, I. Yaqoob, A. Gani, M. Imran, M. Guizani, Internet-of-things-based smart
     environments: state of the art, taxonomy, and open research challenges, IEEE Wireless
     Communications 23 (2016) 10–16.
 [4] A. Rasouli, J. K. Tsotsos, Autonomous vehicles that interact with pedestrians: A survey
     of theory and practice, IEEE transactions on intelligent transportation systems 21 (2019)
     900–918.
 [5] S. Bandini, S. Fontana, F. Gasparini, D. Sorrenti, Interaction autonomous vehicle–
     pedestrian: Dynamic vehicle behaviour as a function of subjective safety perception, in:
     IEEE International Conference on Robot & Human Interactive Communication ROMAN-
     2020, 2020.
 [6] L. Fernández-Aguilar, J. Ricarte, L. Ros, J. M. Latorre, Emotional differences in young and
     older adults: Films as mood induction procedure, Frontiers in psychology 9 (2018) 1110.
 [7] F. Gasparini, A. Grossi, K. Nishinari, S. Bandini, Age-related walkability assessment: A
     preliminary study based on the EMG, in: International Conference of the Italian Association
     for Artificial Intelligence, Springer, 2020, pp. 423–438.
 [8] S. Getzmann, S. Arnau, M. Karthaus, J. E. Reiser, E. Wascher, Age-related differences in
     pro-active driving behavior revealed by EEG measures, Frontiers in human neuroscience
     12 (2018) 321.
 [9] S. Doiphode Rupali, S. Vinchurkar Aruna, A comparative study of auditory and visual
     reaction time in young and elderly males, International Journal of Health Sciences and
     Research 10 (2020) 333–337.
[10] M. Hertzum, K. Hornbæk, How age affects pointing with mouse and touchpad: A comparison
     of young, adult, and elderly users, Intl. Journal of Human–Computer Interaction 26
     (2010) 703–734.
[11] H.-k. Shin, H.-C. Lee, Characteristics of driving reaction time of elderly drivers in the
     brake pedal task, Journal of Physical Therapy Science 24 (2012) 567–570.
[12] U.S. Census Bureau, 2017 national population projections datasets, 2017.
[13] M.-A. Wren, C. Keegan, B. Walsh, A. Bergin, J. Eighan, A. Brick, S. Connolly, D. Watson,
     J. Banks, Projections of demand for healthcare in Ireland, 2015-2030: first report from the
     Hippocrates model, ESRI Research Series Number 67, October 2017.
[14] Q. Yousef, M. Reaz, M. A. M. Ali, The analysis of PPG morphology: investigating the effects
     of aging on arterial compliance, Measurement Science Review 12 (2012) 266.
[15] D. S. Bari, H. Y. Yacoob Aldosky, Ø. G. Martinsen, Simultaneous measurement of electrodermal
     activity components correlated with age-related differences, Journal of Biological
     Physics 46 (2020) 177–188.
[16] L. Fernández-Aguilar, A. Martínez-Rodrigo, J. Moncho-Bogani, A. Fernández-Caballero,
     J. M. Latorre, Emotion detection in aging adults through continuous monitoring of electrodermal
     activity and heart-rate variability, in: International Work-Conference on the
     Interplay Between Natural and Artificial Computation, Springer, 2019, pp. 252–261.
[17] F. Gasparini, A. Grossi, S. Bandini, A deep learning approach to recognize cognitive load
     using PPG signals, in: The 14th PErvasive Technologies Related to Assistive Environments
     Conference, 2021, pp. 489–495.
[18] C. D. Spielberger, State-trait anxiety inventory for adults (1983).
[19] A. Burns, E. P. Doheny, B. R. Greene, T. Foran, D. Leahy, K. O’Donovan, M. J. McGrath,
     Shimmer™: an extensible platform for physiological signal capture, in: 2010 annual
     international conference of the IEEE engineering in medicine and biology, IEEE, 2010, pp.
     3759–3762.
[20] A. Biswas, M. S. Roy, R. Gupta, Motion artifact reduction from finger photoplethysmogram
     using discrete wavelet transform, in: Recent Trends in Signal and Image Processing,
     Springer, 2019, pp. 89–98.
[21] G. P. Nason, B. W. Silverman, The stationary wavelet transform and some statistical
     applications, in: Wavelets and statistics, Springer, 1995, pp. 281–299.
[22] W. Li, Wavelets for electrocardiogram: overview and taxonomy, IEEE Access 7 (2018)
     25627–25649.
[23] D. L. Donoho, I. M. Johnstone, Ideal spatial adaptation by wavelet shrinkage, Biometrika
     81 (1994) 425–455.
[24] W. Chen, N. Jaques, S. Taylor, A. Sano, S. Fedor, R. W. Picard, Wavelet-based motion artifact
     removal for electrodermal activity, in: 2015 37th Annual International Conference of the
     IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2015, pp. 6223–6226.
[25] C. M. Stein, Estimation of the mean of a multivariate normal distribution, The Annals of
     Statistics (1981) 1135–1151.
[26] M. Holschneider, R. Kronland-Martinet, J. Morlet, P. Tchamitchian, A real-time algorithm
     for signal analysis with the help of the wavelet transform, in: Wavelets, Springer, 1990, pp.
     286–297.
[27] P. K. Stein, M. S. Bosner, R. E. Kleiger, B. M. Conger, Heart rate variability: a measure of
     cardiac autonomic tone, American heart journal 127 (1994) 1376–1381.
[28] A. Greco, G. Valenza, A. Lanata, E. P. Scilingo, L. Citi, cvxEDA: A convex optimization approach
     to electrodermal activity processing, IEEE Transactions on Biomedical Engineering
     63 (2015) 797–804.
[29] J. J. Braithwaite, D. G. Watson, R. Jones, M. Rowe, A guide for analysing electrodermal
     activity (EDA) & skin conductance responses (SCRs) for psychological experiments,
     Psychophysiology 49 (2013) 1017–1034.
[30] W. Boucsein, Principles of electrodermal phenomena, in: Electrodermal activity, Springer,
     2012, pp. 1–86.
[31] L. Breiman, J. Friedman, R. A. Olshen, C. J. Stone, Classification and regression trees,
     Chapman & Hall, New York, 1984.
[32] C. Cortes, V. Vapnik, Support-vector networks, Machine learning 20 (1995) 273–297.
[33] T. Chen, C. Guestrin, XGBoost: A scalable tree boosting system, in: Proceedings of the
     22nd ACM SIGKDD international conference on knowledge discovery and data mining, 2016,
     pp. 785–794.
[34] R. D. Abdu-Aljabar, O. A. Awad, A comparative analysis study of lung cancer detection and
     relapse prediction using XGBoost classifier, in: IOP Conference Series: Materials Science
     and Engineering, volume 1076, IOP Publishing, 2021, p. 012048.
[35] R. Akbani, S. Kwek, N. Japkowicz, Applying support vector machines to imbalanced
     datasets, in: European conference on machine learning, Springer, 2004, pp. 39–50.
[36] N. Y. Hammerla, S. Halloran, T. Plötz, Deep, convolutional, and recurrent models for
     human activity recognition using wearables, arXiv preprint arXiv:1604.08880 (2016).