Experimental Analysis of Deep Echo State Networks for Ambient Assisted Living

Claudio Gallicchio and Alessio Micheli
Department of Computer Science, University of Pisa, Largo B. Pontecorvo 3, Pisa, Italy
gallicch@di.unipi.it, micheli@di.unipi.it

Abstract. The Reservoir Computing (RC) paradigm represents a state-of-the-art methodology for the efficient construction of recurrent neural networks, which in recent years has proved effective in learning real-world temporal tasks from streams of sensorial data in the Ambient Assisted Living (AAL) domain. Recently, the study of RC networks has been extended to deep architectures with the introduction of the deep Echo State Network (DeepESN) model. Characterized by a layered composition of recurrent units, DeepESNs are inherently able to develop a hierarchically structured representation of temporal information while preserving the training efficiency typical of RC. In this paper, we discuss the introduction of the DeepESN approach in the field of AAL. To this aim, we perform a comparative experimental analysis on two real-world benchmark datasets related to inferring the user's behavior from data streams gathered from the nodes of a wireless sensor network. Results show that DeepESNs outperform standard RC networks with shallow architecture, suggesting a multiple time-scales nature of the temporal data involved and pointing to the potential of the proposed approach in the AAL field.

Keywords: Deep Learning, Reservoir Computing, Deep Echo State Network, Ambient Assisted Living, Human Activity Recognition

1 Introduction

Being able to recognize the behavior of humans in their everyday environments is one of the key objectives of Ambient Assisted Living (AAL) applications. This ability can be exploited in diverse application contexts that aim at improving the quality of life of older people, e.g.
by monitoring the regularity of their activities, enhancing the degree of personalization of smart home services based on their habits, or anticipating their needs in the place where they live or work. Among the possible solutions, the use of wireless sensor networks (WSN) [10] as a means to gather relevant data for modeling the user's behavior turns out to be a reasonable trade-off between intrusiveness, user acceptance and quality of the data.

In typical AAL scenarios, vast amounts of temporal data are generated through the interaction of humans with the sensors onboard the nodes of the deployed WSN. In this context, the interest in adopting Machine Learning methodologies to discover relevant patterns from streams of sensorial information is constantly increasing [34]. In particular, the class of Recurrent Neural Networks (RNNs) [33, 29] is recognized for its ability to effectively approach learning tasks with a distinct sequential/temporal nature in the presence of noisy and imprecise data, and is therefore considered particularly appropriate for the learning problems that arise in AAL application domains [37]. The Reservoir Computing (RC) [40, 35] paradigm and the Echo State Network (ESN) [31, 30] model represent a theoretically grounded methodology [17] for efficiently modeling and training RNNs. Widely popular in many application domains involving temporal data processing (see e.g. [36, 40]), the ESN approach has recently been successful in real-world AAL-related tasks. Examples of relevant applications in this context include indoor user context localization [9, 11, 5, 21], robot localization [14, 12], adaptive planning in personalized robotic applications [8], human gesture recognition [20], human activity recognition [37, 2] and health care monitoring for medical applications [7, 23, 3].
In addition to this, ESNs have been adopted as the core learning methodology in recent European initiatives, such as the FP7 RUBICON (Robotic UBIquitous COgnitive Network) project (footnote 1) [1, 13], and the FP7 DOREMI (Decrease of cOgnitive decline, malnutRition and sedEntariness by elderly empowerment in lifestyle Management and social Inclusion) project (footnote 2) [38, 6]. Moreover, the ESN approach has also recently been investigated with a view to realizing a learning service for the Internet of Things [4], and promises to help greatly in addressing the challenges raised by the interaction between robotic devices and the IoT, in the so-called Internet of Robotic Things [39] framework.

Recently, with the introduction of the DeepESN model in [27, 18], the study of hierarchically organized RC architectures has been attracting increasing interest. While keeping the extreme efficiency of the training algorithms of standard RC networks, DeepESNs are capable of developing progressively more abstract representations of temporal information in the levels of the architecture, potentially making it possible to naturally capture the structure of sequential data characterized by multiple time-scales.

In this paper we investigate the introduction of the DeepESN approach in learning tasks from streams of sensorial data in the AAL domain. In particular, we provide an experimental analysis on two representative real-world benchmark datasets concerning the identification of user behavior in indoor environments based on data gathered from a small WSN. Specifically, the first dataset regards the prediction of the user's spatial context in typical office environments, while the second dataset targets a problem of Human Activity Recognition (HAR). In both cases the analysis is conducted in comparison to standard shallow ESNs, in order to assess the impact of the proposed approach in this specific application context.

The rest of this paper is organized as follows. In Section 2 we describe the major characteristics of the DeepESN model. The results of the experimental assessment on the two real-world benchmark datasets are provided and discussed in Section 3. Finally, Section 4 concludes the paper.

Footnote 1: EU FP7 RUBICON project (contract no. 269914), http://fp7rubicon.eu/
Footnote 2: EU FP7 DOREMI project (contract no. 611650), http://www.doremi-fp7.eu/

2 Deep Echo State Networks

Within the randomized neural networks framework [16], the RC paradigm [40, 35] and the ESN model [31, 30, 17] represent the state-of-the-art RNN methodology for efficient learning in temporal domains. ESNs (and in general all types of RC networks) are based on a conceptual and practical distinction between a dynamical component, called the reservoir, and a feed-forward readout tool, which is the only trained part of the network's architecture. The reservoir implements a randomized (temporal) filter whose role is to embed the history of the input signals received by the network into a state space representation that provides a contextual memory to the system for each new input. The idea is that if such temporal embedding is rich enough, then the problem at hand, as represented in the state space, is likely to be solvable by a trained linear readout tool, and adaptation of the dynamical reservoir is not necessary. The well-known training efficiency of the RC approach naturally stems from this consideration.

Recently, the ESN approach has been extended in the direction of deep learning with the introduction of the DeepESN model [27, 26, 18], in which the dynamical reservoir part of the network architecture is hierarchically organized into layers.
The study of DeepESNs has a twofold objective: on the one hand it aims at the development of efficiently trained deep neural network models for learning in temporal domains; on the other hand, by putting aside the aspects related to learning of the recurrent connections, it makes it possible to highlight the intrinsic properties of deep recurrent architectures. The analysis conducted so far has shown that the layered composition of RNN layers is indeed able to develop structured and rich representations of temporal information characterized by multiple time-scales dynamics [27, 18], as also pointed out through investigations on short-term memory abilities [15], as well as by means of theoretical studies in the field of dynamical systems [26] and Lyapunov exponents [24, 25]. From an even broader perspective, the study of DeepESNs also opens up interesting discussions about the true nature of deep learning for temporal data processing [28]. On the application side, DeepESNs proved effective in both synthetic and real-world cases, outperforming state-of-the-art results on several versions of the multiple superimposed oscillator task [28] and, recently, showing very good predictive performance in a medical task related to the diagnosis of Parkinson's disease [22]. Additional details and an up-to-date overview of the advancements in the study of DeepESNs can be found in [19].

The reservoir architecture of a DeepESN is composed of $N_L$ layers of recurrent units, where we assume here that each layer has the same size, denoted by $N_R$. Figure 1 gives a graphical illustration of the hierarchical reservoir organization in a DeepESN. As can be seen, the DeepESN state computation follows the order of the architectural composition: the first reservoir layer is fed by the external input, while each successive layer receives as input the output of the previous one.

Fig. 1. Layered reservoir architecture of a DeepESN (1st layer, 2nd layer, ..., $N_L$-th layer).
From a dynamical system point of view, the reservoir of a DeepESN realizes a discrete-time non-linear dynamical system that is driven by the external input signal. In more detail, denoting by $\mathbf{x}^{(i)}(t) \in \mathbb{R}^{N_R}$ the reservoir state of layer $i$ at time step $t$, we can represent the global state of the whole network at time $t$ by $\mathbf{x}(t) = (\mathbf{x}^{(1)}(t), \mathbf{x}^{(2)}(t), \ldots, \mathbf{x}^{(N_L)}(t)) \in \mathbb{R}^{N_L N_R}$, which varies within a global state space that is the product of all the individual reservoir spaces. From this viewpoint, the entire architecture of stacked reservoirs is a dynamical system ruled by a global state transition function $F$, which expresses how the new state of the network depends on the current input and on its previous state. The global function $F$ can be decomposed into its layer-wise components, i.e. $F = (F^{(1)}, F^{(2)}, \ldots, F^{(N_L)})$, where each $F^{(i)}$, for $i = 1, 2, \ldots, N_L$, denotes the state transition function implemented at layer $i$.

In the following, we refer to the case of leaky integrator reservoir units [32] and omit the bias terms in the equations for ease of presentation. Denoting by $\mathbf{u}(t) \in \mathbb{R}^{N_U}$ the external input at time step $t$, the state transition function of the first reservoir layer, i.e. $F^{(1)}$, computes the state of the first reservoir layer as follows:

$$
\begin{aligned}
\mathbf{x}^{(1)}(t) &= F^{(1)}(\mathbf{u}(t), \mathbf{x}^{(1)}(t-1)) \\
&= (1 - a^{(1)})\,\mathbf{x}^{(1)}(t-1) + a^{(1)} \tanh\big(\mathbf{W}_{in}\,\mathbf{u}(t) + \hat{\mathbf{W}}^{(1)}\,\mathbf{x}^{(1)}(t-1)\big),
\end{aligned}
\tag{1}
$$

where $a^{(1)} \in [0, 1]$ is the leaking rate parameter of the first layer, $\mathbf{W}_{in} \in \mathbb{R}^{N_R \times N_U}$ denotes the input weight matrix, $\hat{\mathbf{W}}^{(1)} \in \mathbb{R}^{N_R \times N_R}$ is the recurrent reservoir weight matrix for the first layer, and $\tanh$ denotes the hyperbolic tangent activation function, applied element-wise. Following the pipeline of the hierarchical reservoir architecture, each successive layer $i > 1$ is fed by the output of the previous layer at the same time step.
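To make the update rule in equation 1 concrete, the following is a minimal NumPy sketch of a single leaky-integrator reservoir step, with bias terms omitted as in the text. The function name `leaky_layer_step`, the sizes, the random seed and the values 0.9 and 0.5 are illustrative assumptions, not settings taken from the experiments.

```python
import numpy as np

def leaky_layer_step(x_prev, inp, W_in, W_hat, a):
    """One leaky-integrator state update in the form of equation 1 (no bias)."""
    return (1.0 - a) * x_prev + a * np.tanh(W_in @ inp + W_hat @ x_prev)

rng = np.random.default_rng(0)          # arbitrary seed
N_U, N_R = 4, 10                        # illustrative input and reservoir sizes
W_in = rng.uniform(-1, 1, (N_R, N_U))   # input weights in [-1, 1]
W_hat = rng.uniform(-1, 1, (N_R, N_R))  # recurrent weights in [-1, 1]
# Rescale the recurrent matrix to spectral radius 0.9 (illustrative value).
W_hat *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_hat)))

x = np.zeros(N_R)                       # assumed zero initial state
for u_t in rng.uniform(-1, 1, (20, N_U)):   # toy input sequence
    x = leaky_layer_step(x, u_t, W_in, W_hat, a=0.5)
```

With a = 1 the update reduces to a plain tanh reservoir, while smaller values of a make the state change more slowly from one time step to the next.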
Thus, the state of reservoir layer $i$ at time step $t$ is computed by the state transition function $F^{(i)}$ as follows:

$$
\begin{aligned}
\mathbf{x}^{(i)}(t) &= F^{(i)}(\mathbf{x}^{(i-1)}(t), \mathbf{x}^{(i)}(t-1)) \\
&= (1 - a^{(i)})\,\mathbf{x}^{(i)}(t-1) + a^{(i)} \tanh\big(\mathbf{W}_{il}^{(i)}\,\mathbf{x}^{(i-1)}(t) + \hat{\mathbf{W}}^{(i)}\,\mathbf{x}^{(i)}(t-1)\big),
\end{aligned}
\tag{2}
$$

where $a^{(i)} \in [0, 1]$ is the leaking rate parameter of the $i$-th layer, $\mathbf{W}_{il}^{(i)} \in \mathbb{R}^{N_R \times N_R}$ is the inter-layer reservoir weight matrix for layer $i$, and $\hat{\mathbf{W}}^{(i)} \in \mathbb{R}^{N_R \times N_R}$ is the recurrent reservoir weight matrix for layer $i$. Note that in the above mathematical description of the model, whenever the reservoir comprises only one layer (i.e. for $N_L = 1$), a standard shallow ESN is obtained.

As in the standard RC framework, the reservoir part is left untrained after being initialized under the conditions prescribed by the Echo State Property (ESP) [30, 41], which has been extended to the case of deep RC architectures in [26]. Typically, the reservoir is initialized according to the necessary condition for the ESP, which expresses a stability constraint on the network dynamics. Denoting by $\rho(\cdot)$ the spectral radius (footnote 3) of its matrix argument, the necessary condition for the ESP [26] requires that:

$$
\max_{i = 1, 2, \ldots, N_L} \rho\big((1 - a^{(i)})\,\mathbf{I} + a^{(i)}\,\hat{\mathbf{W}}^{(i)}\big) < 1,
\tag{3}
$$

where $\mathbf{I}$ is the identity matrix of size $N_R \times N_R$. Thus, the initialization process of a DeepESN consists of a random initialization (e.g. from a uniform distribution in $[-1, 1]$) of the reservoir in each layer, followed by a re-scaling to ensure that the condition in equation 3 is satisfied. Moreover, the matrices $\mathbf{W}_{in}$ and $\mathbf{W}_{il}^{(2)}, \ldots, \mathbf{W}_{il}^{(N_L)}$ are randomly initialized (e.g. from a uniform distribution in $[-1, 1]$) and then re-scaled to a desired value of their 2-norms, where $s_{in} = \|\mathbf{W}_{in}\|_2$ acts as the input scaling parameter and $s_{il}^{(i)} = \|\mathbf{W}_{il}^{(i)}\|_2$ is the inter-layer scaling parameter of the $i$-th layer.

The output of the DeepESN is computed by the readout tool by means of a linear combination of the reservoir states in the hierarchy.
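The initialization and state computation described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, all layers share the same leaking rate and scalings, the initial state is zero, and the rescaling route (rescale the effective matrix $(1-a)\mathbf{I} + a\hat{\mathbf{W}}$ and solve for $\hat{\mathbf{W}}$, which needs $a > 0$) is just one way to meet the condition in equation 3. The linear readout on the global state is fitted here by ridge regression, one common direct method, with an arbitrary regularization value.

```python
import numpy as np

def spectral_radius(M):
    """Largest eigenvalue modulus of a square matrix."""
    return np.max(np.abs(np.linalg.eigvals(M)))

def init_deep_reservoir(N_U, N_R, N_L, rho, s_in, s_il, a, rng):
    """Draw all weights uniformly in [-1, 1], then rescale them."""
    I = np.eye(N_R)
    W_ins, W_hats = [], []
    for i in range(N_L):
        # Input matrix for layer 1, inter-layer matrix for layers i > 1,
        # rescaled to the desired 2-norm (s_in or s_il).
        fan_in, scale = (N_U, s_in) if i == 0 else (N_R, s_il)
        W = rng.uniform(-1, 1, (N_R, fan_in))
        W_ins.append(W * scale / np.linalg.norm(W, 2))
        # Recurrent matrix: rescale (1-a)I + a*W_hat to spectral radius rho
        # (rho < 1 satisfies equation 3), then solve for W_hat. Needs a > 0.
        W_eff = (1 - a) * I + a * rng.uniform(-1, 1, (N_R, N_R))
        W_eff *= rho / spectral_radius(W_eff)
        W_hats.append((W_eff - (1 - a) * I) / a)
    return W_ins, W_hats

def deep_states(U, W_ins, W_hats, a):
    """Global states x(t), one row per time step, for input U of shape (T, N_U)."""
    N_L, N_R = len(W_hats), W_hats[0].shape[0]
    x = [np.zeros(N_R) for _ in range(N_L)]   # assumed zero initial state
    X = np.zeros((len(U), N_L * N_R))
    for t, u_t in enumerate(U):
        layer_input = u_t
        for i in range(N_L):
            pre = W_ins[i] @ layer_input + W_hats[i] @ x[i]
            x[i] = (1 - a) * x[i] + a * np.tanh(pre)
            layer_input = x[i]   # next layer sees this layer's state at time t
        X[t] = np.concatenate(x)
    return X

def train_readout(X, Y, lam):
    """Ridge regression readout; returns W_out of shape (N_Y, N_L * N_R)."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y).T

rng = np.random.default_rng(0)                 # arbitrary seed
W_ins, W_hats = init_deep_reservoir(4, 10, 10, rho=0.9, s_in=1.0,
                                    s_il=1.0, a=0.5, rng=rng)
U = rng.uniform(-1, 1, (50, 4))                # toy input sequence
Y = np.sign(rng.uniform(-1, 1, (50, 1)))       # toy +1/-1 targets
X = deep_states(U, W_ins, W_hats, a=0.5)       # shape (50, 100)
W_out = train_readout(X, Y, lam=1e-3)          # shape (1, 100)
Y_pred = X @ W_out.T                           # linear output per time step
```

Setting `N_L=1` in this sketch gives a single reservoir driven directly by the input, i.e. the shallow ESN case mentioned in the text.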
Although different alternatives are possible, here we consider the case in which the input for the readout layer is composed of the global state of the network. Accordingly, at time step $t$ the output of the DeepESN, denoted by $\mathbf{y}(t) \in \mathbb{R}^{N_Y}$, is computed as follows:

$$
\mathbf{y}(t) = \mathbf{W}_{out}\,\mathbf{x}(t),
\tag{4}
$$

where $\mathbf{W}_{out} \in \mathbb{R}^{N_Y \times N_L N_R}$ is the readout weight matrix, which is adjusted on a training set, typically by direct methods such as Moore-Penrose pseudo-inversion or ridge regression [35].

For the purposes of our experimental analysis, we can distinguish two cases of learning tasks on temporal data. In sequence-to-sequence tasks an output element is required for each input element, and equation 4 is applied at every time step of the computation. In sequence-to-element tasks, the output is required only for the last element of the input sequence, and equation 4 is applied only at the last time step (i.e. for each input sequence, to the last global state of the DeepESN). In binary classification tasks, the output of the network is discretized in $\{-1, 1\}$ by applying the sign function to the output in equation 4, whereas in multi-class classification tasks the output class label is typically obtained by identifying the readout unit with the maximum activation. Further details on the DeepESN model can be found in recent literature [27, 26].

Footnote 3: The maximum among the eigenvalues in modulus.

3 Experiments

In this section we present the results of the experimental assessment of DeepESNs on two real-world benchmark datasets in the context of AAL applications. Specifically, the adopted datasets are described in Section 3.1, while the experimental settings and the results achieved by DeepESNs are comparatively discussed in Section 3.2.

3.1 Datasets

In our experiments we considered two real-world datasets related to the identification of human behavior in indoor environments.
Both datasets have been designed and developed within the activities and collaborations of our research group (footnote 4) as benchmarks intended for the evaluation of methods to be used in AAL domains. To this aim, both datasets have been made freely available for download on the well-known UCI Machine Learning Repository (footnote 5). For the specific aims of this paper, the considered datasets are used as benchmarks for assessing the impact of deep RNN architectures for time-series processing on tasks from sensorial data. The major features of the adopted datasets are summarized in the following.

Footnote 4: Computational Intelligence & Machine Learning (CIML) group, Department of Computer Science, University of Pisa. Website: http://www.di.unipi.it/groups/ciml/
Footnote 5: https://archive.ics.uci.edu/ml/index.php

Indoor Movement Forecasting. The first dataset that we take into consideration pertains to the scenario of anticipating user movements in a real-world indoor office environment [9]. The prototypical environmental setting consists of a couple of rooms separated by a corridor. The user is walking in one of the two rooms and the goal is to anticipate whether she/he will change room or not, once she/he has reached a marker position (symmetrically placed in both rooms). A WSN is placed in the environment, comprising 5 IRIS nodes, 4 of which act as anchors fixed near the corners of the rooms, while the last one is a mobile node worn by the user. The input data consists of the 4-dimensional stream of Received Signal Strength (RSS), sampled at a frequency of 8 Hz, exchanged between the mobile node and the anchors during the user's movements until the marker position is reached. The corresponding learning task is modeled as a sequence-to-element binary classification task, in which the target output associated with each input sequence is +1 for sequences that will lead to a room change, and -1 for those leading to a room preservation. The dataset contains information pertaining to 3 couples of rooms, gathered in a real-world setting as described in [9], for a total of 314 sequences. Data pertaining to each couple of rooms is re-scaled in the range $[-1, 1]$, individually for each RSS trace. Figure 2 shows examples of noisy input signals corresponding to both the cases of room change and room preservation. The Indoor Movement Forecasting dataset is freely available at the address https://archive.ics.uci.edu/ml/datasets/Indoor+User+Movement+Prediction+from+RSS+data.

Human Activity Recognition. The second dataset considered in our experimental analysis is related to the recognition of human activities from RSS data [37]. The goal is to recognize the action performed by the user within a set of 7 daily-life activities, i.e. bending with legs straight, bending with legs folded, cycling, lying, sitting, standing and walking. Sensor information is gathered from a small WSN composed of 3 IRIS nodes, worn by the user and placed on the chest, on the left ankle and on the right ankle. Input data is obtained from the time series of RSS information exchanged among the 3 sensors, where the average and the standard deviation of each of the 3 RSS traces were computed over a time slot of 250 milliseconds. The dataset thereby comprises 6-dimensional input sequences, with a sampling frequency of 4 Hz, corresponding to the RSS-based setting of the dataset presented in [37].
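As an illustration of how 6-dimensional inputs of this kind can be derived, the sketch below computes the average and standard deviation of each RSS trace over consecutive non-overlapping windows. The raw sampling rate (here 20 Hz, i.e. 5 samples per 250 ms window), the function name `window_features`, the interleaved column order and the synthetic data are assumptions made for the example; the text only fixes the window length and the resulting 4 Hz rate.

```python
import numpy as np

def window_features(rss, samples_per_window):
    """Map raw traces of shape (T, n) to per-window [mean, std] features.

    Returns an array of shape (T // w, 2 * n) with columns ordered as
    avg1, std1, avg2, std2, ... (an assumed layout).
    """
    T, n = rss.shape
    w = samples_per_window
    # Drop any incomplete trailing window, then group samples by window.
    blocks = rss[: (T // w) * w].reshape(-1, w, n)
    feats = np.empty((blocks.shape[0], 2 * n))
    feats[:, 0::2] = blocks.mean(axis=1)   # window averages
    feats[:, 1::2] = blocks.std(axis=1)    # window standard deviations
    return feats

rng = np.random.default_rng(0)                   # arbitrary seed
raw = rng.normal(40.0, 3.0, (40, 3))             # synthetic stand-in for 3 RSS traces
features = window_features(raw, samples_per_window=5)   # shape (8, 6)
```

Under the assumed 20 Hz raw rate, 40 raw samples cover 2 seconds and yield 8 feature vectors, consistent with the 4 Hz input rate stated in the text.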
The corresponding learning task is modeled as a sequence-to-sequence multi-class classification task, in which the (ground-truth) target output at each time step is represented as a (+1/-1) 1-of-7 encoding of the activity that the user is performing at that time. In our experimental analysis, in order to obtain signals in a similar range of values for each input dimension, we re-scaled the RSS averages by a factor of 100 and the RSS standard deviations by a factor of 10. Examples of the re-scaled input signals for some of the activities considered in the Human Activity Recognition dataset are illustrated in Figure 3, showing the high level of noise in the time-series involved and the difficulty of recognizing clear patterns by visual inspection. The Human Activity Recognition dataset comprises a total of 88 sequences, and can be freely downloaded at the following address: https://archive.ics.uci.edu/ml/datasets/Activity+Recognition+system+based+on+Multisensor+data+fusion+(AReM).

Fig. 2. Examples of RSS traces from the Indoor Movement Forecasting dataset (input signals RSS1 to RSS4 versus time step, at 8 Hz). Input signals correspond to: (a) room change (target class +1); (b) room preservation (target class -1).

Fig. 3. Examples of input traces (100 time steps-long excerpts) from the Human Activity Recognition dataset (input signals RSSav1, RSSstd1, RSSav2, RSSstd2, RSSav3, RSSstd3 versus time step, at 4 Hz). Input signals correspond to: (a) bending (with legs straight), (b) cycling, (c) lying, (d) sitting, (e) standing, (f) walking.

3.2 Results

In our experiments, we considered DeepESNs with $N_L = 10$ layers, each of which consisted of a fully-connected reservoir with $N_R = 10$ units, resulting in a total of 100 reservoir units. Note that, as pointed out also in [9], such a choice allowed us to focus our analysis on a reservoir size that makes it practical to embed the RC networks into the nodes of a WSN. Moreover, we assumed that the same values of the spectral radius, inter-layer scaling and leaking rate parameters are adopted in each reservoir layer, i.e. for each $i = 1, 2, \ldots, N_L$ we set $\rho^{(i)} = \rho$, $s_{il}^{(i)} = s_{il}$ (for $i > 1$) and $a^{(i)} = a$. As regards the readout training, we used ridge regression with regularization parameter denoted by $\lambda_r$. For each task, we performed a model selection procedure to choose the values of the RC hyper-parameters on a validation set, varying their values in the ranges reported in Table 1, and according to the cross-validation schemes described in the following. To this end, for each reservoir hyper-parameterization we independently generated 20 network guesses (with different random seeds), and averaged the results achieved on such guesses.

| Hyper-parameter | Range of values |
|---|---|
| spectral radius $\rho$ | 0.7, 0.8, 0.9, 1 |
| input scaling $s_{in}$ | 0.1, 0.5, 1, 2, 5 |
| inter-layer scaling $s_{il}$ | 0.1, 0.5, 1, 2, 5 |
| leaking rate $a$ | 0.1, 0.3, 0.5, 0.7, 1 |
| readout regularization $\lambda_r$ | $10^{-9}, 10^{-8}, \ldots, 10^{-2}, 10^{-1}$ |

Table 1. Range of RC hyper-parameter values considered for model selection.

The predictive performance on the Indoor Movement Forecasting task was assessed in terms of (2-class) accuracy, while in the case of the Human Activity Recognition task we evaluated the models' performance in terms of their 7-class accuracy, i.e.
the rate of samples that are correctly assigned to their target class label among the 7 possibilities. As regards the model selection scheme, for the Indoor Movement Forecasting task we considered a setting close to the heterogeneous case in [9]. Specifically, data pertaining to the first two couples of rooms were used as the training set, while data corresponding to the third couple of rooms formed an external test set. Training data was then split into 5 folds, for a stratified nested 5-fold cross-validation scheme designed for the purpose of model selection (on the validation set). For the Human Activity Recognition task, we adopted a stratified 3-fold cross-validation, with a further internal level of stratified 4-fold cross-validation used for model selection. In this regard, it is also worth noticing that the model selection schemes adopted in this paper represent a more rigorous assessment procedure than those adopted in the reference works for both the AAL tasks considered (see [9] and [37]).

For the sake of performance comparison, we also ran experiments on the two AAL tasks with standard shallow ESNs, following the same settings adopted for DeepESNs as described above. In particular, for the purpose of comparison, it is important to stress that we used ESNs with the same total number of recurrent units (and hence of trainable parameters) as in the case of DeepESNs, with 100 units organized in a non-layered fully-connected reservoir architecture. This allowed us to directly assess (by comparison) the effect of the hierarchical organization of DeepESN reservoir state dynamics on the AAL tasks under a fair condition on the number of free parameters of the learner.

| Model | Training | Validation | Test |
|---|---|---|---|
| shallow ESN | 0.97 (±0.00) | 0.91 (±0.03) | 0.84 (±0.04) |
| DeepESN | 0.98 (±0.01) | 0.95 (±0.02) | 0.90 (±0.03) |

Table 2. Training, validation and test accuracy achieved by DeepESN and shallow ESN on the Indoor Movement Forecasting task.

| Model | Training | Validation | Test |
|---|---|---|---|
| shallow ESN | 0.81 (±0.01) | 0.75 (±0.02) | 0.74 (±0.01) |
| DeepESN | 0.86 (±0.01) | 0.77 (±0.03) | 0.77 (±0.02) |

Table 3. Training, validation and test 7-class accuracy achieved by DeepESN and shallow ESN on the Human Activity Recognition task.

The averaged predictive performance achieved by DeepESNs and shallow ESNs on the Indoor Movement Forecasting task is reported in Table 2 (averaged results and standard deviations are computed on the different reservoir guesses). Results show that, with a similar performance on the training set, DeepESN achieves a higher accuracy than shallow ESN on both validation and test sets, reaching 95% and 90% accuracy, respectively. This suggests that, compared to standard shallow ESNs, DeepESNs on the one hand are able to achieve higher accuracy under homogeneous environmental conditions (in which the model is trained and assessed on data coming from the same set of environments) and on the other hand can better generalize to environmental conditions completely unseen at the training stage.

Averaged results obtained by DeepESNs and shallow ESNs on the Human Activity Recognition task are reported in Table 3 (averages and standard deviations are computed on the different reservoir guesses). As can be seen, DeepESNs outperform shallow ESNs, being able to better fit the training data and at the same time reaching a higher performance on the validation and test sets, on both of which a 7-class accuracy of 77% is obtained.

The results obtained by DeepESNs on both the AAL tasks are relevant also in relation to those reported in the literature on the same tasks. In particular, as regards the Indoor Movement Forecasting task, the test performance of DeepESN reported in this paper is higher than those reported in [9] in the closest experimental setting (footnote 6), even with respect to 5 times larger RC networks.
Furthermore, the 7-class accuracy on the test set reported here for DeepESN on the Human Activity Recognition task is also comparable with the performance reported in [37] under the closest experimental setting (footnote 7), although the latter has been achieved with possibly much larger RC networks and with a less thorough performance assessment scheme than the one considered here (footnote 8).

Overall, the results of the experimental analysis provided in this section point to the benefit of a hierarchical organization of the recurrent dynamical part of the neural network architecture in learning tasks from streams of sensorial data within the AAL domain. From a practical perspective, our results showed that, given a recurrent network architecture with a limited size of 100 units (which realistically allows a direct embedding into a node of a WSN, as discussed also in [9]), and given the training efficiency common to the RC framework, the layered DeepESN organization is preferable to a standard shallow one.

Footnote 6: Non-local 4 setting of the heterogeneous task, as reported in [9].
Footnote 7: RSS-based setting of the activity recognition system, as reported in [37].

4 Conclusions

In this paper we have presented an experimental investigation aimed at assessing the introduction of the DeepESN methodology for applications in the area of AAL from temporal data originating from a network of sensors. To this aim we have conducted experiments on two real-world benchmark datasets related to the identification of human indoor behavior from temporal streams of RSS information. On both the considered tasks DeepESNs outperformed standard ESNs under the same experimental settings and number of trainable parameters.
Limiting our analysis to a total number of reservoir units that is suitable for embedding into small devices, DeepESNs led to an increase in test classification accuracy, with respect to shallow ESNs, of 6% (from 0.84 to 0.90) on the Indoor Movement Forecasting (binary classification) task, and of 3% (from 0.74 to 0.77, in terms of 7-class accuracy) on the Human Activity Recognition (multi-class classification) task. The analysis in this paper pointed out the performance advantage brought about by a layered RNN architecture in dealing with AAL tasks: given the same number of recurrent units, it is indeed a good idea to stack them into a layered network when dealing with tasks in this domain. Moreover, the very good results achieved by DeepESNs in the analyzed cases also provide an interesting insight into the appropriateness of the structured feature representations developed by hierarchical reservoirs when driven by temporal data gathered from sensors. This would in turn indicate the relevance, at least partial, of a multiple time-scales nature of the temporal data involved in this application field. Overall, the experimental assessment presented in this paper indicates that the DeepESN approach, while inheriting the desirable training efficiency of RC, is able to further enhance the already good predictive ability of RC networks, and it is therefore put forward as an effective methodology for learning in temporal domains for future real-world AAL applications.

Footnote 8: Results reported in [37] have been obtained with reservoirs of up to 500 units and with a hold-out cross-validation scheme, such that the test set performance reported therein refers to a smaller set of samples than the one considered in this paper.

References

1. Amato, G., Bacciu, D., Broxvall, M., Chessa, S., Coleman, S., Di Rocco, M., Dragone, M., Gallicchio, C., Gennaro, C., Lozano, H., McGinnity, T., Micheli, A., Ray, A., Renteria, A., Saffiotti, A., Swords, D., Vairo, C., Vance, P.: Robotic ubiquitous cognitive ecology for smart homes. Journal of Intelligent & Robotic Systems 80, 57 (2015)
2. Amato, G., Bacciu, D., Chessa, S., Dragone, M., Gallicchio, C., Gennaro, C., Lozano, H., Micheli, A., O'Hare, G.M.P., Renteria, A., Vairo, C.: A benchmark dataset for human activity recognition and ambient assisted living. In: Ambient Intelligence-Software and Applications–7th International Symposium on Ambient Intelligence (ISAmI 2016). pp. 1–9. Springer (2016)
3. Bacciu, D., Chessa, S., Ferro, E., Fortunati, L., Gallicchio, C., La Rosa, D., Llorente, M., Micheli, A., Palumbo, F., Parodi, O., Valenti, A., Vozzi, F.: Detecting socialization events in ageing people: The experience of the DOREMI project. In: 12th International Conference on Intelligent Environments (IE). pp. 132–135. IEEE (2016)
4. Bacciu, D., Chessa, S., Gallicchio, C., Micheli, A.: On the need of machine learning as a service for the internet of things. In: Accepted for the International Conference on Internet of Things and Machine Learning (IML) (2017)
5. Bacciu, D., Chessa, S., Gallicchio, C., Micheli, A., Barsocchi, P.: An experimental evaluation of reservoir computation for ambient assisted living. In: Neural Nets and Surroundings, Smart Innovation, Systems and Technologies, vol. 19, pp. 41–50. Springer (2013)
6. Bacciu, D., Chessa, S., Gallicchio, C., Micheli, A., Ferro, E., Fortunati, L., Palumbo, F., Parodi, O., Vozzi, F., Hanke, S., Kropf, J., Kreiner, K.: Smart environments and context-awareness for lifestyle management in a healthy active ageing framework. In: Portuguese Conference on Artificial Intelligence (EPIA). pp. 54–66. Springer (2015)
7. Bacciu, D., Chessa, S., Gallicchio, C., Micheli, A., Pedrelli, L., Ferro, E., Fortunati, L., La Rosa, D., Palumbo, F., Vozzi, F., Parodi, O.: A learning system for automatic Berg Balance Scale score estimation. Engineering Applications of Artificial Intelligence 66, 60–74 (2017)
8. Bacciu, D., Gallicchio, C., Micheli, A., Di Rocco, M., Saffiotti, A.: Learning context-aware mobile robot navigation in home environments. In: The 5th International Conference on Information, Intelligence, Systems and Applications, IISA 2014. pp. 57–62. IEEE (2014)
9. Bacciu, D., Barsocchi, P., Chessa, S., Gallicchio, C., Micheli, A.: An experimental characterization of reservoir computing in ambient assisted living applications. Neural Computing and Applications 24(6), 1451–1464 (2014)
10. Baronti, P., Pillai, P., Chook, V.W., Chessa, S., Gotta, A., Hu, Y.F.: Wireless sensor networks: A survey on the state of the art and the 802.15.4 and ZigBee standards. Computer Communications 30(7), 1655–1695 (2007)
11. Barsocchi, P., Chessa, S., Micheli, A., Gallicchio, C.: Forecast-driven enhancement of received signal strength (RSS)-based localization systems. ISPRS International Journal of Geo-Information 2(4), 978–995 (2013)
12. Chessa, S., Gallicchio, C., Guzman, R., Micheli, A.: Robot localization by echo state networks using RSS. In: Recent Advances of Neural Network Models and Applications, Smart Innovation, Systems and Technologies, vol. 26, pp. 147–154. Springer International Publishing (2014)
13. Dragone, M., Amato, G., Bacciu, D., Chessa, S., Coleman, S., Di Rocco, M., Gallicchio, C., Gennaro, C., Lozano, H., Maguire, L., McGinnity, M., Micheli, A., O'Hare, G., Renteria, A., Saffiotti, A., Vairo, C., Vance, P.: A cognitive robotic ecology approach to self-configuring and evolving AAL systems. Engineering Applications of Artificial Intelligence 45, 269–280 (2015)
14. Dragone, M., Gallicchio, C., Guzman, R., Micheli, A.: RSS-based robot localization in critical environments using reservoir computing. In: Proceedings of the 24th European Symposium on Artificial Neural Networks (ESANN). pp. 71–76. i6doc.com (2016)
15. Gallicchio, C.: Short-term memory of deep RNN. In: Proceedings of the 26th European Symposium on Artificial Neural Networks (ESANN) (2018)
16. Gallicchio, C., Martin-Guerrero, J., Micheli, A., Soria-Olivas, E.: Randomized machine learning approaches: Recent developments and challenges. In: Proceedings of the 25th European Symposium on Artificial Neural Networks (ESANN). pp. 77–86. i6doc.com (2017)
17. Gallicchio, C., Micheli, A.: Architectural and Markovian factors of echo state networks. Neural Networks 24(5), 440–456 (2011)
18. Gallicchio, C., Micheli, A.: Deep reservoir computing: A critical analysis. In: Proceedings of the 24th European Symposium on Artificial Neural Networks (ESANN). pp. 497–502. i6doc.com (2016)
19. Gallicchio, C., Micheli, A.: Deep echo state network (DeepESN): A brief survey. arXiv preprint arXiv:1712.04323 (2017)
20. Gallicchio, C., Micheli, A.: A reservoir computing approach for human gesture recognition from Kinect data. In: Proceedings of the Workshop Artificial Intelligence for Ambient Assisted Living (AI*AAL 2016), co-located with the 15th International Conference of the Italian Association for Artificial Intelligence (AI*IA 2016). vol. 1803, pp. 33–42. CEUR Workshop Proceedings (2017)
21. Gallicchio, C., Micheli, A., Barsocchi, P., Chessa, S.: User movements forecasting by reservoir computing using signal streams produced by mote-class sensors. In: Mobile Lightweight Wireless Systems (Mobilight 2011), Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol. 81, pp. 151–168. Springer Berlin Heidelberg (2012)
22. Gallicchio, C., Micheli, A., Pedrelli, L.: Deep echo state networks for diagnosis of Parkinson's disease. In: Proceedings of the 26th European Symposium on Artificial Neural Networks (ESANN) (2018)
23.
Gallicchio, C., Micheli, A., Pedrelli, L., Fortunati, L., Vozzi, F., Parodi, O.: A Reservoir Computing Approach for Balance Assessment, Lecture Notes in Com- puter Science, vol. 9785, pp. 65–77. Springer International Publishing (2016) 24. Gallicchio, C., Micheli, A., Silvestri, L.: Local lyapunov exponents of deep echo state networks. Neurocomputing p. (Accepted) (2017) 25. Gallicchio, C., Micheli, A., Silvestri, L.: Local Lyapunov Exponents of Deep RNN. In: Proceedings of the 25th European Symposium on Artificial Neural Networks (ESANN). pp. 559–564. i6doc.com (2017) 26. Gallicchio, C., Micheli, A.: Echo state property of deep reservoir computing net- works. Cognitive Computation 9, 337–350 (2017) 27. Gallicchio, C., Micheli, A., Pedrelli, L.: Deep reservoir computing: a critical exper- imental analysis. Neurocomputing 268, 87–99 (2017) 28. Gallicchio, C., Micheli, A., Pedrelli, L.: Hierarchical temporal representation in linear reservoir computing. In: Proceedings of the 27th Italian Workshop on Neural Networks (WIRN) (2017), arXiv preprint arXiv:1705.05782 29. Haykin, S.: Neural networks and learning machines, vol. 3. Pearson (2009) 30. Jaeger, H.: The ”echo state” approach to analysing and training recurrent neural networks - with an erratum note. Tech. rep., GMD - German National Research Institute for Computer Science, Tech. Rep. (2001) 31. Jaeger, H., Haas, H.: Harnessing nonlinearity: Predicting chaotic systems and sav- ing energy in wireless communication. Science 304(5667), 78–80 (2004) 32. Jaeger, H., Lukoševičius, M., Popovici, D., Siewert, U.: Optimization and applica- tions of echo state networks with leaky-integrator neurons. Neural Networks 20(3), 335–352 (2007) 33. Kolen, J., Kremer, S.: A field guide to dynamical recurrent networks. John Wiley & Sons (2001) 34. Lara, O.D., Labrador, M.A.: A survey on human activity recognition using wear- able sensors. IEEE Communications Surveys and Tutorials 15(3), 1192–1209 (2013) 35. 
Lukoševičius, M., Jaeger, H.: Reservoir computing approaches to recurrent neural network training. Computer Science Review 3(3), 127–149 (2009) 36. Lukoševičius, M., Jaeger, H., Schrauwen, B.: Reservoir computing trends. KI- Künstliche Intelligenz 26(4), 365–371 (2012) 37. Palumbo, F., Gallicchio, C., Pucci, R., Micheli, A.: Human activity recognition using multisensor data fusion based on reservoir computing. Journal of Ambient Intelligence and Smart Environments 8(2), 87–107 (2016) 38. Palumbo, F., La Rosa, D., Ferro, E., Bacciu, D., Gallicchio, C., Micheli, A., Chessa, S., Vozzi, F., Parodi, O.: Reliability and human factors in ambient assisted living environments. Journal of Reliable Intelligent Environments pp. 1–19 (2017) 39. Vermesan, O., Bröring, A., Tragos, E., Serrano, M., Bacciu, D., Chessa, S., Gal- licchio, C., Micheli, A., Dragone, M., Saffiotti, A., Simoens, P., Cavallo, F., Bahr, R.: Internet of robotic things: converging sensing/actuating, hypoconnectivity, ar- tificial intelligence and iot platforms. In: Cognitive hyperconnected digital trans- formation: internet of things intelligence evolution, pp. 1–35 (2017) 40. Verstraeten, D., Schrauwen, B., d’Haene, M., Stroobandt, D.: An experimental unification of reservoir computing methods. Neural networks 20(3), 391–403 (2007) 41. Yildiz, I.B., Jaeger, H., Kiebel, S.J.: Re-visiting the echo state property. Neural networks 35, 1–9 (2012)