=Paper=
{{Paper
|id=Vol-3741/paper30
|storemode=property
|title=Nowcasting of the energy production of wind power plants through spatially-aware model trees
|pdfUrl=https://ceur-ws.org/Vol-3741/paper30.pdf
|volume=Vol-3741
|authors=Annunziata D’Aversa,Gianvito Pio
|dblpUrl=https://dblp.org/rec/conf/sebd/DAversaP24
}}
==Nowcasting of the energy production of wind power plants through spatially-aware model trees==
(Discussion Paper)

Annunziata D’Aversa 1,2,*, Gianvito Pio 1,2

1 Dept. of Computer Science, University of Bari "Aldo Moro", Via E. Orabona, 4, 70125 Bari, Italy

2 Data Science Lab, National Interuniversity Consortium for Informatics (CINI), Via Volturno, 58, 00185 Roma, Italy

===Abstract===

The accurate prediction of the energy production of renewable power plants over short-term intervals is of paramount importance in smart grids, to ensure an efficient distribution of energy within the network. Existing predictive approaches are mainly based on autoregressive models, machine learning methods and, more recently, neural network architectures that also exploit spatio-temporal information. However, most of them are not able to capture spatial information at different degrees of locality, and tend to assume the presence of either linear or non-linear dependencies among the data. In this paper, we discuss a novel approach based on linear model trees, which simultaneously model linear and non-linear dependencies, properly extended to capture the spatial dimension at different degrees of locality. The proposed approach works in the multi-step predictive setting, that is, it can simultaneously provide predictions for multiple future time intervals. Our experiments on a real dataset about the energy produced by wind power plants demonstrate the effectiveness of our method, also in comparison with state-of-the-art neural network architectures.

Keywords: Time series nowcasting, Spatio-temporal autocorrelation, Multi-step prediction

===1. Introduction===

Smart grids are networks that distribute electricity with the support of sensors, advanced communication technologies, and predictive components. Among the latter, models able to forecast energy consumption and production play a fundamental role.
Indeed, in long-term scenarios, they can support planning interventions on the network, aiming not only to decrease production costs but also to contribute to the reduction of greenhouse gas emissions. On the other hand, in short-term scenarios, the forecasting (usually called nowcasting, in the case of very short-term timeframes) of energy production and consumption can be useful for performing real-time load balancing actions, which may include powering on backup plants or drawing energy from customers’ accumulators. In general, predictive models can be built with machine learning methods that exploit historical data and the spatial information of the nodes. Indeed, the spatial dimension may introduce spatial autocorrelation phenomena, i.e., dependencies that may exist among observations at nearby geographical locations. In this context, the spatial proximity among power plants or among customers can influence measurements, due to similar climatic conditions.

SEBD 2024: 32nd Symposium on Advanced Database Systems, June 23-26, 2024, Villasimius, Sardinia, Italy

(*) Corresponding author. annunziata.daversa@uniba.it (A. D’Aversa); gianvito.pio@uniba.it (G. Pio). ORCID: 0000-0003-1791-5998 (A. D’Aversa); 0000-0003-2520-3616 (G. Pio)

© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

Another important aspect is that real-world time series coming from sensor measurements often exhibit a combination of linear and non-linear trends. This is very common when measurements depend on weather conditions, which may easily show non-linear phenomena, e.g., due to storms or other extreme events. Non-linear phenomena may also emerge in the case of power grid failures.
Therefore, capturing both linear and non-linear trends and relationships, along with exploiting historical data and spatial information, could improve model performance and lead to more accurate predictions.

In the literature, several nowcasting approaches have been proposed, leveraging autoregressive models [1, 2], machine learning models [3, 4] and hybrid models [5]. However, only a few works also take into account the spatial dimension [6, 7, 8, 9]. For instance, in [6] the authors propose a method for 5-minute-ahead wind power forecasting. They capture spatio-temporal dependencies through a sparse parametrization of VAR models, which selects the coefficients that link sites with a spatial co-dependence and discards those exhibiting weak dependencies. Another relevant example is [8], where the authors propose a spatio-temporal graph convolutional neural network for the short-term prediction of the energy produced by wind power plants. They consider a multi-step setting, where 16 future values (at 15-minute intervals) are predicted simultaneously. The contribution of the temporal and spatial dimensions has also been considered in more classical forecasting scenarios, to predict the hourly energy production of photovoltaic power plants 24 hours ahead [10], or the monthly energy consumption of customers one year ahead [11]. These works also consider a multi-step setting, where the 24 hourly predictions (in [10]) and the 12 monthly predictions (in [11]) are returned simultaneously by the model, possibly exploiting dependencies among them.
The spatial dimension is considered by resorting to two well-known techniques in spatial statistics: the Local Indicator of Spatial Association (LISA), a local measure of spatial autocorrelation [12], and the Principal Coordinates of Neighbour Matrices (PCNM), which represent the spatial structure in the data [13]. Such indicators are used to augment the feature space of the training instances.

Recently, several neural network architectures that consider both the temporal and the spatial dimensions have been proposed, although they were applied in other application domains. A relevant example is MTGNN [14], a graph convolutional network applied to multiple domains, including energy and traffic speed forecasting. MTGNN employs multiple temporal convolutional networks (TCNs) with various kernel sizes to learn temporal dependencies at different scales, and a self-adaptive adjacency matrix to capture spatial correlations.

It is noteworthy that, although some of the mentioned approaches are able to represent and exploit spatial information, they cannot capture spatial dependencies at different degrees of locality. A first attempt to capture local spatial information can be found in [15], where the authors propose D²STGNN for traffic speed forecasting. D²STGNN identifies both diffusion signals, representing how traffic conditions spread through the network, and inherent patterns, such as recurring traffic patterns or daily/seasonal variations. The model adopts a spatio-temporal localized convolution to capture the hidden diffusion time series, while a combination of a GRU (for short-term dependencies) and a multi-head self-attention mechanism (for long-term dependencies) is employed to model the hidden inherent time series.

Figure 1: An example showing the difference between a classical regression tree and a linear model tree.
In this paper, we discuss an approach to solve nowcasting tasks for the energy produced by wind power plants, in a multi-step setting. Specifically, we aim to learn a nowcasting model capable of predicting the energy production for 12 time steps, at a 15-minute granularity. Methodologically, contrary to most existing approaches, we capture both linear and non-linear phenomena through linear model trees. Moreover, we extend them to effectively capture and model spatial information at different levels of locality.

===2. Spatially-aware linear model trees===

As introduced in Section 1, we aim to adopt an approach that is able to capture both linear and non-linear dependencies. In this respect, we argue that linear model trees [16] can represent a possible solution, since they combine the ability of regression trees to model non-linear dependencies with the ability of linear models to capture linear ones. Existing methods for the construction of model trees employ a top-down induction procedure that recursively partitions the training set, analogous to the one adopted by conventional tree-based algorithms.

In linear model trees, the leaf nodes contain linear models instead of the constant approximations of classical regression trees. More formally, given a set of independent variables and a dependent variable <math>y</math>, a standard regression tree returns, for each leaf node <math>k</math>, a constant value <math>c_k</math>, namely, <math>y = c_k</math> for all the instances falling into leaf node <math>k</math>. Such a constant value is usually an aggregation (mean, median, etc.) of the values of <math>y</math> of the training instances falling into the leaf node. In model trees, instead, each leaf node contains a linear regression model that predicts the target variable based on the data points that reach that leaf. An example illustrating the difference between a regression tree and a linear model tree is shown in Fig. 1.
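The difference between the two kinds of leaves can be illustrated with a minimal Python sketch (an illustration only, not the implementation used in the paper). It fits a single hand-picked split on toy data with two linear regimes, once with constant leaves as in a regression tree and once with least-squares lines as in a linear model tree; the helper names `fit_constant`, `fit_line` and `fit_stump` and the toy data are hypothetical.

```python
from statistics import fmean


def fit_constant(xs, ys):
    """Regression-tree leaf: always predict the mean target of the leaf."""
    c = fmean(ys)
    return lambda x: c


def fit_line(xs, ys):
    """Model-tree leaf: ordinary least-squares line y = a + b * x."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def fit_stump(xs, ys, threshold, leaf_fitter):
    """A single split (x < threshold) with one fitted model per leaf."""
    left = [(x, y) for x, y in zip(xs, ys) if x < threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x >= threshold]
    f_left = leaf_fitter(*zip(*left))
    f_right = leaf_fitter(*zip(*right))
    return lambda x: f_left(x) if x < threshold else f_right(x)


def mse(model, xs, ys):
    return fmean((model(x) - y) ** 2 for x, y in zip(xs, ys))


# Toy data with two linear regimes: increasing up to x = 4, decreasing afterwards.
xs = [float(x) for x in range(10)]
ys = [2 * x if x < 5 else 20 - x for x in xs]

regression_tree = fit_stump(xs, ys, 5.0, fit_constant)
model_tree = fit_stump(xs, ys, 5.0, fit_line)
print(mse(regression_tree, xs, ys), mse(model_tree, xs, ys))  # 5.0 0.0
```

With the same split, the constant leaves leave a residual MSE of 5.0, whereas the linear leaves reproduce both regimes exactly; a regression tree would need further splits to approximate each trend.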
The quality of a split is usually measured with a criterion that quantifies how well the split separates the data with respect to the target variable. For example, in CART [17], the quality of a split is evaluated through the Mean Squared Error (MSE): when a node is split, the MSE is computed for each resulting child node, and the weighted sum (according to the number of instances) of these MSE values represents the quality of the split. The best split is the one that minimizes this weighted MSE. Linear model trees behave similarly; the only difference is that the MSE of each child node is computed after fitting a linear model on it.

In our approach, we consider the multi-step (MS) setting proposed in [11], which consists in predicting multiple future values of the target variable simultaneously. In particular, our approach falls into the Multi-Input Multi-Output (MIMO) category [18], whose goal is to learn a global predictive model that returns the whole vector of predictions, also taking into account the possible dependencies among future values, which in principle may be beneficial in terms of forecasting accuracy. More formally, we consider as input features <math>w</math> historical values of the target variable, <math>y_{t-w}, y_{t-w+1}, \ldots, y_{t-1}</math>, in order to predict the values of the target variable for <math>h</math> future time steps, <math>y_t, y_{t+1}, \ldots, y_{t+h-1}</math>, simultaneously. Note that, in this case, the MSE reduction of a split is evaluated as the average MSE reduction over all the future time steps.

In the literature, we can find several implementations of linear model trees [16, 19, 20]. In this work, we consider the simplest implementation, where internal nodes are simple tests involving descriptive variables, while leaf nodes are linear models, as shown in the right part of Fig. 1. This choice makes our extension towards the spatial dimension more straightforward.
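The MIMO instance construction and the multi-output split criterion described above can be sketched as follows. This is a minimal illustration under our own assumptions: the series is a plain Python list, the helper names `make_windows`, `multi_output_mse` and `split_error` are hypothetical, and the per-child predictions are assumed to come from the linear models fitted on each child.

```python
def make_windows(series, w, h):
    """MIMO instances: inputs y_{t-w}..y_{t-1}, targets y_t..y_{t+h-1}."""
    X, Y = [], []
    for t in range(w, len(series) - h + 1):
        X.append(series[t - w:t])
        Y.append(series[t:t + h])
    return X, Y


def multi_output_mse(y_true, y_pred):
    """MSE averaged over all instances and all h target time steps."""
    n, h = len(y_true), len(y_true[0])
    total = sum((t - p) ** 2
                for row_t, row_p in zip(y_true, y_pred)
                for t, p in zip(row_t, row_p))
    return total / (n * h)


def split_error(children):
    """Instance-weighted MSE of the children of a candidate split.

    `children` holds one (y_true, y_pred) pair per child, where y_pred is
    assumed to come from the linear model fitted on that child's instances."""
    n_total = sum(len(y_true) for y_true, _ in children)
    return sum(len(y_true) / n_total * multi_output_mse(y_true, y_pred)
               for y_true, y_pred in children)


X, Y = make_windows(list(range(10)), w=3, h=2)
print(X[0], Y[0])  # [0, 1, 2] [3, 4]
print(len(X))      # 6
```

Averaging the squared errors over all instances and all <math>h</math> target steps gives the same value as averaging the per-step MSE over the horizon, since every step has the same number of instances; the candidate split with the lowest `split_error` is therefore the one with the largest average MSE reduction with respect to its parent node.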
Specifically, as introduced in Section 1, we aim to extend linear model trees to effectively capture and model the spatial dimension at different levels of locality. Methodologically, we introduce the spatial dimension as a post-processing step of the tree construction: we aim to capture spatial relationships within each subset implicitly defined by a leaf node of the model tree, thus potentially capturing spatial relationships at different levels of locality. Given multiple positions (e.g., production plants or consumers), each represented through several <math>w</math>-dimensional training instances, we proceed as follows: for each instance <math>x_{\alpha,t}</math> falling into a leaf node <math>l</math>, related to time step <math>t</math> and geographic position <math>\alpha</math>, we compute a set of additional features <math>S_{\alpha,t,l}</math>. These features are computed as the weighted average of the <math>w</math>-dimensional historical observations at the same time step <math>t</math> from the other positions¹ in <math>l</math>, where the weights are determined by the spatial closeness between <math>\alpha</math> and the other positions (see Figure 2). More formally, <math>S_{\alpha,t,l}</math> is defined as follows:

:<math>S_{\alpha,t,l} = \frac{1}{\sum_{\beta \in P_l,\, \beta \neq \alpha} C[\alpha,\beta]} \cdot \sum_{\beta \in P_l,\, \beta \neq \alpha} C[\alpha,\beta] \cdot x_{\beta,t} \qquad (1)</math>

where <math>P_l</math> is the set of distinct positions of the training instances falling into leaf node <math>l</math>; <math>x_{\beta,t}</math> is the vector of <math>w</math> historical observations of location <math>\beta</math> at time step <math>t</math>; and <math>C[\alpha,\beta]</math> is the spatial closeness between positions <math>\alpha</math> and <math>\beta</math>, computed as follows:

:<math>C[\alpha,\beta] = 1 - \frac{D[\alpha,\beta]}{\max(D)} \qquad (2)</math>

where <math>D</math> is the distance matrix among locations, computed according to the geodesic distance. The additional features are computed and added to all the training instances falling into the leaf node. Finally, a new linear model is trained and the contribution of the added features is assessed using a validation set. Therefore, we compare two distinct linear models, as depicted in Figure 3.
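Equations (1) and (2) can be sketched in a few lines of Python. As assumptions of this sketch (not details given in the paper): the geodesic distance is approximated by the great-circle (haversine) distance on a spherical Earth, positions are (latitude, longitude) pairs in degrees, histories are stored in a dictionary keyed by (position, time step), and all positions are distinct so that max(D) > 0. The function names and the three plants are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption


def haversine_km(p, q):
    """Great-circle distance between two (lat, lon) points in degrees,
    used here as a stand-in for the geodesic distance."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def closeness_matrix(coords):
    """Eq. (2): C[a][b] = 1 - D[a][b] / max(D), with D the pairwise distances."""
    names = list(coords)
    dist = {a: {b: haversine_km(coords[a], coords[b]) for b in names} for a in names}
    d_max = max(max(row.values()) for row in dist.values())
    return {a: {b: 1.0 - dist[a][b] / d_max for b in names} for a in names}


def spatial_features(alpha, t, leaf_positions, history, closeness):
    """Eq. (1): closeness-weighted average of the w-dimensional histories x_{beta,t}
    of the other positions beta of the leaf. Returns None when no other position
    can contribute (e.g. a single-position leaf, for which the step is skipped)."""
    others = [b for b in leaf_positions if b != alpha]
    weights = [closeness[alpha][b] for b in others]
    total = sum(weights)
    if not others or total == 0.0:
        return None
    w = len(history[(alpha, t)])
    return [sum(c * history[(b, t)][i] for c, b in zip(weights, others)) / total
            for i in range(w)]


coords = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (0.0, 2.0)}  # hypothetical plants
C = closeness_matrix(coords)
history = {("A", 0): [0.0, 0.0], ("B", 0): [2.0, 4.0], ("C", 0): [10.0, 10.0]}
print(spatial_features("B", 0, ["A", "B", "C"], history, C))  # approximately [5.0, 5.0]
```

Because Eq. (2) normalizes by max(D), the Earth radius cancels out, and the farthest pair of positions receives a closeness of zero, so it does not contribute to each other's features.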
The first model is trained exclusively on the original features (during the construction of the tree), while the second model incorporates both the original features and the additional ones computed according to the spatial closeness. Within each leaf node, we retain the model that achieves the lowest validation error. This selection process tailors our modeling approach to the specific peculiarities of each subset of data falling into a leaf node. Consequently, within the tree, some leaf nodes may employ models that incorporate spatial features, while others may rely only on the original features (i.e., when the additional features based on spatial closeness appear to provide no advantage).

¹ Note that, if a given leaf node contains training instances associated with only one position, this step is skipped.

Figure 2: An example of the computation of the additional features <math>S_{\alpha,t,l}</math> for the instance <math>x_{\alpha,t}</math> falling into the leaf node <math>l</math>, given the presence of other instances in <math>l</math> belonging to the geographic positions <math>\beta</math>, <math>\gamma</math>, and <math>\lambda</math>.

Figure 3: Comparison of the predictive performance of two linear models on the training instances of a leaf node: the first model is learned from the original features, while the second is learned from the original features expanded with the features computed according to the spatial closeness.

After performing this process on all the leaf nodes, we apply a pruning step to prevent overfitting and possibly capture more global (i.e., less local) spatial dependencies. In particular, we propose an extended version of the Reduced Error Pruning (REP) algorithm [21]. Starting from the bottom of the tree and working upward, for each internal node REP compares the error of the unpruned tree with the error obtained by simulating the pruning of the subtree rooted at that node. The subtree is actually pruned only if the resulting tree performs no worse than the unpruned one on the validation set.
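The per-leaf model selection described above can be sketched as follows, under several assumptions of ours: a single target time step (in the MIMO setting, one such model per step or a multi-output regression would be needed), ordinary least squares solved through the normal equations with a tiny ridge term for numerical stability, and the validation MSE as the error measure. On ties, the sketch keeps the model on the original features; the text only states that the model with the lowest validation error is retained. All function names and the toy data are hypothetical.

```python
def fit_linear(X, y, ridge=1e-9):
    """Least-squares linear model with intercept, solved via the normal equations
    (A^T A + ridge * I) beta = A^T y with Gaussian elimination."""
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    M = [[sum(r[i] * r[j] for r in A) + (ridge if i == j else 0.0) for j in range(k)]
         for i in range(k)]
    v = [sum(r[i] * target for r, target in zip(A, y)) for i in range(k)]
    for col in range(k):  # forward elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k):
                M[r][c] -= f * M[col][c]
            v[r] -= f * v[col]
    beta = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        beta[i] = (v[i] - sum(M[i][j] * beta[j] for j in range(i + 1, k))) / M[i][i]
    return lambda row: beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def mse(model, X, y):
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def select_leaf_model(X_tr, S_tr, y_tr, X_val, S_val, y_val):
    """Keep the leaf model (original vs. original + spatial features) with the
    lower validation MSE; returns its label and validation error."""
    base = fit_linear(X_tr, y_tr)
    aug_tr = [list(x) + list(s) for x, s in zip(X_tr, S_tr)]
    aug_val = [list(x) + list(s) for x, s in zip(X_val, S_val)]
    spatial = fit_linear(aug_tr, y_tr)
    err_base = mse(base, X_val, y_val)
    err_spatial = mse(spatial, aug_val, y_val)
    if err_spatial < err_base:
        return "spatial", err_spatial
    return "original", err_base


X_tr = [[1.0], [2.0], [3.0], [4.0]]
S_tr = [[0.0], [1.0], [0.0], [2.0]]  # hypothetical spatial feature
y_tr = [1.0, 3.0, 3.0, 6.0]           # y = x + s
X_val, S_val, y_val = [[5.0], [6.0]], [[1.0], [0.0]], [6.0, 6.0]
print(select_leaf_model(X_tr, S_tr, y_tr, X_val, S_val, y_val)[0])  # spatial
```

Because the decision is taken independently in every leaf, the same comparison naturally yields a mix of spatial and non-spatial leaf models within one tree.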
In our extended version, we also consider the possible contribution of the features based on spatial closeness. In particular, we compare the unpruned tree with the pruned tree and with the pruned tree that also exploits the features based on spatial closeness. Considering the example reported in Figure 4, given the internal node <math>n_4</math>, we compare the errors made on the validation set by three models: i) the model represented by its two child nodes <math>l_4</math> and <math>l_5</math> (left part of Figure 4); ii) the model obtained by pruning the subtree rooted at <math>n_4</math> and learning a new linear model from the instances falling into it (middle part of Figure 4); iii) the model obtained by pruning the subtree rooted at <math>n_4</math> and learning a new linear model from the instances falling into it, expanded with the features based on spatial closeness (right part of Figure 4). If model ii) or model iii) leads to an improvement on the validation set, the tree is pruned accordingly. This process continues in a bottom-up fashion until no improvement is obtained.

Figure 4: Extended version of the Reduced Error Pruning strategy that also takes into account the contribution of additional features based on spatial closeness.

===3. Experiments===

In order to assess the effectiveness of the proposed approach, we performed experiments on a real-world dataset of wind power plants, provided by a leading company in the energy distribution field. The dataset consists of measurements of the energy production of 60 wind plants, collected every 15 minutes over a period of one year. Together with their geographic position (latitude and longitude), the plants are described by some technical characteristics, namely, avg_wind_turbine_height, rotor_diameter, and number_of_wind_turbines.
Following a cross-validation setting for time series, we adopt a sliding window approach where the training set consists of 4 months of data, the validation set corresponds to the last month of the training set, and the test set is the subsequent month. We performed the experiments in a multi-step setting, where the goal is to predict the energy production for 12 target time steps ahead simultaneously. As historical measurements associated with each instance, we consider the 12 previous values of energy production, i.e., <math>w = 12</math>. It is noteworthy that, in real-world production scenarios, actual measurements often become available only after a certain amount of time. Therefore, we evaluated the performance of all the models considering different delays between the last observed measurement and the first target time step to predict, namely 0 hours, 2 hours, and 4 hours.

To learn the initial model tree, we used the implementation available in the linear-tree Python library². For all the experiments, we investigated two different configurations of its parameters, namely: min_samples_leaf = 0.1, max_depth = 5, and min_samples_leaf = 0.05, max_depth = 20. The original version of this system (henceforth denoted as LT), which ignores spatial information, is considered the closest competitor to our approach. As additional competitors, we considered three regressors that are able to work in the multi-step setting, namely, Linear Regression (LR), Random Forests (RF), and XGBoost Regressor (XGB). For all these competitors, we also assessed the performance achieved when spatial information is considered by injecting PCNM variables [13]. This allows us to specifically evaluate the contribution of the novel strategy that we propose to model the spatial dimension.
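The evaluation protocol described above can be sketched as follows, under assumptions that the text does not fix: months are indexed 0 to 11, the window slides by one month per fold, and a delay of d hours is modeled as d × 4 unobserved 15-minute steps between the last input value and the first target (so a delay of 0 gives consecutive steps). The names `monthly_folds`, `delayed_windows` and `STEPS_PER_HOUR` are hypothetical.

```python
STEPS_PER_HOUR = 4  # one measurement every 15 minutes


def monthly_folds(n_months, train_months=4):
    """Sliding-window folds over month indices. The last training month is also
    used as the validation month; the following month is the test month."""
    folds = []
    for start in range(n_months - train_months):
        train = list(range(start, start + train_months))
        folds.append({"train": train,
                      "validation": train[-1],
                      "test": start + train_months})
    return folds


def delayed_windows(series, w=12, h=12, delay_hours=0):
    """Instances with w past values and h targets; the first target comes
    delay_hours * STEPS_PER_HOUR steps later than it would without delay."""
    gap = delay_hours * STEPS_PER_HOUR
    X, Y = [], []
    for t in range(w, len(series) - gap - h + 1):
        X.append(series[t - w:t])
        Y.append(series[t + gap:t + gap + h])
    return X, Y


folds = monthly_folds(12)
print(len(folds), folds[0])
# 8 {'train': [0, 1, 2, 3], 'validation': 3, 'test': 4}
X, Y = delayed_windows(list(range(100)), delay_hours=2)
print(X[0][-1], Y[0][0])  # 11 20
```

Under this modeling choice, a longer delay simply shifts the target block further from the inputs, which is consistent with the higher errors reported for larger delays.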
Finally, we considered two state-of-the-art neural network architectures that can work in the multi-step setting and capture spatio-temporal phenomena, i.e., MTGNN [14] and D²STGNN [15].

As evaluation measures, we collected the Relative Squared Error (RSE) for LT, and the percentage of improvement with respect to the best configuration of such a model for the proposed method and for all the considered competitors. The RSE is formally defined as

:<math>RSE = \frac{\sum_t (y_t - \tilde{y}_t)^2}{\sum_t (y_t - \bar{y})^2}</math>

where <math>y_t</math> and <math>\tilde{y}_t</math> are the true and the predicted values, respectively, for the <math>t</math>-th time step, while <math>\bar{y}</math> is the average value of a given target time step in the training set. Adopting the RSE, instead of more common measures such as MAE, MSE, or RMSE, allows us to evaluate the actual usefulness of the predictive models in real scenarios with respect to a baseline predictor that always returns the mean of the measurements: an RSE value close to 0.0 means that the model returns perfect predictions; an RSE value close to 1.0 corresponds to a model that performs analogously to the baseline that always returns the mean; and an RSE value higher than 1.0 means that the model performs worse than such a baseline.

In Table 1, we report the RSE results, averaged over all the target time steps and over all the folds of the cross-validation. As expected, all the considered methods perform worse with higher delays. Nevertheless, all the RSE values remain below 1.0, which means that the models can still provide more useful indications than the baseline predictor based on the average. Looking at the results obtained by our approach, it clearly provides advantages over LT for all the values of delay and in both configurations of its parameters. On the contrary, all the other competitors perform worse than (or equal to) LT, except for a few specific cases, where the improvement is no more than 0.6%.
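The RSE and the percentage of improvement can be computed as in the following sketch. The per-step computation over the test instances follows the definition above; the formula for the percentage of improvement (relative RSE reduction with respect to LT) is our assumption, since the text does not spell it out. The function names are hypothetical.

```python
def rse(y_true, y_pred, train_mean):
    """RSE for one target time step: the model's squared error divided by that
    of a baseline always predicting the training mean of that time step."""
    num = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    den = sum((t - train_mean) ** 2 for t in y_true)
    return num / den


def average_rse(Y_true, Y_pred, train_means):
    """RSE averaged over the h target time steps (the columns of Y)."""
    h = len(train_means)
    per_step = [rse([row[k] for row in Y_true],
                    [row[k] for row in Y_pred],
                    train_means[k]) for k in range(h)]
    return sum(per_step) / h


def improvement_pct(rse_reference, rse_model):
    """Assumed definition: relative RSE reduction w.r.t. the reference (LT), in %."""
    return 100.0 * (rse_reference - rse_model) / rse_reference


print(rse([1.0, 3.0], [2.0, 2.0], train_mean=2.0))  # 1.0: same error as the mean baseline
print(improvement_pct(0.5, 0.25))                    # 50.0
```

Under this definition, a negative percentage means that the model has a higher RSE than LT.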
{| class="wikitable"
|+ Table 1: Average RSE obtained by LT, and percentage of improvement with respect to LT (with min_samples_leaf = 0.1 and max_depth = 5) obtained by our approach and by the competitor systems. Positive percentages denote an improvement over LT, negative percentages a degradation.
! Model !! min_samples_leaf !! max_depth !! 0 hours delay !! 2 hours delay !! 4 hours delay
|-
! colspan="6" | RSE
|-
| LT || 0.1 || 5 || 0.261 || 0.499 || 0.645
|-
| LT || 0.05 || 20 || 0.260 || 0.498 || 0.644
|-
! colspan="6" | % of improvement
|-
| LT+PCNM || 0.1 || 5 || 0.00% || 0.60% || 0.30%
|-
| LT+PCNM || 0.05 || 20 || -2.70% || -9.00% || -4.00%
|-
| RF || default || default || -3.40% || -3.60% || -4.70%
|-
| RF+PCNM || default || default || -3.40% || -4.00% || -4.80%
|-
| XGB || default || default || 0.00% || 0.20% || 0.00%
|-
| XGB+PCNM || default || default || -0.40% || 0.20% || 0.00%
|-
| LR || - || - || -0.80% || -0.20% || 0.00%
|-
| LR+PCNM || - || - || -0.80% || -0.20% || 0.20%
|-
| MTGNN || - || - || -181.89% || -63.73% || -35.19%
|-
| D²STGNN || - || - || -123.92% || -60.26% || -30.96%
|-
| Our approach || 0.1 || 5 || 6.10% || 4.40% || 2.30%
|-
| Our approach || 0.05 || 20 || 6.50% || 5.20% || 2.80%
|}

² https://github.com/cerlymarco/linear-tree

These results confirm the adequacy of adopting model trees in this application domain, due to the co-presence of linear and non-linear phenomena. Looking at the contribution provided by the PCNM variables to the competitors, we observe no evident differences with respect to the same methods without PCNM features, with some cases in which the error even increases (see, for example, RF+PCNM vs. RF). This is possibly due to the fact that PCNM variables do not take historical factors into account. On the other hand, our approach incorporates additional historical features, taking into account spatial closeness at different degrees of locality. This approach clearly performs better than injecting static, position-dependent features, as done by the approaches relying on PCNM. In general, we can observe that our approach outperforms all the considered competitors, including those based on recent neural network architectures. Surprisingly, the latter obtained the worst results among the considered systems.
This is possibly due to the complexity of their architectures, which require a huge amount of training data (possibly much more than what is available in this context) to learn an accurate model.

===4. Conclusion===

In this paper, we presented an approach for nowcasting the energy produced by wind power plants in a multi-step predictive setting. We enabled linear model trees to capture spatial phenomena at different degrees of locality. Specifically, we incorporate additional features that represent historical observations of other plants, taking into account their spatial closeness. Moreover, we extended the REP pruning strategy to consider the spatial dimension. Our experiments, performed on a real-world dataset, showed the effectiveness of the proposed approach in comparison with standard linear trees and other state-of-the-art competitors that are also able to model the spatial dimension. As future work, we intend to evaluate the effectiveness of the proposed method in other domains, and to perform an in-depth evaluation of the differences in (theoretical and empirical) model complexity with respect to unpruned linear trees and complex neural networks.

===Acknowledgments===

This work was partially supported by the project FAIR - Future AI Research (PE00000013), Spoke 6 - Symbiotic AI, under the NRRP MUR program funded by the NextGenerationEU. The research of Annunziata D’Aversa is funded by a PhD fellowship within the framework of the Italian "POR Puglia FSE 2014-2020" – Axis X - Action 10.4 "Interventions to promote research and for university education" - PhD Project n. 1004.121 (CUP n. H99J21006620008).

===References===

[1] Aasim, S. Singh, A. Mohapatra, Repeated wavelet transform based ARIMA model for very short-term wind speed forecasting, Renewable Energy 136 (2019) 758–768.

[2] P. Bacher, H. Madsen, H. A. Nielsen, Online short-term solar power forecasting, Solar Energy 83 (2009) 1772–1783.

[3] Q. Hu, S. Zhang, M. Yu, Z. Xie, Short-term wind speed or power forecasting with heteroscedastic support vector regression, IEEE Transactions on Sustainable Energy 7 (2016) 241–249.

[4] C. Li, G. Tang, X. Xue, A. Saeed, X. Hu, Short-term wind speed interval prediction based on ensemble GRU model, IEEE Transactions on Sustainable Energy 11 (2020) 1370–1380.

[5] P. Jiang, Y. Wang, J. Wang, Short-term wind speed forecasting using a hybrid model, Energy 119 (2017) 561–577.

[6] J. Dowell, P. Pinson, Very-short-term probabilistic wind power forecasts by sparse vector autoregression, IEEE Transactions on Smart Grid 7 (2015) 763–770.

[7] X. G. Agoua, R. Girard, G. Kariniotakis, Short-term spatio-temporal forecasting of photovoltaic power production, IEEE Transactions on Sustainable Energy 9 (2018) 538–546.

[8] Z. Li, L. Ye, Y. Zhao, M. Pei, P. Lu, Y. Li, B. Dai, A spatiotemporal directed graph convolution network for ultra-short-term wind power prediction, IEEE Transactions on Sustainable Energy 14 (2023) 39–54.

[9] M. Khodayar, J. Wang, Spatio-temporal graph deep neural network for short-term wind speed forecasting, IEEE Transactions on Sustainable Energy 10 (2019) 670–681.

[10] M. Ceci, R. Corizzo, F. Fumarola, D. Malerba, A. Rashkovska, Predictive modeling of PV energy production: How to set up the learning task for a better prediction?, IEEE Transactions on Industrial Informatics 13 (2017) 956–966.

[11] A. D’Aversa, S. Polimena, G. Pio, M. Ceci, Leveraging spatio-temporal autocorrelation to improve the forecasting of the energy consumption in smart grids, in: P. Pascal, D. Ienco (Eds.), Discovery Science, Springer Nature Switzerland, Cham, 2022, pp. 141–156.

[12] L. Anselin, Local indicators of spatial association – LISA, Geographical Analysis 27 (1995) 93–115.

[13] S. Dray, P. Legendre, P. R. Peres-Neto, Spatial modelling: a comprehensive framework for principal coordinate analysis of neighbour matrices (PCNM), Ecological Modelling 196 (2006) 483–493.

[14] Z. Wu, S. Pan, G. Long, J. Jiang, X. Chang, C. Zhang, Connecting the dots: Multivariate time series forecasting with graph neural networks, in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020, pp. 753–763.

[15] Z. Shao, Z. Zhang, W. Wei, F. Wang, Y. Xu, X. Cao, C. Jensen, Decoupled dynamic spatial-temporal graph neural network for traffic forecasting, Proceedings of the VLDB Endowment 15 (2022) 2733–2746.

[16] J. R. Quinlan, et al., Learning with continuous classes, in: 5th Australian Joint Conference on Artificial Intelligence, volume 92, World Scientific, 1992, pp. 343–348.

[17] L. Breiman, J. Friedman, C. Stone, R. Olshen, Classification and Regression Trees, Routledge, 2017.

[18] S. B. Taieb, G. Bontempi, A. F. Atiya, A. Sorjamaa, A review and comparison of strategies for multi-step ahead time series forecasting based on the NN5 forecasting competition, Expert Systems with Applications 39 (2012) 7067–7083.

[19] Y. Wang, I. Witten, Induction of model trees for predicting continuous classes (1997).

[20] D. Malerba, F. Esposito, M. Ceci, A. Appice, Top-down induction of model trees with regression and splitting nodes, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (2004) 612–625. doi:10.1109/TPAMI.2004.1273937.

[21] J. Quinlan, Simplifying decision trees, International Journal of Man-Machine Studies 27 (1987) 221–234.