<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Multidimensional cascade bagging metamodel and its online learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergiy Popov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iryna Pliss</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yevgeniy Bodyanskiy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kharkiv National University of Radio Electronics</institution>
          ,
          <addr-line>Nauky av. 14 61166, Kharkiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper introduces a novel, multidimensional cascade bagging system designed to work in online mode. The key feature is a cascade of simple, independently adjustable submetamodels, replacing a complex, monolithic metamodel. This approach provides computational efficiency, rapid processing, and online adaptability, allowing the system to respond to changing data characteristics without extensive retraining. Simulation results demonstrate forecasting accuracy gain and optimization potential.</p>
      </abstract>
      <kwd-group>
        <kwd>multidimensional ensemble</kwd>
        <kwd>cascade bagging</kwd>
        <kwd>online learning</kwd>
        <kwd>short-term electric load forecasting</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The landscape of information processing has been dramatically reshaped in recent years, largely
due to the widespread adoption of artificial neural networks, particularly deep learning
architectures. These networks have demonstrated remarkable capabilities across a vast spectrum of
applications, encompassing areas such as audio and video processing, natural language
understanding, and the analysis of complex biological data. The driving force behind this success
lies in their inherent universal approximating and extrapolating properties.</p>
      <p>However, the deployment of these powerful deep learning systems also brings significant
challenges. They are slow to set up and require very large datasets for effective training.
These training data requirements can be a major obstacle when addressing real-world problems
where data acquisition is costly, time-consuming, or simply unavailable in sufficient quantities.
Traditional, shallower neural networks offer a more rapid development cycle, particularly those
employing bell-shaped kernel activation functions. However, they are often hampered by the
“curse of dimensionality,” a phenomenon where performance degrades exponentially as the
number of input features increases.</p>
      <p>Recognizing these limitations, researchers have explored hybrid approaches that combine the
strengths of different computational intelligence paradigms. Neuro-fuzzy systems, for instance,
offer a promising avenue for tackling a wide class of Data Stream Mining tasks. These systems
integrate the learning capabilities of neural networks with the interpretability and knowledge
representation capabilities of fuzzy logic. Despite their potential, neuro-fuzzy systems also face
their own set of shortcomings, which can restrict their applicability.</p>
      <p>Often it is the case that the same problem can be addressed using several different
computational intelligence systems. Selecting the most suitable system for a specific application
can be a non-trivial endeavor, particularly when dealing with data streams – sequences of data
arriving one observation at a time. These data streams are frequently non-stationary, meaning their
underlying statistical properties change over time, further complicating the selection process.</p>
      <p>
        Under these conditions, the ensemble approach [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1–4</xref>
        ] becomes a particularly attractive solution.
This methodology involves tackling a single problem using multiple, diverse computational
intelligence systems, and then combining their individual outputs to produce a final, refined result.
Here the problem arises of how to combine the ensemble members&#8217; outputs in order to obtain a
final result that is optimal in some sense.
      </p>
      <p>
        From a mathematical perspective, the bagging approach [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is often considered the most
effective technique for this combining process. In bagging, the output signals from all ensemble
members are fed into the inputs of a so-called metamodel. This metamodel then processes these
signals, generating a final result that represents a synthesis of the individual member contributions.
A key advantage of this approach is the possibility of its implementation in online mode [
        <xref ref-type="bibr" rid="ref6 ref7 ref8 ref9">6–9</xref>
        ],
allowing for real-time processing of sequential data with potentially non-stationary characteristics.
However, when the output signals of the ensemble members are multidimensional sequences, this
approach becomes complicated. In such cases, the signal presented to the metamodel can have a
very high dimension, making it difficult to tune the metamodel and potentially slowing down its
processing speed.
      </p>
      <p>
        To mitigate this complexity, ideas from cascade systems [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10–12</xref>
        ] can be employed to simplify the
implementation and functioning of the metamodel. Cascade systems break down the processing
into a series of interconnected stages, where each stage operates on a smaller number of input
signals. This modular approach reduces the overall computational burden and improves the
efficiency of the system. Furthermore, the cascade approach has proven successful in the
development of several hybrid computational intelligence systems, demonstrating its versatility
and effectiveness.
      </p>
      <p>Consequently, within the framework of this cascade bagging approach, the ensemble
metamodel is structured as a set of submetamodels. Each submetamodel is fed with the output
signal from the preceding cascade along with the output signal from one of the individual ensemble
members. This cascading arrangement allows for a progressive refinement of the combined output,
ultimately leading to a more accurate and robust final result.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Architecture of multidimensional cascade bagging system</title>
      <p>Fig. 1 shows the architecture of the proposed multidimensional ensemble bagging system
consisting of p ensemble subsystems ES1 , ES2 , … , ESr , … , ESp connected in parallel and p - 1
cascaded bagging submetamodels SMM 2 , … , SMM r , … , SMM p. All ensemble subsystems
receive the same vector input signal x ( k ) = ( x1 ( k ) , … , xi ( k ) , … , xn ( k ))T ∈ Rn (here k = 1, 2 , … is
the current discrete time) and output a set of vector output signals ^y1 ( k ) , … , ^yr ( k ) , … , ^yp ( k ),
where ^yr ( k ) = ( ^yr1 ( k ) , … , ^yrj ( k ) , … , ^yrm ( k ))T ∈ Rm. These signals are fed to the set of cascaded
submetamodels SMM 2 , … , SMM r , … , SMM p, which form the bagging system outputs
^y*2 ( k ) , … , ^y*r ( k ) , … , ^y*p ( k ).</p>
      <p>In the simplest case, if the output signal of the first ensemble member ES1 meets the a priori
given accuracy target, submetamodels are not utilized and the ES1 output is used as the ensemble
output ^y1* ( k ) = ^y1 ( k ). If the accuracy of ^y1 ( k ) is not satisfactory, then this signal together with the
output of ES2 ^y2 ( k ) are fed into submetamodel SMM 2 which forms the signal ^y*2 ( k ) as
^y*2 ( k ) = c2 ^y2 ( k ) + (1 - c2 ) ^y*1 ( k ) = c2 ^y2 ( k ) + (1 - c2 ) ^y1 ( k ) , (1)
where 0 ≤ c2 ≤ 1 is a tuned parameter.</p>
      <p>The signal ^y*2 ( k ) should be more accurate than ^y1 ( k ). The same cascade principle applies to all
subsequent submetamodels until the signal of the r-th submetamodel SMM r
^y*r ( k ) = cr ^yr ( k ) + (1 - cr ) ^y*r-1 ( k ) , 0 ≤ cr ≤ 1 , (2)
meets the a priori given accuracy target or all p ensemble members are used.</p>
      <p>The advantage of this approach is that each cascade has only one tuned parameter
cr , r = 2 , 3 , … , p , which greatly simplifies the whole metamodel tuning process.</p>
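      <p>As an illustrative sketch (not the authors&#8217; implementation), the cascade combination rule of equations (1) and (2) can be expressed in a few lines of Python; the member outputs and coefficients in the example are hypothetical:</p>

```python
import numpy as np

def cascade_output(y_hats, cs):
    """Combine ensemble member outputs into cascade outputs.

    y_hats: list of p arrays of shape (m,) -- member outputs ^y_r(k)
    cs:     list of p - 1 coefficients c_2, ..., c_p, each in [0, 1]
    Returns [^y*_1(k), ..., ^y*_p(k)], where ^y*_1 = ^y_1 and
    ^y*_r = c_r ^y_r + (1 - c_r) ^y*_{r-1}.
    """
    outputs = [np.asarray(y_hats[0], dtype=float)]  # ^y*_1(k) = ^y_1(k)
    for y_r, c_r in zip(y_hats[1:], cs):
        outputs.append(c_r * np.asarray(y_r, dtype=float)
                       + (1.0 - c_r) * outputs[-1])
    return outputs

# Hypothetical example: p = 3 members, m = 2 output components
outs = cascade_output([np.array([1.0, 2.0]),
                       np.array([3.0, 4.0]),
                       np.array([5.0, 6.0])],
                      cs=[0.5, 0.5])
# ^y*_2 = [2.0, 3.0], ^y*_3 = [3.5, 4.5]
```

      <p>Each stage touches only its own coefficient, so adding or dropping a cascade stage never disturbs the stages before it.</p>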
    </sec>
    <sec id="sec-3">
      <title>3. Fast online cascade metamodel learning</title>
      <p>The cascade bagging metamodel learning process involves continuous calculation of the
parameters c2 , … , cr , … , cp to achieve and maintain the desired accuracy level for the task at hand.
Let&#8217;s introduce a vector learning error of the r-th submetamodel SMM r
er ( k ) = y ( k ) - ^y*r ( k ) = y ( k ) - ^y*r-1 ( k ) - cr ( ^yr ( k ) - ^y*r-1 ( k )) = er-1 ( k ) - cr ( ^yr ( k ) - ^y*r-1 ( k )) , (3)
where y ( k ) is an m-dimensional reference signal, and its squared norm</p>
      <p>‖er ( k )‖2 = ‖er-1 ( k )‖2 - 2 cr eTr-1 ( k )( ^yr ( k ) - ^y*r-1 ( k )) + c2r ‖^yr ( k ) - ^y*r-1 ( k )‖2 . (4)</p>
      <p>Summing up squared errors over the whole dataset
∑k ‖er ( k )‖2 = ∑k ‖er-1 ( k )‖2 - 2 cr ∑k eTr-1 ( k )( ^yr ( k ) - ^y*r-1 ( k )) + c2r ∑k ‖^yr ( k ) - ^y*r-1 ( k )‖2 , (5)
and solving
∂ ∑k ‖er ( k )‖2 / ∂ cr = - 2 ∑k eTr-1 ( k )( ^yr ( k ) - ^y*r-1 ( k )) + 2 cr ∑k ‖^yr ( k ) - ^y*r-1 ( k )‖2 = 0 , (6)
we obtain
cr = ∑k eTr-1 ( k )( ^yr ( k ) - ^y*r-1 ( k )) / ∑k ‖^yr ( k ) - ^y*r-1 ( k )‖2 . (7)</p>
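      <p>A minimal batch sketch of this closed-form estimate, assuming NumPy (the function name is ours, not from the paper):</p>

```python
import numpy as np

def optimal_c(y_ref, y_r, y_prev_star):
    """Batch least-squares estimate of c_r in the style of eq. (7).

    y_ref:       (N, m) reference signals y(k)
    y_r:         (N, m) outputs of the r-th ensemble member
    y_prev_star: (N, m) outputs of the previous cascade stage
    """
    delta = y_r - y_prev_star        # ^y_r(k) - ^y*_{r-1}(k)
    e_prev = y_ref - y_prev_star     # e_{r-1}(k)
    return np.sum(e_prev * delta) / np.sum(delta * delta)

# If the member reproduces the reference exactly, all weight goes to it
y_ref = np.array([[1.0, 1.0], [2.0, 2.0]])
c = optimal_c(y_ref, y_ref, np.zeros_like(y_ref))   # c = 1.0
```

      <p>Since 0 ≤ cr ≤ 1 is required, the unconstrained estimate can be clipped to that interval in practice.</p>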
      <p>For nonstationary situations, the submetamodel parameter calculations should be performed
on a sliding window. Let s be the sliding window size, then
cr ( k , s ) = ∑τ=k-s+1..k eTr-1 ( τ )( ^yr ( τ ) - ^y*r-1 ( τ )) / ∑τ=k-s+1..k ‖^yr ( τ ) - ^y*r-1 ( τ )‖2 . (8)</p>
      <p>When s = 1, we obtain a single-step learning rule in the following form
cr ( k ) = eTr-1 ( k )( ^yr ( k ) - ^y*r-1 ( k )) / ‖^yr ( k ) - ^y*r-1 ( k )‖2 . (9)</p>
      <p>It is easily seen that
‖er ( k )‖2 = ‖er-1 ( k )‖2 - 2 (eTr-1 ( k )( ^yr ( k ) - ^y*r-1 ( k )))2 / ‖^yr ( k ) - ^y*r-1 ( k )‖2 + (eTr-1 ( k )( ^yr ( k ) - ^y*r-1 ( k )))2 / ‖^yr ( k ) - ^y*r-1 ( k )‖2 =
‖er-1 ( k )‖2 - (eTr-1 ( k )( ^yr ( k ) - ^y*r-1 ( k )))2 / ‖^yr ( k ) - ^y*r-1 ( k )‖2 , (10)
hence
‖er ( k )‖2 ≤ ‖er-1 ( k )‖2 . (11)</p>
      <p>This means that each submetamodel SMM r in the cascade system is not inferior in accuracy
to the preceding one SMM r-1.</p>
      <p>The process of adding more cascades can continue until the desired accuracy is achieved or the
maximum number of submetamodels p is reached. If the desired accuracy is achieved with a
smaller number of submetamodels than currently employed r, the “excess” submetamodels can be
removed to conserve computational resources. Both these processes of adding and removing
submetamodels don’t require retraining the preceding submetamodels.</p>
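      <p>A possible online sketch of the sliding-window rule (8), degenerating to the single-step rule (9) for s = 1 (implementation details such as the window buffer are our assumptions):</p>

```python
from collections import deque
import numpy as np

def make_online_c(s):
    """Return an update function that re-estimates c_r(k, s) over a
    sliding window of the last s samples (eq. (8)); s = 1 yields the
    single-step rule (eq. (9))."""
    nums = deque(maxlen=s)   # e_{r-1}^T(tau) * delta(tau) terms
    dens = deque(maxlen=s)   # ||delta(tau)||^2 terms
    def update(y_ref, y_r, y_prev_star):
        delta = y_r - y_prev_star
        e_prev = y_ref - y_prev_star
        nums.append(float(e_prev @ delta))
        dens.append(float(delta @ delta))
        den = sum(dens)
        return sum(nums) / den if den > 0.0 else 0.0
    return update

update_c2 = make_online_c(s=1)
c2 = update_c2(np.array([2.0, 2.0]),   # reference y(k)
               np.array([2.0, 2.0]),   # member output ^y_2(k)
               np.array([0.0, 0.0]))   # previous stage ^y*_1(k)
# the member matches the reference, so the rule returns c_2 = 1.0
```

      <p>Processing each sample exactly once in this fashion matches the data-stream setting used later in the simulations.</p>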
    </sec>
    <sec id="sec-4">
      <title>4. Simulation results</title>
      <p>To evaluate the efficacy of the proposed cascade bagging system, we applied it to the challenging
problem of short-term electric load forecasting (STLF). STLF is a critical task for regional power
system operators, enabling efficient resource allocation and grid stability. To demonstrate the
system’s capabilities, we utilized a real-world dataset comprising hourly electric load data collected
over a one-year period at m = 4 geographically distinct nodes within a regional power system in
Ukraine. This dataset consists of N = 8760 observations, each represented as a 4-element vector,
representing four interdependent time series reflecting the electric load at each node.</p>
      <p>The inherent complexity of the STLF problem stems from the stochastic nature of electricity
demand. Inspection of the dataset (Fig. 2) revealed several key characteristics that pose significant
forecasting challenges. The individual vector components – representing the load at each node –
exhibit a combination of common trends, unique temporal patterns, and abrupt shifts in behavior.
The data is also prone to outliers, representing unexpected surges or drops in demand, and is
inherently subject to noise, arising from random influences. These complexities necessitate a robust
forecasting methodology capable of capturing the nuanced relationships within the data. We
believe an ensemble approach, particularly our cascade bagging system, offers a promising solution
to mitigate these challenges.</p>
      <p>
        For the 24-hour ahead forecasting task, we deployed p = 3 specialized Neuro-Fuzzy Network
(NFN) models as ensemble subsystems. NFNs are a class of Hybrid Systems of Computational
Intelligence, known for their ability to combine the strengths of neural networks (learning complex
patterns) and fuzzy logic (handling uncertainty and expert knowledge) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Each of these ensemble
subsystems was carefully tailored for STLF problems, featuring distinct architectures and
hyperparameter settings; for clarity, we refer to them as ensemble subsystems ES1 - ES3. This
intentional variation in model structure and parameters was implemented to encourage diversity in
the ensemble, enabling each subsystem to potentially capture different aspects of the dataset’s
intricacies and ultimately contributing to a more comprehensive forecast. The specifics of the NFN
design and parameter optimization for the STLF problem are beyond the scope of this paper; we
refer readers to the corresponding sources [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ].
      </p>
      <p>Fig. 3–5 present the 24-hour ahead forecasting results for a 168-hour period, generated by each
of the corresponding ensemble subsystems. As anticipated, the individual subsystems exhibited
distinct forecasting behaviors, reflecting their differing structures and parameter settings. A
quantitative comparison of forecasting errors, detailed in Table 1, further confirms this diversity.
To assess forecasting accuracy, we employed Mean Absolute Percentage Error (MAPE), a standard
error measure widely used in STLF applications, calculated over the entire dataset. The diversity
observed in the individual subsystem performance highlights the potential for the cascade bagging
system to leverage these differences to generate a more accurate and robust forecast.</p>
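      <p>For reproducibility, a common way to compute per-component MAPE over vector forecasts (a generic sketch; the toy values below are not from the dataset):</p>

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error (%) per output component.

    y_true, y_pred: arrays of shape (N, m) -- N observations of m load
    nodes. Returns an m-vector; its mean serves as a cumulative metric.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true), axis=0)

# Toy check: a constant 10 % error on every observation and node
y_true = np.array([[100.0, 200.0], [100.0, 200.0]])
y_pred = np.array([[110.0, 220.0], [90.0, 180.0]])
err = mape(y_true, y_pred)   # [10.0, 10.0]
```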
      <p>We applied the proposed cascade approach to combine the ensemble subsystems’ outputs. As
we have 3 ensemble subsystems, 2 submetamodels are sufficient in the cascade structure according
to the architecture shown in Fig. 1. The output signals are produced using relation (2). The data is
treated in online mode, i.e. data vectors are processed sequentially, one-by-one and only once, as a
data stream. This eliminates the need to divide the dataset into training, validation, and test sets.</p>
      <p>In order to be able to adapt the cascade combining process to changing properties of the signals
over time (trend shifts, unexpected fluctuations, etc.), we used the sliding window version of the
proposed learning algorithm (8). Choosing a proper value for the sliding window size s, provides a
reasonable tradeoff between the smoothing and following properties of the learning process. It
should be noted that a suitable meta-algorithm could be implemented to dynamically adjust this
parameter during the learning process, based on real-time monitoring of learning errors, further
optimizing the system’s responsiveness and accuracy.</p>
      <p>The outputs of submetamodels SMM 2 and SMM 3 are displayed in Fig. 6 and Fig. 7 respectively.
The corresponding errors are presented in Table 1. As we are dealing with vector signals, the first 4
rows of Table 1 represent errors for each of the 4 components, and the last row contains the errors
averaged over the whole vector as a cumulative metric. Finally, for reference, the
last column in Table 1 lists errors for a simple averaging ensemble approach widely used in
bagging procedures.</p>
      <p>Let’s now examine the data and make some observations.</p>
      <p>While ES3 exhibits a lower average error compared to ES2, which in turn is better than ES1,
this ranking does not universally hold for all individual vector components. Notably, ES1
demonstrates the highest accuracy for the first component; ES3 outperforms the others for the
second and third components; and ES2 is the most accurate for the fourth component. This
confirms the diversity and complementary strengths of the individual ensemble subsystems.</p>
      <p>Submetamodel SMM 2 (which combines the outputs of ES1 and ES2) consistently outperforms
all ensemble subsystems both in terms of the average error and for each component individually. If
this level of accuracy is sufficient for the task at hand, ES3 and SMM 3 could be
removed from the system to reduce computational load.</p>
      <p>Submetamodel SMM 3 (which combines the outputs of SMM 2 and ES3) further reduces all
error metrics in comparison to SMM 2, validating the theoretical prediction outlined in relation
(11). This incremental improvement underscores the effectiveness of the cascading structure.</p>
      <p>The simple averaging approach (used here only for reference) provides more accurate forecasts
than any of the ensemble subsystems alone. However, the proposed adaptive cascade procedure
consistently outperforms simple averaging, starting from the very first submetamodel. This
demonstrates the effectiveness of the adaptive learning algorithm.</p>
      <p>Therefore, the simulation results conclusively confirm the effectiveness of the proposed
adaptive cascade ensemble approach, showcasing its ability to generate superior forecasts
compared to individual models and a basic averaging ensemble. The ability to adapt to changing
signal characteristics and the potential for computational optimization further increase the
practical utility of this approach.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>Traditional bagging systems, particularly those dealing with multidimensional outputs, often face
challenges in scalability and adaptability. The complexity of a single metamodel tasked with
integrating the outputs of numerous ensemble members can become computationally expensive,
especially when dealing with high-dimensional, non-stationary data streams. To address these
limitations, we introduced a novel multidimensional bagging system that leverages a cascade of
simple, independently configurable submetamodels. This approach prioritizes computational
simplicity, high processing speed, and online adaptability.</p>
      <p>The core innovation of our system lies in replacing the monolithic metamodel with a sequence
of interconnected, computationally lightweight submetamodels arranged in a cascade structure.
Each submetamodel is configured online, i.e. its single parameter can be adjusted dynamically in
response to changes in the data stream. The independent learning of each submetamodel allows
them to adapt to changing data characteristics without requiring global retraining of the entire
system. The computational simplicity and high speed make the proposed cascade system
particularly well-suited for real-time applications where rapid adaptation to changing conditions is
paramount. Furthermore, the modularity of the system facilitates easy expansion and modification,
allowing for seamless integration of new ensemble members or the incorporation of more
sophisticated submetamodels as computational resources become available. This adaptability
allows the system to remain effective even as data characteristics and application requirements
evolve.</p>
      <p>Simulation results conclusively proved the proposed method’s ability to generate more accurate
forecasts and offer potential for computational optimization, enhancing its practical utility.
Namely, SMM 2 consistently outperformed the individual subsystems, and SMM 3 further reduced
forecasting errors, validating the cascade structure’s effectiveness. The adaptive learning algorithm,
incorporating a sliding window, allows the system to respond to changing signal characteristics,
while keeping the balance between tracking and smoothing behavior.</p>
      <p>Future work will focus on exploring various online learning algorithms for submetamodels
tuning and investigating the optimal cascade depth for different data characteristics.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Y. Ma,
          <source>Ensemble machine learning: Methods and applications</source>
          , Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Sarkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Natarajan</surname>
          </string-name>
          ,
          <source>Ensemble Machine Learning Cookbook, Packt Publishing Limited</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Kunapuli</surname>
          </string-name>
          ,
          <article-title>Ensemble methods for machine learning</article-title>
          ,
          <source>Simon and Schuster</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>X.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Cao</surname>
          </string-name>
          , et al.,
          <article-title>A survey on ensemble learning</article-title>
          .
          <source>Front. Comput. Sci</source>
          .
          <volume>14</volume>
          (
          <year>2020</year>
          )
          <fpage>241</fpage>
          -
          <lpage>58</lpage>
          . https://doi.org/10.1007/s11704-019-8208-z
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          , Bagging predictors,
          <source>Machine Learning</source>
          <volume>24</volume>
          (
          <year>1996</year>
          )
          <fpage>123</fpage>
          -
          <lpage>140</lpage>
          . https://doi.org/10.1007/BF00058655
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bifet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Holmes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pfahringer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gavaldà</surname>
          </string-name>
          ,
          <article-title>Improving Adaptive Bagging Methods for Evolving Data Streams</article-title>
          , in: Z. H.
          <string-name>
            <surname>Zhou</surname>
          </string-name>
          , T. Washio (Eds)
          <article-title>Advances in Machine Learning</article-title>
          .
          <source>ACML 2009. Lecture Notes in Computer Science</source>
          , vol
          <volume>5828</volume>
          . Springer, Berlin, Heidelberg,
          <year>2009</year>
          . https://doi.org/10.1007/978-3-642-05224-8_4
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Lughofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pratama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Skrjanc</surname>
          </string-name>
          ,
          <article-title>Online bagging of evolving fuzzy systems</article-title>
          ,
          <source>Information Sciences</source>
          <volume>570</volume>
          (
          <year>2021</year>
          )
          <fpage>16</fpage>
          -
          <lpage>33</lpage>
          . https://doi.org/10.1016/j.ins.2021.04.041
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Ye.</given-names>
            <surname>Bodyanskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Otto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Pliss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Popov</surname>
          </string-name>
          ,
          <article-title>An Optimal Algorithm for Combining Multivariate Forecasts in Hybrid Systems</article-title>
          , in:
          <string-name>
            <given-names>V.</given-names>
            <surname>Palade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.J.</given-names>
            <surname>Howlett</surname>
          </string-name>
          , L. Jain (Eds)
          <article-title>Knowledge-Based Intelligent Information and Engineering Systems</article-title>
          .
          <source>KES 2003. Lecture Notes in Computer Science</source>
          , vol
          <volume>2774</volume>
          , Springer, Berlin, Heidelberg,
          <year>2003</year>
          . https://doi.org/10.1007/978-3-540-45226-3_132
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Ye.</given-names>
            <surname>Bodyanskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Popov</surname>
          </string-name>
          ,
          <article-title>Fuzzy Selection Mechanism for Multimodel Prediction</article-title>
          , in: M.G. Negoita,
          <string-name>
            <given-names>R.J.</given-names>
            <surname>Howlett</surname>
          </string-name>
          , L.C. Jain (Eds)
          <article-title>Knowledge-Based Intelligent Information and Engineering Systems</article-title>
          .
          <source>KES 2004. Lecture Notes in Computer Science</source>
          , vol
          <volume>3214</volume>
          . Springer, Berlin, Heidelberg,
          <year>2004</year>
          . https://doi.org/10.1007/978-3-540-30133-2_101
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.A.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.A.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sharif</surname>
          </string-name>
          , et al.,
          <article-title>A cascaded design of best features selection for fruit diseases recognition</article-title>
          ,
          <source>Comput. Mater. Contin</source>
          <volume>70</volume>
          (
          <issue>1</issue>
          ) (
          <year>2022</year>
          )
          <fpage>1491</fpage>
          -
          <lpage>1507</lpage>
          . https://doi.org/10.32604/cmc.2022.019490
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Ye.</given-names>
            <surname>Bodyanskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Tyshchenko</surname>
          </string-name>
          ,
          <article-title>A Hybrid Cascade Neuro-Fuzzy Network with Pools of Extended Neo-Fuzzy Neurons and its Deep Learning</article-title>
          ,
          <source>International Journal of Applied Mathematics and Computer Science</source>
          <volume>29</volume>
          (
          <issue>2</issue>
          ) (
          <year>2019</year>
          )
          <fpage>477</fpage>
          -
          <lpage>488</lpage>
          . https://doi.org/10.2478/amcs-2019-0035
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          , W. Giernacki,
          <article-title>Position Control of Quadrotor UAV Based on Cascade Fuzzy Neural Network</article-title>
          ,
          <source>Energies</source>
          <volume>15</volume>
          (
          <issue>5</issue>
          ) (
          <year>2022</year>
          )
          <fpage>1763</fpage>
          . https://doi.org/10.3390/en15051763
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Talpur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.J.</given-names>
            <surname>Abdulkadir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Alhussian</surname>
          </string-name>
          et al.
          <article-title>A comprehensive review of deep neuro-fuzzy system architectures and their optimization methods</article-title>
          ,
          <source>Neural Comput &amp; Applic</source>
          <volume>34</volume>
          (
          <year>2022</year>
          )
          <fpage>1837</fpage>
          -
          <lpage>1875</lpage>
          . https://doi.org/10.1007/s00521-021-06807-9
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Chernenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Martyniuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Popov</surname>
          </string-name>
          , Ye. Bodyanskiy,
          <article-title>Comparative analysis of two approaches to solving the problem of short-term forecasting of the total electrical load of a power system</article-title>
          ,
          <source>Technical Electrodynamics</source>
          <issue>3</issue>
          (
          <year>2013</year>
          )
          <fpage>61</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Ye.</given-names>
            <surname>Bodyanskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Popov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Titov</surname>
          </string-name>
          ,
          <article-title>Robust Learning Algorithm for Networks of Neuro-Fuzzy Units</article-title>
          , in: T. Sobh (Ed.),
          <source>Innovations and Advances in Computer Sciences and Engineering</source>
          , Springer, Dordrecht,
          <year>2010</year>
          . https://doi.org/10.1007/978-90-481-3658-2_59
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>