<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Adaptive Online Bagging using the Cascade Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergiy Popov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iryna Pliss</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleh Zolotukhin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maryna Kudryavtseva</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kharkiv National University of Radio Electronics</institution>
          ,
          <addr-line>14 Nauky av., Kharkiv, 61166</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>2774</volume>
      <fpage>15</fpage>
      <lpage>17</lpage>
      <abstract>
<p>This paper introduces a novel adaptive cascade bagging system designed for real-time processing of complex, dynamic signals. Leveraging ensemble learning and a cascade architecture, the system dynamically adjusts model weighting and member count to optimize performance in non-stationary environments. Simulation results demonstrate a progressive reduction in forecasting errors at each cascade stage, with the 4th submetamodel surpassing the best individual ensemble member and the 6th achieving a 1.23-fold error reduction. This demonstrates the effectiveness and computational efficiency of the proposed approach.</p>
      </abstract>
      <kwd-group>
<kwd>cascade architecture</kwd>
        <kwd>adaptive bagging</kwd>
        <kwd>online learning</kwd>
<kwd>time series forecasting</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The field of Data Mining has witnessed a dramatic shift in recent years, with a growing reliance on
sophisticated computational intelligence techniques to tackle complex problems. Traditional
statistical methods, while still valuable, often struggle to effectively handle the sheer volume and
complexity of modern datasets. Consequently, artificial neural networks (ANNs), in both their deep
and more traditional shallow forms, alongside neuro-fuzzy systems, neo-fuzzy systems,
wavelet neuro-fuzzy networks, and other hybrid computational intelligence systems, have become
increasingly prevalent tools for a wide range of Data Mining tasks. These systems are particularly
effective in classification problems, pattern recognition, extrapolation, regression analysis,
diagnostics, modeling, and more.</p>
      <p>The core appeal of these computational intelligence approaches lies in their universal
approximation properties and the ability to adjust their internal parameters, and in some cases even
their architecture, through a process of learning from training data. This adaptability allows them to
tailor themselves to the specific characteristics of the problem at hand, leading to potentially superior
performance. The training process involves feeding the system labeled data examples, allowing it to
refine its internal workings to map inputs to desired outputs.</p>
      <p>However, the selection of the right computational intelligence system for a particular Data
Mining task is far from straightforward. While multiple systems may be capable of solving the same
problem, determining which one will deliver the best results is often impossible a priori. Each system
possesses its own strengths and weaknesses, making the choice a complex trade-off. For instance,
deep neural networks (DNNs), the current favorites of many AI applications, are known for their
potential to achieve extremely high accuracy. However, this performance comes at a significant cost.
DNNs typically require massive amounts of training data – often tens of thousands or even millions
of labeled examples – and can demand substantial computational resources and time for training. The
training process can be iterative, requiring multiple passes through the data and careful tuning of
hyperparameters. In contrast, Radial Basis Function Neural Networks (RBFNs) offer a considerably
faster learning process. This makes them attractive for applications where rapid deployment or
real-time performance is crucial. However, RBFNs are susceptible to the “curse of dimensionality”, which
arises when dealing with high-dimensional datasets. As the dimensionality increases, the
performance of RBFNs degrades significantly, requiring exponentially more neurons to maintain
accuracy.</p>
      <p>Neo-fuzzy systems, another option, are known for their high learning speed and ability to
incorporate human expertise through fuzzy logic principles. However, they don’t always guarantee
the necessary approximation properties to accurately model complex relationships within the data.
They might be fast to train, but the resulting model might not capture the underlying patterns
effectively.</p>
      <p>The challenge, therefore, isn't simply about applying these powerful tools; it's about
understanding their nuances and selecting the most appropriate system – or even a combination of
systems – for the specific problem, dataset, and desired performance characteristics. This often
involves experimentation, careful evaluation of results, and a deep understanding of the strengths
and limitations of each approach. The optimal solution frequently emerges through iterative
refinement and a willingness to explore different architectural choices and training methodologies.</p>
      <p>
        When faced with complex challenges where individual systems exhibit varying strengths and
weaknesses, the use of ensemble approaches can prove remarkably effective [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5">1-8</xref>
        ]. The core principle
behind ensemble methods is to harness the collective intelligence of multiple models to achieve a
superior outcome compared to any single model acting alone.
      </p>
      <p>Historically, the vast majority of ensemble methods have relied on a batch (offline) approach. This
involves providing the entire training dataset upfront and repeatedly analyzing it to train and refine
the individual models and the ensemble. While effective, this approach can be computationally
expensive and less adaptable to dynamic data environments. However, there are several online
ensemble methods [9-11] specifically designed to address Data Stream Mining problems, where data
arrives sequentially and potentially at a very high rate.</p>
      <p>Within the broader landscape of ensemble approaches, bagging procedures [12-14] have emerged
as particularly powerful techniques. Bagging, short for “bootstrap aggregation,” involves training
multiple individual models on different subsets of the training data. The results generated by each of
these models are then fed into a metamodel, also sometimes referred to as a combiner or aggregator.
This metamodel acts as a sophisticated decision-making engine, intelligently combining the output
signals from all ensemble members to synthesize the final, optimal solution.</p>
      <p>A common challenge in implementing bagging is determining the optimal number of ensemble
members. A small number of members might not provide sufficient diversity to capture the full
complexity of the problem, leading to limited accuracy gains. Conversely, an excessively large
number can significantly complicate the training process of the metamodel, increasing
computational cost and potentially leading to overfitting. A promising avenue of research addresses
this challenge through evolutionary approaches [13-15]. These methods dynamically adjust the
number of ensemble members during the metamodel learning process. This means the number of
inputs to the metamodel is constantly changing, introducing a layer of complexity to both the
metamodel itself and its learning process.</p>
      <p>To simplify and accelerate the bagging process, we apply a cascade approach. Instead of relying
on a single, complex metamodel, a cascade approach employs a series of relatively simple
metamodels arranged in sequence. This significantly simplifies the synthesis of the metamodel and
the subsequent tuning process.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Cascade bagging system architecture</title>
      <p>subsystems receive the same vector signal x ( k ) = ( x1 ( k ) , … , xi ( k ) , … , xn ( k ))T (here k is the current
discrete time). Scalar output signals ^y1 ( k ) , … , ^yr ( k ) , … , ^y p ( k ) are calculated at their outputs. If the
output signal ^y1 ( k ) satisfies the a priori specified accuracy requirements, the bagging procedure is
not required and the output of the system as a whole is the output signal ^y1* ( k ) = ^y1 ( k ). Otherwise,
the signal ^y1 ( k ) is fed to the first input of the bagging submetamodel SMM 2, the second input of
which is fed with the output signal of the second subsystems S2 – ^y2 ( k ). The SMM 2 output signal is
formed as follows</p>
<p>ŷ*2(k) = c2 ŷ2(k) + (1 - c2) ŷ*1(k),
where c2 is the single tuned parameter of the submetamodel SMM 2. The signal ŷ*2(k) should be better in terms of accuracy than ŷ*1(k) and ŷ2(k).</p>
      <p>Then, signal ^y*2 ( k ) is fed to the submetamodel SMM 3, whose other input receives ^y3 ( k ).
SMM 3 produces the following result
here ^y*3 ( k ) should be better in terms of accuracy than ^y*2 ( k ) and ^y3 ( k ).</p>
      <p>And finally, the last submetamodel SMM p produces the following output signal
^y*3 ( k ) = c3 ^y3 ( k ) + (1 - c3) ^y*2 ( k ) ,
^y*p ( k ) = c p ^y p ( k ) + (1 - c p) ^y p -1 ( k ) ,
*
which should be better in terms of accuracy than output signals of all previous submetamodels.</p>
      <p>In this setup, if the signal of any previous submetamodel SMM r satisfies all the accuracy
requirements, then the process of building up submetamodels can be stopped and only r members of
the ensemble will be activated in the system.</p>
      <p>The advantage of this approach is the simplicity of its implementation, since in each
submetamodel only one parameter cr is being tuned, which can be calculated in online real-time
mode, while the ensemble itself contains only the required number of members – ensemble
subsystems. Also note that in non-stationary situations the number of activated submetamodels can
both decrease and increase during the operation depending on the required solution accuracy.</p>
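<p>The cascade recursion above can be sketched in code. The following Python fragment is an illustrative sketch, not the authors' implementation; the function name, the early-stopping check, and the assumption that a reference value for the accuracy test is available (e.g., from the previous step) are ours.</p>

```python
def cascade_output(y_hat, c, y_true=None, tol=None):
    """Cascade bagging forward pass.

    y_hat: list of ensemble member forecasts y_1(k), ..., y_p(k).
    c:     list of submetamodel parameters; c[0] is unused, since the
           first cascade output is simply y*_1(k) = y_1(k).
    If y_true and tol are given (a simplifying assumption for the
    accuracy check), stop building up cascades as soon as the current
    submetamodel output is accurate enough.
    Returns the final output and the number of activated members."""
    y_star = y_hat[0]               # y*_1(k) = y_1(k): no combining yet
    active = 1
    for r in range(1, len(y_hat)):
        if tol is not None and y_true is not None and tol >= abs(y_true - y_star):
            break                   # first `active` members already suffice
        # y*_r(k) = c_r * y_r(k) + (1 - c_r) * y*_{r-1}(k)
        y_star = c[r] * y_hat[r] + (1.0 - c[r]) * y_star
        active = r + 1
    return y_star, active
```

<p>With a tolerance supplied, the build-up stops at the first sufficiently accurate submetamodel, so only the required number of ensemble members is activated, mirroring the behavior described above.</p>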
    </sec>
    <sec id="sec-3">
      <title>3. Submetamodels online learning</title>
      <p>The process of training submetamodels consists in adjusting the parameters c2 , c3 , … , c p in each of
the system cascades.</p>
<p>Let us write the output signal of the p-th cascade in the form
ŷ*p(k) = cp ŷp(k) + (1 - cp) ŷ*p-1(k) = cp (ŷp(k) - ŷ*p-1(k)) + ŷ*p-1(k),
and introduce the error signal
ep(k) = y(k) - ŷ*p(k) = y(k) - cp (ŷp(k) - ŷ*p-1(k)) - ŷ*p-1(k) = ep-1(k) - cp (ŷp(k) - ŷ*p-1(k)),
where y(k) is a reference signal.</p>
      <p>The squared error has the form
e²p(k) = e²p-1(k) - 2 ep-1(k) cp (ŷp(k) - ŷ*p-1(k)) + c²p (ŷp(k) - ŷ*p-1(k))²,
and after summing up over the training set
Σk e²p(k) = Σk e²p-1(k) - 2 cp Σk ep-1(k) (ŷp(k) - ŷ*p-1(k)) + c²p Σk (ŷp(k) - ŷ*p-1(k))².</p>
      <p>Then we equate the derivative of this sum with respect to cp to zero,
∂(Σk e²p(k)) / ∂cp = -2 Σk ep-1(k) (ŷp(k) - ŷ*p-1(k)) + 2 cp Σk (ŷp(k) - ŷ*p-1(k))² = 0,
and get a rather simple relation
cp = [Σk ep-1(k) (ŷp(k) - ŷ*p-1(k))] / [Σk (ŷp(k) - ŷ*p-1(k))²],
which in a single-step (running) form can be written as
cp(k) = [Στ=1..k ep-1(τ) (ŷp(τ) - ŷ*p-1(τ))] / [Στ=1..k (ŷp(τ) - ŷ*p-1(τ))²].</p>
<p>Also note that when processing nonstationary signals disturbed by noise, it is appropriate to organize the tuning of the parameters c2, c3, …, cp over a sliding window. This provides a trade-off between the tracking and filtering properties of the bagging procedure.</p>
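<p>The single-step relation above lends itself to a sliding-window estimator. The Python sketch below is a hypothetical realization: the closure, the default window length of 50 samples, and the eps regularizer are our assumptions, not part of the original algorithm.</p>

```python
from collections import deque

def make_cp_estimator(window_size=50):
    """Sliding-window estimator of the single tuned parameter c_p:

        c_p(k) = sum e_{p-1}(t) * d(t) / sum d(t)^2,
        d(t)   = y_p(t) - y*_{p-1}(t),

    with both sums taken over the last `window_size` samples.  The
    window length trades tracking speed against noise filtering."""
    window = deque(maxlen=window_size)

    def update(e_prev, d, eps=1e-12):
        # push the newest pair (e_{p-1}(k), d(k)); the deque drops the oldest
        window.append((e_prev, d))
        num = sum(e * di for e, di in window)
        den = sum(di * di for _, di in window)
        return num / (den + eps)    # eps guards against a degenerate window

    return update
```

<p>Each cascade keeps its own estimator, so every submetamodel tunes its single parameter in online real-time mode, one multiply-accumulate pair per new sample.</p>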
    </sec>
    <sec id="sec-4">
      <title>4. Simulation results</title>
      <p>To validate the effectiveness and practicality of the proposed adaptive cascade bagging system, we
applied it to a challenging real-world problem: short-term electric load forecasting (STLF).
Specifically, our test case focuses on 1-step ahead forecasting of the daily electric load for a regional
power system in Ukraine. This application presents a particularly demanding scenario due to the
inherent complexities and non-stationarities commonly found in electric load data.</p>
      <p>The dataset utilized for this simulation comprises an original time series containing N = 337
samples of daily electric load data, used as a reference signal and shown in Figure 2. It exhibits a
complex pattern characterized by several discernible trends corresponding to different seasons. The
data also reveals periodic components, predominantly weekly fluctuations reflecting variations in
energy consumption across the week. Furthermore, the series is punctuated by sudden changes and
outliers, indicative of unexpected events impacting energy demand. A strong random component is
also evident, which is a common characteristic of electric load data in large systems. This randomness
arises from the multitude of external factors influencing energy consumption, many of which possess
inherently random or chaotic behavior. Weather conditions, for instance, are a prime example of a
factor significantly impacting energy demand and exhibiting complex fluctuations [16].</p>
      <p>These trends, periodicities, sudden changes, outliers, and the significant random component bring
non-stationarity and noisiness to the time series. Consequently, its forecasting presents a
considerable challenge. In such scenarios, it is frequently observed that different forecasting models
or methods demonstrate superior performance on particular segments of the series, while exhibiting
inferior performance on others. It’s rare for a single model or method to consistently outperform all
others across the entire series. This is precisely the situation where bagging methods, and
particularly our adaptive cascade bagging system, can prove useful. By combining the forecasts from
multiple models, the bagging approach aims to extract the best predictions from each individual
model, ultimately improving the overall forecasting accuracy and robustness.</p>
      <p>We employed q = 6 distinct and independent ensemble subsystems – computational intelligence
models of various structures and complexity [17] – to produce six corresponding forecasts
^y1 ( k ) , … , ^y6 ( k ) to be further fed into the corresponding submetamodels. For the purpose of this
study, we sorted the ensemble subsystems in terms of increasing complexity. Such a diversity is
aimed at capturing different properties of different parts of the series under consideration. Figure 3
shows the last 30 days of the original time series and the ensemble subsystems’ forecasts
^y1 ( k ) , … , ^y6 ( k ). We can see that long-term trends are more or less well captured by all subsystems,
but short-term changes pose a problem to all of them so that no single subsystem is significantly
better than the others for all data points.</p>
<p>The corresponding forecasts demonstrate a decreasing trend of the forecasting errors presented in
Table 1. We used a set of error measures widely adopted in time series forecasting:
1. Mean Absolute Error (MAE);
2. Mean Absolute Scaled Error, scaled by a 1-step ahead naive forecast (MASE1);
3. Mean Absolute Percentage Error (MAPE).</p>
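<p>For reference, these error measures can be computed as follows. This is a minimal Python sketch under the usual definitions; the MASE1 scaling by the in-sample MAE of the 1-step naive forecast follows the description above.</p>

```python
def mae(y, y_hat):
    """Mean Absolute Error."""
    return sum(abs(a - f) for a, f in zip(y, y_hat)) / len(y)

def mase1(y, y_hat):
    """Mean Absolute Scaled Error, scaled by the in-sample MAE of the
    1-step ahead naive forecast y_hat(k) = y(k-1)."""
    naive = sum(abs(y[k] - y[k - 1]) for k in range(1, len(y))) / (len(y) - 1)
    return mae(y, y_hat) / naive

def mape(y, y_hat):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(y, y_hat)) / len(y)
```
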
<p>During the simulation, we applied the proposed adaptive cascade bagging system to generate six
forecasts ŷ*1(k), …, ŷ*6(k), where ŷ*1(k) actually duplicates ŷ1(k) and ŷ*2(k), …, ŷ*6(k) are produced
by the corresponding submetamodels. We treated the forecasting process as an online operation,
mirroring the real-time nature of electric load management. This means the entire dataset was
processed sequentially, sample by sample, without the traditional division into training, validation,
and test sets. This online processing approach reflects a core design principle of the adaptive cascade
bagging system – its ability to learn and adapt in real-time as new data arrives.</p>
      <p>[Figure: cascade forecasts ŷ*1(k) (blue line), ŷ*2(k) (brown line), ŷ*3(k) (cyan line), ŷ*4(k) (red line), ŷ*5(k) (green line), ŷ*6(k) (dotted black line).]</p>
      <p>These observations demonstrate the effectiveness of the cascading approach and highlight the
significant efficiency gains it provides.</p>
      <p>Perhaps most importantly, these properties have direct implications for the system’s efficiency.
Consider a scenario where MAPE level of 4.5% is deemed acceptable for the task at hand. Our analysis
reveals that none of the six individual ensemble subsystems alone can achieve this level of accuracy.
However, employing only the first four (simplest) ensemble subsystems within the proposed adaptive
cascade bagging system is sufficient to consistently achieve the desired accuracy of 4.5% or better.
This demonstrates a significant reduction in computational resources and complexity, as only a
fraction of the overall system (4 simplest out of 6 total ensemble subsystems) is required to meet the
performance target.</p>
      <p>This means that on each forecasting step we can use only a minimally sufficient number of the
simplest ensemble subsystems in the cascade system to achieve the desired result and skip
calculations of more complex ensemble subsystems, hence conserving the computational resources
which can be beneficial e.g. in embedded systems running on battery power. If the accuracy drops
below the desired level, additional ensemble subsystems and the corresponding submetamodels can
be switched on without retraining the preceding part of the adaptive cascade bagging system.</p>
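<p>The switching logic described above might be sketched as a simple hysteresis rule. The function below is purely illustrative; its name and the margin parameter (added to avoid oscillating between member counts) are hypothetical, not taken from the paper.</p>

```python
def adjust_active_members(active, err, target, p_max, margin=0.9):
    """Adapt the number of active ensemble subsystems after each step.

    If the running error exceeds the target, switch on one more subsystem
    and its submetamodel (no retraining of the preceding cascade needed);
    if the error is comfortably below the target, drop the most complex
    active subsystem to save computation."""
    if err > target and p_max > active:
        return active + 1           # switch on one more subsystem
    if margin * target > err and active > 1:
        return active - 1           # drop the most complex active one
    return active
```

<p>Called once per forecasting step with the current sliding-window error, this keeps only a minimally sufficient number of the simplest ensemble subsystems active, conserving computational resources as described above.</p>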
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>To address the challenges of processing complex and dynamic signals, we introduced a novel
adaptive cascade bagging system. This system leverages the power of ensemble learning to achieve
optimized results while maintaining the flexibility of online tuning. The system’s design is rooted in
the cascade approach, where the outputs of multiple computational intelligence systems (e.g., neural
networks, support vector machines, fuzzy logic systems) are processed sequentially through a series
of simple submetamodels. The core advantage lies in its ability to dynamically adjust both the
weighting of individual ensemble subsystems and the number of ensemble subsystems themselves,
all while processing data in real-time. This is particularly crucial when dealing with disturbed
nonstationary signals – signals that are both noisy and whose characteristics change over time, making
traditional, offline approaches less effective. This online tuning capability is essential for handling
non-stationary signals where the optimal combination of ensemble subsystems may change as the
signal characteristics evolve. It ensures that the system adapts to changing signal characteristics and
maintains optimal performance over time, without requiring manual intervention or retraining.</p>
      <p>From a computational standpoint, the proposed system is remarkably simple. It is specifically
designed for online processing scenarios where data arrives at a sufficiently high rate. The cascade
structure, combined with an efficient optimization algorithm, minimizes the computational overhead
required for processing each data point. This allows the system to operate in real-time, making it
suitable for applications such as anomaly detection in network traffic, predictive maintenance of
industrial equipment, or real-time financial trading.</p>
      <p>A detailed analysis of the simulation results demonstrates the effectiveness of the proposed
adaptive cascade bagging system, revealing a progressive reduction in output errors with each
subsequent submetamodel, consistently outperforming the ensemble subsystems. Notably, the 4th
submetamodel’s accuracy already surpasses that of the best individual ensemble subsystem, and the
final (6th) submetamodel achieves a 1.23-fold reduction in MAPE compared to the best individual
ensemble subsystem. This cascade architecture allows for significant efficiency gains; specifically,
achieving an acceptable MAPE level of 4.5% requires only the first four simplest ensemble
subsystems, a substantial reduction in computational resources and complexity compared to utilizing
the entire six-subsystem ensemble.</p>
      <p>Our further research will focus on generalizing the proposed architecture and the learning
algorithm to a multivariate case.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
<mixed-citation>
          [1]
          <string-name><given-names>H.</given-names> <surname>Wu</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Levinson</surname></string-name>,
          <article-title>The ensemble approach to forecasting: A review and synthesis</article-title>.
          <source>Transportation Research Part C: Emerging Technologies</source>,
          <volume>132</volume> (<year>2021</year>) 103357.
          https://doi.org/10.1016/j.trc.2021.103357
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Rane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.P.</given-names>
            <surname>Choudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rane</surname>
          </string-name>
          ,
          <article-title>Ensemble deep learning and machine learning: applications, opportunities, challenges, and future directions</article-title>
          .
          <source>Studies in Medical and Health Sciences</source>
          ,
          <volume>1</volume>
          (
          <issue>2</issue>
          ) (
          <year>2024</year>
          )
          <fpage>18</fpage>
          -
          <lpage>41</lpage>
          . http://dx.doi.org/10.2139/ssrn.4849885
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T. N.</given-names>
            <surname>Rincy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>Ensemble learning techniques and its efficiency in machine learning: A survey, in: Proc. 2nd international conference on data, engineering and applications (IDEA)</article-title>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
. https://doi.org/10.1109/IDEA49133.2020.9170675
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Belayneh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Adamowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Khalil</surname>
          </string-name>
          ,
<string-name><given-names>J.</given-names> <surname>Quilty</surname></string-name>,
          <article-title>Coupling machine learning methods with wavelet transforms and the bootstrap and boosting ensemble approaches for drought prediction</article-title>
          .
          <source>Atmospheric research</source>
          ,
          <volume>172</volume>
          (
          <year>2016</year>
          )
          <fpage>37</fpage>
          -
          <lpage>47</lpage>
. https://doi.org/10.1016/j.atmosres.2015.12.017
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Kunapuli</surname>
          </string-name>
          ,
<source>Ensemble methods for machine learning</source>, Simon and Schuster
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>