<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Adaptive Online Bagging using the Cascade Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergiy Popov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iryna Pliss</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleh Zolotukhin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maryna Kudryavtseva</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kharkiv National University of Radio Electronics</institution>
          ,
          <addr-line>14 Nauky av., Kharkiv, 61166</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>2774</volume>
      <fpage>15</fpage>
      <lpage>17</lpage>
      <abstract>
<p>This paper introduces a novel adaptive cascade bagging system designed for real-time processing of complex, dynamic signals. Leveraging ensemble learning and a cascade architecture, the system dynamically adjusts model weighting and member count to optimize performance in non-stationary environments. Simulation results demonstrate a progressive reduction in forecasting errors at each cascade stage, with the 4th submetamodel surpassing the best individual ensemble member and the 6th achieving a 1.23-fold error reduction. This demonstrates the effectiveness and computational efficiency of the proposed approach.</p>
      </abstract>
      <kwd-group>
<kwd>cascade architecture</kwd>
        <kwd>adaptive bagging</kwd>
        <kwd>online learning</kwd>
<kwd>time series forecasting</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The field of Data Mining has witnessed a dramatic shift in recent years, with a growing reliance on
sophisticated computational intelligence techniques to tackle complex problems. Traditional
statistical methods, while still valuable, often struggle to effectively handle the sheer volume and
complexity of modern datasets. Consequently, artificial neural networks (ANNs), in both their deep
and more traditional shallow forms, alongside neuro-fuzzy systems, neo-fuzzy systems,
wavelet neuro-fuzzy networks, and other hybrid computational intelligence systems, have become
increasingly prevalent tools for a wide range of Data Mining tasks. These systems are particularly
effective in classification problems, pattern recognition, extrapolation, regression analysis,
diagnostics, modeling, and more.</p>
      <p>The core appeal of these computational intelligence approaches lies in their universal
approximation properties and the ability to adjust their internal parameters, and in some cases even
their architecture, through a process of learning from training data. This adaptability allows them to
tailor themselves to the specific characteristics of the problem at hand, leading to potentially superior
performance. The training process involves feeding the system labeled data examples, allowing it to
refine its internal workings to map inputs to desired outputs.</p>
      <p>However, the selection of the right computational intelligence system for a particular Data
Mining task is far from straightforward. While multiple systems may be capable of solving the same
problem, determining which one will deliver the best results is often impossible a priori. Each system
possesses its own strengths and weaknesses, making the choice a complex trade-off. For instance,
deep neural networks (DNNs), the current favorites of many AI applications, are known for their
potential to achieve extremely high accuracy. However, this performance comes at a significant cost.
DNNs typically require massive amounts of training data – often tens of thousands or even millions
of labeled examples – and can demand substantial computational resources and time for training. The
training process can be iterative, requiring multiple passes through the data and careful tuning of
hyperparameters. In contrast, Radial Basis Function Neural Networks (RBFNs) offer a considerably
faster learning process. This makes them attractive for applications where rapid deployment or
real-time performance is crucial. However, RBFNs are susceptible to the “curse of dimensionality”, which
arises when dealing with high-dimensional datasets. As the dimensionality increases, the
performance of RBFNs degrades significantly, requiring exponentially more neurons to maintain
accuracy.</p>
      <p>Neo-fuzzy systems, another option, are known for their high learning speed and ability to
incorporate human expertise through fuzzy logic principles. However, they don’t always guarantee
the necessary approximation properties to accurately model complex relationships within the data.
They might be fast to train, but the resulting model might not capture the underlying patterns
effectively.</p>
      <p>The challenge, therefore, isn't simply about applying these powerful tools; it's about
understanding their nuances and selecting the most appropriate system – or even a combination of
systems – for the specific problem, dataset, and desired performance characteristics. This often
involves experimentation, careful evaluation of results, and a deep understanding of the strengths
and limitations of each approach. The optimal solution frequently emerges through iterative
refinement and a willingness to explore different architectural choices and training methodologies.</p>
      <p>
        When faced with complex challenges where individual systems exhibit varying strengths and
weaknesses, the use of ensemble approaches can prove remarkably effective [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5">1-8</xref>
        ]. The core principle
behind ensemble methods is to harness the collective intelligence of multiple models to achieve a
superior outcome compared to any single model acting alone.
      </p>
      <p>Historically, the vast majority of ensemble methods have relied on a batch (offline) approach. This
involves providing the entire training dataset upfront and repeatedly analyzing it to train and refine
the individual models and the ensemble. While effective, this approach can be computationally
expensive and less adaptable to dynamic data environments. However, there are several online
ensemble methods [9-11] specifically designed to address Data Stream Mining problems, where data
arrives sequentially and potentially at a very high rate.</p>
      <p>Within the broader landscape of ensemble approaches, bagging procedures [12-14] have emerged
as particularly powerful techniques. Bagging, short for “bootstrap aggregation,” involves training
multiple individual models on different subsets of the training data. The results generated by each of
these models are then fed into a metamodel, also sometimes referred to as a combiner or aggregator.
This metamodel acts as a sophisticated decision-making engine, intelligently combining the output
signals from all ensemble members to synthesize the final, optimal solution.</p>
      <p>A common challenge in implementing bagging is determining the optimal number of ensemble
members. A small number of members might not provide sufficient diversity to capture the full
complexity of the problem, leading to limited accuracy gains. Conversely, an excessively large
number can significantly complicate the training process of the metamodel, increasing
computational cost and potentially leading to overfitting. A promising avenue of research addresses
this challenge through evolutionary approaches [13-15]. These methods dynamically adjust the
number of ensemble members during the metamodel learning process. This means the number of
inputs to the metamodel is constantly changing, introducing a layer of complexity to both the
metamodel itself and its learning process.</p>
      <p>To simplify and accelerate the bagging process, we apply a cascade approach. Instead of relying
on a single, complex metamodel, a cascade approach employs a series of relatively simple
metamodels arranged in sequence. This significantly simplifies the synthesis of the metamodel and
the subsequent tuning process.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Cascade bagging system architecture</title>
      <p>subsystems receive the same vector signal x ( k ) = ( x1 ( k ) , … , xi ( k ) , … , xn ( k ))T (here k is the current
discrete time). Scalar output signals ^y1 ( k ) , … , ^yr ( k ) , … , ^y p ( k ) are calculated at their outputs. If the
output signal ^y1 ( k ) satisfies the a priori specified accuracy requirements, the bagging procedure is
not required and the output of the system as a whole is the output signal ^y1* ( k ) = ^y1 ( k ). Otherwise,
the signal ^y1 ( k ) is fed to the first input of the bagging submetamodel SMM 2, the second input of
which is fed with the output signal of the second subsystems S2 – ^y2 ( k ). The SMM 2 output signal is
formed as follows</p>
<p>ŷ*2(k) = c2 ŷ2(k) + (1 - c2) ŷ*1(k),
where c2 is the single tuned parameter of the submetamodel SMM 2. The signal ŷ*2(k) should be better in terms of accuracy than ŷ*1(k) and ŷ2(k).</p>
      <p>Then, signal ^y*2 ( k ) is fed to the submetamodel SMM 3, whose other input receives ^y3 ( k ).
SMM 3 produces the following result
here ^y*3 ( k ) should be better in terms of accuracy than ^y*2 ( k ) and ^y3 ( k ).</p>
      <p>And finally, the last submetamodel SMM p produces the following output signal
^y*3 ( k ) = c3 ^y3 ( k ) + (1 - c3) ^y*2 ( k ) ,
^y*p ( k ) = c p ^y p ( k ) + (1 - c p) ^y p -1 ( k ) ,
*
which should be better in terms of accuracy than output signals of all previous submetamodels.</p>
      <p>In this setup, if the signal of any previous submetamodel SMM r satisfies all the accuracy
requirements, then the process of building up submetamodels can be stopped and only r members of
the ensemble will be activated in the system.</p>
      <p>The advantage of this approach is the simplicity of its implementation, since in each
submetamodel only one parameter cr is being tuned, which can be calculated in online real-time
mode, while the ensemble itself contains only the required number of members – ensemble
subsystems. Also note that in non-stationary situations the number of activated submetamodels can
both decrease and increase during the operation depending on the required solution accuracy.</p>
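<p>The cascade recursion above can be sketched in code. The following Python fragment is an illustrative sketch, not the authors' implementation; the function name, the early-stopping check, and the assumption that a reference value for the accuracy test is available (e.g., from the previous step) are ours.</p>

```python
def cascade_output(y_hat, c, y_true=None, tol=None):
    """Cascade bagging forward pass.

    y_hat: list of ensemble member forecasts y_1(k), ..., y_p(k).
    c:     list of submetamodel parameters; c[0] is unused, since the
           first cascade output is simply y*_1(k) = y_1(k).
    If y_true and tol are given (a simplifying assumption for the
    accuracy check), stop building up cascades as soon as the current
    submetamodel output is accurate enough.
    Returns the final output and the number of activated members."""
    y_star = y_hat[0]               # y*_1(k) = y_1(k): no combining yet
    active = 1
    for r in range(1, len(y_hat)):
        if tol is not None and y_true is not None and tol >= abs(y_true - y_star):
            break                   # first `active` members already suffice
        # y*_r(k) = c_r * y_r(k) + (1 - c_r) * y*_{r-1}(k)
        y_star = c[r] * y_hat[r] + (1.0 - c[r]) * y_star
        active = r + 1
    return y_star, active
```

<p>With a tolerance supplied, the build-up stops at the first sufficiently accurate submetamodel, so only the required number of ensemble members is activated, mirroring the behavior described above.</p>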
    </sec>
    <sec id="sec-3">
      <title>3. Submetamodels online learning</title>
      <p>The process of training submetamodels consists in adjusting the parameters c2 , c3 , … , c p in each of
the system cascades.</p>
<p>Let us write the output signal of the p-th cascade in the form
ŷ*p(k) = cp ŷp(k) + (1 - cp) ŷ*p-1(k) = cp (ŷp(k) - ŷ*p-1(k)) + ŷ*p-1(k),
and introduce the error signal
ep(k) = y(k) - ŷ*p(k) = y(k) - cp (ŷp(k) - ŷ*p-1(k)) - ŷ*p-1(k) = ep-1(k) - cp (ŷp(k) - ŷ*p-1(k)),
where y(k) is a reference signal.</p>
      <p>The squared error has the form
e²p(k) = e²p-1(k) - 2 ep-1(k) cp (ŷp(k) - ŷ*p-1(k)) + c²p (ŷp(k) - ŷ*p-1(k))²,
and after summing up over the training set
Σk e²p(k) = Σk e²p-1(k) - 2 cp Σk ep-1(k) (ŷp(k) - ŷ*p-1(k)) + c²p Σk (ŷp(k) - ŷ*p-1(k))².</p>
      <p>Then we equate the derivative of this sum with respect to cp to zero,
∂(Σk e²p(k)) / ∂cp = -2 Σk ep-1(k) (ŷp(k) - ŷ*p-1(k)) + 2 cp Σk (ŷp(k) - ŷ*p-1(k))² = 0,
and get a rather simple relation
cp = [Σk ep-1(k) (ŷp(k) - ŷ*p-1(k))] / [Σk (ŷp(k) - ŷ*p-1(k))²],
which in a single-step (running) form can be written as
cp(k) = [Στ=1..k ep-1(τ) (ŷp(τ) - ŷ*p-1(τ))] / [Στ=1..k (ŷp(τ) - ŷ*p-1(τ))²].</p>
<p>Also note that when processing nonstationary signals disturbed by noise, it is appropriate to organize the tuning of the parameters c2, c3, …, cp over a sliding window. This provides a trade-off between the tracking and filtering properties of the bagging procedure.</p>
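<p>The single-step relation above lends itself to a sliding-window estimator. The Python sketch below is a hypothetical realization: the closure, the default window length of 50 samples, and the eps regularizer are our assumptions, not part of the original algorithm.</p>

```python
from collections import deque

def make_cp_estimator(window_size=50):
    """Sliding-window estimator of the single tuned parameter c_p:

        c_p(k) = sum e_{p-1}(t) * d(t) / sum d(t)^2,
        d(t)   = y_p(t) - y*_{p-1}(t),

    with both sums taken over the last `window_size` samples.  The
    window length trades tracking speed against noise filtering."""
    window = deque(maxlen=window_size)

    def update(e_prev, d, eps=1e-12):
        # push the newest pair (e_{p-1}(k), d(k)); the deque drops the oldest
        window.append((e_prev, d))
        num = sum(e * di for e, di in window)
        den = sum(di * di for _, di in window)
        return num / (den + eps)    # eps guards against a degenerate window

    return update
```

<p>Each cascade keeps its own estimator, so every submetamodel tunes its single parameter in online real-time mode, one multiply-accumulate pair per new sample.</p>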
    </sec>
    <sec id="sec-4">
      <title>4. Simulation results</title>
      <p>To validate the effectiveness and practicality of the proposed adaptive cascade bagging system, we
applied it to a challenging real-world problem: short-term electric load forecasting (STLF).
Specifically, our test case focuses on 1-step ahead forecasting of the daily electric load for a regional
power system in Ukraine. This application presents a particularly demanding scenario due to the
inherent complexities and non-stationarities commonly found in electric load data.</p>
      <p>The dataset utilized for this simulation comprises an original time series containing N = 337
samples of daily electric load data, used as a reference signal and shown in Figure 2. It exhibits a
complex pattern characterized by several discernible trends corresponding to different seasons. The
data also reveals periodic components, predominantly weekly fluctuations reflecting variations in
energy consumption across the week. Furthermore, the series is punctuated by sudden changes and
outliers, indicative of unexpected events impacting energy demand. A strong random component is
also evident, which is a common characteristic of electric load data in large systems. This randomness
arises from the multitude of external factors influencing energy consumption, many of which possess
inherently random or chaotic behavior. Weather conditions, for instance, are a prime example of a
factor significantly impacting energy demand and exhibiting complex fluctuations [16].</p>
      <p>These trends, periodicities, sudden changes, outliers, and the significant random component bring
non-stationarity and noisiness to the time series. Consequently, its forecasting presents a
considerable challenge. In such scenarios, it is frequently observed that different forecasting models
or methods demonstrate superior performance on particular segments of the series, while exhibiting
inferior performance on others. It’s rare for a single model or method to consistently outperform all
others across the entire series. This is precisely the situation where bagging methods, and
particularly our adaptive cascade bagging system, can prove useful. By combining the forecasts from
multiple models, the bagging approach aims to extract the best predictions from each individual
model, ultimately improving the overall forecasting accuracy and robustness.</p>
      <p>We employed q = 6 distinct and independent ensemble subsystems – computational intelligence
models of various structures and complexity [17] – to produce six corresponding forecasts
^y1 ( k ) , … , ^y6 ( k ) to be further fed into the corresponding submetamodels. For the purpose of this
study, we sorted the ensemble subsystems in terms of increasing complexity. Such a diversity is
aimed at capturing different properties of different parts of the series under consideration. Figure 3
shows the last 30 days of the original time series and the ensemble subsystems’ forecasts
^y1 ( k ) , … , ^y6 ( k ). We can see that long-term trends are more or less well captured by all subsystems,
but short-term changes pose a problem to all of them so that no single subsystem is significantly
better than the others for all data points.</p>
<p>The corresponding forecasts demonstrate a decreasing trend of the forecasting errors presented in
Table 1. We used a set of error measures widely adopted in time series forecasting:
1. Mean Absolute Error (MAE);
2. Mean Absolute Scaled Error, scaled by a 1-step ahead naive forecast (MASE1);
3. Mean Absolute Percentage Error (MAPE).</p>
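<p>For reference, these error measures can be computed as follows. This is a minimal Python sketch under the usual definitions; the MASE1 scaling by the in-sample MAE of the 1-step naive forecast follows the description above.</p>

```python
def mae(y, y_hat):
    """Mean Absolute Error."""
    return sum(abs(a - f) for a, f in zip(y, y_hat)) / len(y)

def mase1(y, y_hat):
    """Mean Absolute Scaled Error, scaled by the in-sample MAE of the
    1-step ahead naive forecast y_hat(k) = y(k-1)."""
    naive = sum(abs(y[k] - y[k - 1]) for k in range(1, len(y))) / (len(y) - 1)
    return mae(y, y_hat) / naive

def mape(y, y_hat):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(y, y_hat)) / len(y)
```
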
<p>During the simulation, we applied the proposed adaptive cascade bagging system to generate six
forecasts ŷ*1(k), …, ŷ*6(k), where ŷ*1(k) actually duplicates ŷ1(k) and ŷ*2(k), …, ŷ*6(k) are produced
by the corresponding submetamodels. We treated the forecasting process as an online operation,
mirroring the real-time nature of electric load management. This means the entire dataset was
processed sequentially, sample by sample, without the traditional division into training, validation,
and test sets. This online processing approach reflects a core design principle of the adaptive cascade
bagging system – its ability to learn and adapt in real-time as new data arrives.</p>
      <p>[Figure: cascade forecasts ŷ*1(k) (blue line), ŷ*2(k) (brown line), ŷ*3(k) (cyan line), ŷ*4(k) (red line), ŷ*5(k) (green line), ŷ*6(k) (dotted black line).]</p>
      <p>These observations demonstrate the effectiveness of the cascading approach and highlight the
significant efficiency gains it provides.</p>
      <p>Perhaps most importantly, these properties have direct implications for the system’s efficiency.
Consider a scenario where MAPE level of 4.5% is deemed acceptable for the task at hand. Our analysis
reveals that none of the six individual ensemble subsystems alone can achieve this level of accuracy.
However, employing only the first four (simplest) ensemble subsystems within the proposed adaptive
cascade bagging system is sufficient to consistently achieve the desired accuracy of 4.5% or better.
This demonstrates a significant reduction in computational resources and complexity, as only a
fraction of the overall system (4 simplest out of 6 total ensemble subsystems) is required to meet the
performance target.</p>
      <p>This means that on each forecasting step we can use only a minimally sufficient number of the
simplest ensemble subsystems in the cascade system to achieve the desired result and skip
calculations of more complex ensemble subsystems, hence conserving the computational resources
which can be beneficial e.g. in embedded systems running on battery power. If the accuracy drops
below the desired level, additional ensemble subsystems and the corresponding submetamodels can
be switched on without retraining the preceding part of the adaptive cascade bagging system.</p>
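<p>The switching logic described above might be sketched as a simple hysteresis rule. The function below is purely illustrative; its name and the margin parameter (added to avoid oscillating between member counts) are hypothetical, not taken from the paper.</p>

```python
def adjust_active_members(active, err, target, p_max, margin=0.9):
    """Adapt the number of active ensemble subsystems after each step.

    If the running error exceeds the target, switch on one more subsystem
    and its submetamodel (no retraining of the preceding cascade needed);
    if the error is comfortably below the target, drop the most complex
    active subsystem to save computation."""
    if err > target and p_max > active:
        return active + 1           # switch on one more subsystem
    if margin * target > err and active > 1:
        return active - 1           # drop the most complex active one
    return active
```

<p>Called once per forecasting step with the current sliding-window error, this keeps only a minimally sufficient number of the simplest ensemble subsystems active, conserving computational resources as described above.</p>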
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>To address the challenges of processing complex and dynamic signals, we introduced a novel
adaptive cascade bagging system. This system leverages the power of ensemble learning to achieve
optimized results while maintaining the flexibility of online tuning. The system’s design is rooted in
the cascade approach, where the outputs of multiple computational intelligence systems (e.g., neural
networks, support vector machines, fuzzy logic systems) are processed sequentially through a series
of simple submetamodels. The core advantage lies in its ability to dynamically adjust both the
weighting of individual ensemble subsystems and the number of ensemble subsystems themselves,
all while processing data in real-time. This is particularly crucial when dealing with disturbed
nonstationary signals – signals that are both noisy and whose characteristics change over time, making
traditional, offline approaches less effective. This online tuning capability is essential for handling
non-stationary signals where the optimal combination of ensemble subsystems may change as the
signal characteristics evolve. It ensures that the system adapts to changing signal characteristics and
maintains optimal performance over time, without requiring manual intervention or retraining.</p>
      <p>From a computational standpoint, the proposed system is remarkably simple. It is specifically
designed for online processing scenarios where data arrives at a sufficiently high rate. The cascade
structure, combined with an efficient optimization algorithm, minimizes the computational overhead
required for processing each data point. This allows the system to operate in real-time, making it
suitable for applications such as anomaly detection in network traffic, predictive maintenance of
industrial equipment, or real-time financial trading.</p>
      <p>A detailed analysis of the simulation results demonstrates the effectiveness of the proposed
adaptive cascade bagging system, revealing a progressive reduction in output errors with each
subsequent submetamodel, consistently outperforming the ensemble subsystems. Notably, the 4th
submetamodel’s accuracy already surpasses that of the best individual ensemble subsystem, and the
final (6th) submetamodel achieves a 1.23-fold reduction in MAPE compared to the best individual
ensemble subsystem. This cascade architecture allows for significant efficiency gains; specifically,
achieving an acceptable MAPE level of 4.5% requires only the first four simplest ensemble
subsystems, a substantial reduction in computational resources and complexity compared to utilizing
the entire six-subsystem ensemble.</p>
      <p>Our further research will focus on generalizing the proposed architecture and the learning
algorithm to a multivariate case.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
<mixed-citation>
          [1]
          <string-name><given-names>H.</given-names> <surname>Wu</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Levinson</surname></string-name>,
          <article-title>The ensemble approach to forecasting: A review and synthesis</article-title>.
          <source>Transportation Research Part C: Emerging Technologies</source>,
          <volume>132</volume> (<year>2021</year>) 103357.
          https://doi.org/10.1016/j.trc.2021.103357
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Rane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.P.</given-names>
            <surname>Choudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rane</surname>
          </string-name>
          ,
          <article-title>Ensemble deep learning and machine learning: applications, opportunities, challenges, and future directions</article-title>
          .
          <source>Studies in Medical and Health Sciences</source>
          ,
          <volume>1</volume>
          (
          <issue>2</issue>
          ) (
          <year>2024</year>
          )
          <fpage>18</fpage>
          -
          <lpage>41</lpage>
          . http://dx.doi.org/10.2139/ssrn.4849885
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T. N.</given-names>
            <surname>Rincy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>Ensemble learning techniques and its efficiency in machine learning: A survey, in: Proc. 2nd international conference on data, engineering and applications (IDEA)</article-title>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
. https://doi.org/10.1109/IDEA49133.2020.9170675
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Belayneh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Adamowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Khalil</surname>
          </string-name>
          ,
<string-name><given-names>J.</given-names> <surname>Quilty</surname></string-name>,
          <article-title>Coupling machine learning methods with wavelet transforms and the bootstrap and boosting ensemble approaches for drought prediction</article-title>
          .
          <source>Atmospheric research</source>
          ,
          <volume>172</volume>
          (
          <year>2016</year>
          )
          <fpage>37</fpage>
          -
          <lpage>47</lpage>
. https://doi.org/10.1016/j.atmosres.2015.12.017
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Kunapuli</surname>
          </string-name>
          ,
<source>Ensemble methods for machine learning</source>, Simon and Schuster
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>