<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>International Scientific Technical Journal "Problems of Control and Informatics"</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Fuzzy Online Bagging Using Adaptive F-transform</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergiy Popov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iryna Pliss</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olexii Holovin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleh Zolotukhin</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Larysa Chala</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Central Scientific Research Institute of Armament and Military Equipment of the Armed Forces of Ukraine</institution>
          ,
          <addr-line>Kyiv, 03049</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kharkiv National University of Radio Electronics</institution>
          ,
          <addr-line>Nauky av., 14, Kharkiv, 61166</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>3214</volume>
      <fpage>25</fpage>
      <lpage>27</lpage>
      <abstract>
        <p>The ensemble multi-model bagging approach is considered. We propose a nonlinear adaptive bagging procedure which applies the F-transform, in its adaptive form, to the results of traditional weighting. This leads to a further decrease of ensemble errors at low extra computational and time cost. The metamodel architecture and the corresponding optimal learning algorithms are presented in detail. A simulation based on the short-term electric load forecasting problem confirms the theoretical results and shows a significant decrease of the forecasting error in comparison to a linear approach.</p>
      </abstract>
      <kwd-group>
        <kwd>Nonlinear bagging</kwd>
        <kwd>adaptive ensemble</kwd>
        <kwd>optimal learning</kwd>
        <kwd>F-transform</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Architecture of the adaptive nonlinear bagging metamodel</title>
      <p>The architecture of the adaptive nonlinear bagging metamodel is shown in Figure 1.</p>
      <p>[Figure 1: architecture of the adaptive nonlinear bagging metamodel, with ensemble outputs $\hat{y}_1(k), \dots, \hat{y}_n(k)$, synaptic weights $w_1^*, \dots, w_n^*$, an adder, and a nonlinear synapse.]</p>
      <p>It is readily seen that the metamodel’s architecture is similar to F. Rosenblatt's elementary
perceptron, but instead of a traditional activation function it contains a Nonlinear Synapse (NS)
which is the main building block of a neo-fuzzy neuron [11-13] and implements the F-transform [14]
in its adaptive version [15], i.e. it is essentially a universal approximator [16].</p>
      <p>Output signals from $n$ ensemble members which are solving the same problem,
$\hat{y}_1(k), \dots, \hat{y}_j(k), \dots, \hat{y}_n(k)$ (or in vector form $\hat{y}(k) =
(\hat{y}_1(k), \dots, \hat{y}_j(k), \dots, \hat{y}_n(k))^T$, where $k = 1, 2, \dots, N$ is the
current discrete time index), are fed to the metamodel's inputs, then passed through the
adjustable synaptic weights $w_1^*, \dots, w_j^*, \dots, w_n^*$, and finally combined in the adder,
forming the metamodel's intermediate output signal $\hat{y}^*(k)$ in the form
$$\hat{y}^*(k) = \sum_{j=1}^{n} w_j^* \hat{y}_j(k) \tag{1}$$
or in vector form
$$\hat{y}^*(k) = \hat{y}^T(k)\, w^*, \tag{2}$$
where $w^* = (w_1^*, \dots, w_j^*, \dots, w_n^*)^T$.</p>
      <p>The unbiasedness constraint is additionally imposed on the synaptic weights $w^*$:
$$\sum_{j=1}^{n} w_j^* = E^T w^* = 1 \tag{3}$$
(here $E$ is an $(n \times 1)$ vector of ones). If we append inequality constraints on the
non-negativity of the synaptic weights, $0 \le w_j^* \le 1\ \forall j$, these synaptic weights
can be given the meaning of the degrees of membership of each of the signals $\hat{y}_j(k)$ to
the optimal result.</p>
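      <p>For concreteness, the linear stage (1)-(3) can be sketched in a few lines of NumPy; this is
a minimal sketch rather than the authors' implementation, and the weight values below are
hypothetical, chosen only to satisfy the unbiasedness and non-negativity constraints.</p>
      <preformat>
import numpy as np

# Outputs of n = 6 ensemble members at the current step k (hypothetical values)
y_hat = np.array([512.3, 498.7, 505.1, 520.4, 501.9, 509.6])

# Synaptic weights w*: non-negative and summing to one, as required by (3)
w_star = np.array([0.25, 0.10, 0.20, 0.05, 0.25, 0.15])
assert w_star.min() &gt;= 0.0 and np.isclose(w_star.sum(), 1.0)

# Intermediate output (2): y*(k) = y^T(k) w*
y_star = y_hat @ w_star
      </preformat>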
      <p>Technically, the signal $\hat{y}^*(k)$ is already the solution of the optimization problem;
however, it can be improved by processing it in an adaptive Nonlinear Synapse (NS). The NS is
formed by $h$ nonlinear membership functions $\mu_l(\hat{y}^*(k))$, $l = 1, 2, \dots, h$.
Traditional triangular constructions that satisfy the Ruspini partition of unity conditions are
usually used, although it is possible to use more complex variants, e.g. B-splines, Gaussians,
Epanechnikov kernels, etc. Each membership function output $\mu_l(\hat{y}^*(k))$ is multiplied by
the corresponding adjustable weight $\tilde{w}_l$ and then summed in the second adder of the
metamodel, forming the output signal
$$\hat{y}^{**}(k) = \sum_{l=1}^{h} \tilde{w}_l\, \mu_l(\hat{y}^*(k)) = \mu^T(\hat{y}^*(k))\, \tilde{w}, \tag{4}$$
where $\mu(\hat{y}^*(k)) = (\mu_1(\hat{y}^*(k)), \dots, \mu_l(\hat{y}^*(k)), \dots, \mu_h(\hat{y}^*(k)))^T$
and $\tilde{w} = (\tilde{w}_1, \dots, \tilde{w}_l, \dots, \tilde{w}_h)^T$.</p>
      <p>Combining (1)-(4), finally we can write
$$\hat{y}^{**}(k) = \sum_{l=1}^{h} \tilde{w}_l\, \mu_l\!\left(\sum_{j=1}^{n} w_j^* \hat{y}_j(k)\right)$$
or, in vector form,
$$\hat{y}^{**}(k) = \mu^T\big(\hat{y}^T(k)\, w^*\big)\, \tilde{w},$$
where the adjustable synaptic weight vectors $w^*$ and $\tilde{w}$ are subject to online learning.</p>
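      <p>The following sketch implements the forward pass (4) with a triangular Ruspini partition; the
function and variable names are ours, and the centers form a hypothetical uniform grid.</p>
      <preformat>
import numpy as np

def triangular_memberships(y_star, centers):
    """Triangular Ruspini partition: at most two adjacent functions are
    nonzero for any input, and their values sum to one."""
    h = len(centers)
    mu = np.zeros(h)
    # Clip to the covered interval so the partition of unity holds at the edges
    y = float(np.clip(y_star, centers[0], centers[-1]))
    r = int(np.searchsorted(centers, y))  # index of the nearest center from the right
    if r == 0:
        mu[0] = 1.0
    else:
        left, right = centers[r - 1], centers[r]
        mu[r] = (y - left) / (right - left)   # rising slope of the right-hand function
        mu[r - 1] = 1.0 - mu[r]               # falling slope of the left-hand function
    return mu

# Hypothetical example: h = 10 centers uniformly spread over the signal range
centers = np.linspace(400.0, 600.0, 10)
w_tilde = np.zeros(10)                        # NS weights, tuned online by (9)
y_out = triangular_memberships(505.4, centers) @ w_tilde   # output signal (4)
      </preformat>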
    </sec>
    <sec id="sec-3">
      <title>3. Adaptive nonlinear bagging metamodel learning</title>
      <p>Synaptic weights vector $w^*$ can be adjusted by gradient optimization of the learning criterion
$$E(k) = \frac{1}{2}\, e^2(k) = \frac{1}{2} \big(y(k) - \hat{y}^T(k)\, w^*\big)^2$$
(here $y(k)$ is the external reference signal) subject to the constraint $E^T w^* = 1$. This is
achieved by searching for the saddle point of the Lagrange function
$$L(w^*, \lambda) = \frac{1}{2} \big(y(k) - \hat{y}^T(k)\, w^*\big)^2 + \lambda \big(E^T w^* - 1\big),$$
where $\lambda$ is an undetermined Lagrange multiplier.</p>
      <p>The Arrow-Hurwicz procedure can be used to find the saddle point in the following form:
$$\begin{cases} w^*(k) = w^*(k-1) - \eta_w(k)\, \nabla_{w^*} L(w^*, \lambda), \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k)\, \partial L(w^*, \lambda)/\partial \lambda, \end{cases}$$
or
$$\begin{cases} w^*(k) = w^*(k-1) + \eta_w(k) \big(e(k)\, \hat{y}(k) - \lambda(k-1)\, E\big), \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k) \big(E^T w^*(k) - 1\big), \end{cases} \tag{5}$$
where $\eta_w(k), \eta_\lambda(k)$ are learning step parameters and $e(k) = y(k) - \hat{y}^T(k)\, w^*(k-1)$
is the learning error.</p>
      <p>It is mathematically proven that the signal $\hat{y}^*(k) = \hat{y}^T(k)\, w^*(k)$ is, in terms
of accuracy, not inferior to any of the $\hat{y}_j(k)$, $j = 1, 2, \dots, n$, at the metamodel input.</p>
      <p>The learning process can be optimized in terms of speed by the appropriate selection of the
learning step parameter $\eta_w(k)$. If the following option is chosen,
$$\eta_w(k) = \frac{e(k)}{e(k)\, \|\hat{y}(k)\|^2 - \lambda(k-1)\, E^T \hat{y}(k)},$$
the learning algorithm (5) can be written in the form
$$\begin{cases} w^*(k) = w^*(k-1) + \dfrac{e(k) \big(e(k)\, \hat{y}(k) - \lambda(k-1)\, E\big)}{e(k)\, \|\hat{y}(k)\|^2 - \lambda(k-1)\, E^T \hat{y}(k)}, \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k) \big(E^T w^*(k) - 1\big). \end{cases} \tag{6}$$</p>
      <p>When $\lambda(k) = 0$, it completely coincides with the adaptive speed-optimal
Kaczmarz-Widrow-Hoff algorithm [17-20].</p>
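      <p>A minimal NumPy sketch of one step of the speed-optimal algorithm (6), written in the
reconstructed notation above (the function name and the guard against a vanishing denominator are
our additions):</p>
      <preformat>
import numpy as np

def update_weights_eq6(w_star, lam, y_hat, y, eta_lambda=0.01):
    """One step of the speed-optimal constrained learning algorithm (6)."""
    E = np.ones_like(w_star)
    e = y - y_hat @ w_star                        # learning error e(k)
    denom = e * (y_hat @ y_hat) - lam * (E @ y_hat)
    if not np.isclose(denom, 0.0):
        w_star = w_star + e * (e * y_hat - lam * E) / denom
    lam = lam + eta_lambda * (E @ w_star - 1.0)   # multiplier update
    return w_star, lam
      </preformat>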
      <p>As noted above, the synaptic weights at the inputs of the metamodel can be given the meaning
of the degrees of membership of each of the input signals $\hat{y}_j(k)$ to the optimal signal,
which should theoretically coincide with the reference signal $y(k)$. In this case, the learning
task consists in the minimization of the same criterion $E(k)$ under both the equality constraint
$E^T w = 1$ and the non-negativity constraints $w_j \ge 0$. Introducing the Lagrange function
$$L(w, \lambda, \rho) = \frac{1}{2} \big(y(k) - \hat{y}^T(k)\, w\big)^2 + \lambda \big(E^T w - 1\big) - \rho^T w$$
(here $\rho$ is a vector of non-negative indefinite Lagrange multipliers) and the Kuhn-Tucker system
$$\nabla_w L(w, \lambda, \rho) = 0, \quad \partial L(w, \lambda, \rho)/\partial \lambda = 0, \quad \rho_j \ge 0,\ j = 1, 2, \dots, n,$$
it is easy to write the Arrow-Hurwicz-Uzawa gradient procedure for finding the saddle point of the
Lagrangian in the form
$$\begin{cases} w(k) = w(k-1) - \eta_w(k)\, \nabla_w L(w, \lambda, \rho), \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k)\, \partial L(w, \lambda, \rho)/\partial \lambda, \\ \rho(k) = \big[\rho(k-1) - \eta_\rho(k)\, \nabla_\rho L(w, \lambda, \rho)\big]_+ \end{cases}$$
(here $[\cdot]_+$ is the projector onto the positive orthant), or
$$\begin{cases} w(k) = w(k-1) + \eta_w(k) \big(e(k)\, \hat{y}(k) - \lambda(k-1)\, E - \rho(k-1)\big), \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k) \big(E^T w(k) - 1\big), \\ \rho(k) = \big[\rho(k-1) - \eta_\rho(k)\, w(k)\big]_+. \end{cases} \tag{7}$$</p>
      <p>Similarly to (5) and (6), the learning algorithm (7) can also be optimized for speed. The
optimized procedure has the following final form:
$$\begin{cases} w(k) = w(k-1) + \dfrac{e(k) \big(e(k)\, \hat{y}(k) - \lambda(k-1)\, E - \rho(k-1)\big)}{e(k)\, \|\hat{y}(k)\|^2 - \lambda(k-1)\, E^T \hat{y}(k) + \rho^T(k-1)\, \hat{y}(k)}, \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k) \big(E^T w(k) - 1\big), \\ \rho(k) = \big[\rho(k-1) - \eta_\rho(k)\, w(k)\big]_+. \end{cases} \tag{8}$$</p>
      <p>Thus, algorithms (6), (8) are designed for the online adjustment of the metamodel parameters
$w^*$ or $w$ and ensure high accuracy of the obtained results.</p>
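      <p>For the inequality-constrained variant, here is a sketch of one step of the speed-optimized
procedure (8) under the same assumptions; note the projection onto the positive orthant.</p>
      <preformat>
import numpy as np

def update_weights_eq8(w, lam, rho, y_hat, y, eta_lambda=0.01, eta_rho=0.01):
    """One step of the speed-optimized Arrow-Hurwicz-Uzawa procedure (8)."""
    E = np.ones_like(w)
    e = y - y_hat @ w                             # learning error e(k)
    denom = e * (y_hat @ y_hat) - lam * (E @ y_hat) + rho @ y_hat
    if not np.isclose(denom, 0.0):
        w = w + e * (e * y_hat - lam * E - rho) / denom
    lam = lam + eta_lambda * (E @ w - 1.0)
    rho = np.maximum(rho - eta_rho * w, 0.0)      # projector [.]_+ onto the positive orthant
    return w, lam, rho
      </preformat>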
      <p>As already mentioned, triangular-shaped functions are usually used as membership functions in
the nonlinear synapse NS:
$$\mu_l(\hat{y}^*(k)) = \begin{cases} \dfrac{\hat{y}^*(k) - x_{l-1}}{x_l - x_{l-1}}, &amp; \text{if } \hat{y}^*(k) \in [x_{l-1}, x_l], \\ \dfrac{x_{l+1} - \hat{y}^*(k)}{x_{l+1} - x_l}, &amp; \text{if } \hat{y}^*(k) \in [x_l, x_{l+1}], \\ 0 &amp; \text{otherwise}, \end{cases}$$
where $x_{l-1}, x_l, x_{l+1}$ are the centers of adjacent membership functions, which are usually
either uniformly distributed over the abscissa axis or can be found using clustering procedures
[20, 21].</p>
      <p>The main advantage of such functions is that at each learning cycle only two adjacent functions
are activated, and accordingly only two synaptic weights, $\tilde{w}_{l-1}, \tilde{w}_l$ or
$\tilde{w}_l, \tilde{w}_{l+1}$, are adjusted, which simplifies and speeds up the nonlinear synapse
tuning process.</p>
      <p>A standard quadratic criterion can be used to tune the nonlinear synapse,
$$E_{\tilde{w}}(k) = \frac{1}{2} \big(y(k) - \mu^T(\hat{y}^*(k))\, \tilde{w}\big)^2,$$
which is minimized using the gradient procedure
$$\tilde{w}(k) = \tilde{w}(k-1) + \eta_{\tilde{w}}(k) \big(y(k) - \mu^T(\hat{y}^*(k))\, \tilde{w}(k-1)\big)\, \mu(\hat{y}^*(k)), \tag{9}$$
where the step parameter $\eta_{\tilde{w}}(k)$ is chosen either using the Kaczmarz-Widrow-Hoff
procedure [17, 19] or using other approaches [22, 23] which provide additional filtering properties
of the learning process.</p>
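      <p>One step of procedure (9) can be sketched as follows; here the step parameter is taken in the
Kaczmarz-Widrow-Hoff form $\eta_{\tilde{w}}(k) = \|\mu(\hat{y}^*(k))\|^{-2}$, which is one of the
options mentioned above.</p>
      <preformat>
import numpy as np

def update_ns_weights_eq9(w_tilde, mu, y):
    """One step of the NS tuning procedure (9) with a Kaczmarz-Widrow-Hoff step."""
    e = y - mu @ w_tilde       # output error of the nonlinear synapse
    norm_sq = mu @ mu          # for a Ruspini partition this lies in (0, 1], never zero
    return w_tilde + (e / norm_sq) * mu   # only the two active weights change
      </preformat>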
    </sec>
    <sec id="sec-4">
      <title>4. Simulation results</title>
      <p>As a test case, we apply the proposed bagging approach to the short-term electric load
forecasting (STLF) problem, specifically 1-step-ahead forecasting of the daily electric load of one
of the regional power systems of Ukraine. We have the original series with $N = 337$ samples and
$n = 6$ forecast series (337 samples each) generated by 6 different independent computational
intelligence models. We treat the time series as a data stream, i.e. forecasting and metamodel
operation are performed in online mode; therefore, the whole dataset is processed only once (sample
by sample, $k = 1, 2, \dots, N$) and there is no need to divide it into training, validation, and
test sets.</p>
      <p>The original series (Figure 2) has several trends corresponding to different seasons, periodic
(mostly weekly) patterns, sudden changes, and outliers. Obviously, there is a strong random
component, because electric load in large systems depends on many external factors, some of which
have a truly random or chaotic nature, e.g. weather conditions [24]. The time series is thus
nonstationary and noisy by its nature, hence its forecasting is quite challenging; usually,
different forecasting models/methods perform better on particular parts of the series and are
inferior on others, and one model/method is rarely better than all others over the whole series.
This is exactly the case where bagging methods come into play: they can improve overall forecasting
accuracy by attempting to take the best from all models/methods in the ensemble.</p>
      <p>We employ 6 specialized STLF models in the ensemble that have different inputs and structures.
Such diversity is aimed at capturing different properties of different parts of the series under
consideration. Figure 3 shows the last 30 days of the time series with the corresponding forecasts.
We can see that long-term trends are more or less well captured by all models, but short-term
changes pose a problem for all of them, so that no single model is significantly better than the
others.</p>
      <p>We apply model (2) with algorithm (6) to obtain an optimal linear combination $\hat{y}^*(k)$
of the 6 forecasts from the ensemble member models. Just by visual inspection of the plot it is
obvious that $\hat{y}^*(k)$ is generally closer to the true series $y(k)$, which is also confirmed
by the corresponding error comparison in Table 1. We employ the Mean Absolute Percentage Error
(MAPE) criterion, which is widely used in short-term electric load forecasting research and has a
clear physical sense.</p>
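      <p>For reference, $\mathrm{MAPE} = \frac{100\%}{N} \sum_{k=1}^{N} |y(k) - \hat{y}(k)| / |y(k)|$;
a one-function NumPy equivalent:</p>
      <preformat>
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / np.abs(y_true))
      </preformat>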
      <p>The best of the ensemble member models provides a MAPE of 4.858%, which is reduced to 4.144%
by the linear bagging procedure. Then we additionally apply the adaptive F-transform (4) to
$\hat{y}^*(k)$ in order to exploit any possible remaining nonlinearities which cannot be
approximated by the linear model (2). In this simple test case, the nonlinear synapse has 10
triangular membership functions $\mu_l(\hat{y}^*(k))$ whose centers $x_l$ are uniformly distributed
between the minimum and maximum values of the time series $y(k)$. The NS parameters $\tilde{w}(k)$
are tuned by procedure (9). This additional nonlinear processing step further reduces the bagging
error to 3.9595%, which is 1.23 times less than the lowest error provided by the best ensemble
member alone.</p>
      <p>The aforementioned processing steps can be summarized as the pseudo-code below.</p>
      <p>Algorithm 1. Adaptive nonlinear bagging procedure, performed at each time step $k$.</p>
      <p>Step 1. Receive the input signals from the $n$ ensemble members, $\hat{y}_1(k), \dots, \hat{y}_n(k)$.</p>
      <p>Step 2. Calculate the intermediate output signal $\hat{y}^*(k)$ as a linear combination (2) of the inputs.</p>
      <p>Step 3. Apply the adaptive F-transform (4) to obtain the output signal $\hat{y}^{**}(k)$.</p>
      <p>Step 4. Update the weights $w^*(k)$ using learning algorithm (6).</p>
      <p>Step 5. Update the NS parameters $\tilde{w}(k)$ with procedure (9).</p>
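      <p>A minimal end-to-end sketch of Algorithm 1, assuming the helper functions sketched above
(triangular_memberships, update_weights_eq6, update_ns_weights_eq9); the data here are synthetic
stand-ins, not the real load series.</p>
      <preformat>
import numpy as np

# Synthetic stand-ins: Y_members[k] holds the n ensemble forecasts at step k,
# y_ref[k] is the reference (actual) load value that becomes known afterwards.
N, n, h = 337, 6, 10
rng = np.random.default_rng(0)
y_ref = 500.0 + 50.0 * np.sin(np.arange(N) / 7.0) + rng.normal(0.0, 5.0, N)
Y_members = y_ref[:, None] + rng.normal(0.0, 20.0, (N, n))

centers = np.linspace(y_ref.min(), y_ref.max(), h)   # uniform NS centers
w_star, lam = np.full(n, 1.0 / n), 0.0               # start from equal weights
w_tilde = np.full(h, y_ref.mean())                   # rough NS initialization

outputs = np.zeros(N)
for k in range(N):
    y_hat = Y_members[k]                             # Step 1: receive member forecasts
    y_star = y_hat @ w_star                          # Step 2: linear combination (2)
    mu = triangular_memberships(y_star, centers)
    outputs[k] = mu @ w_tilde                        # Step 3: adaptive F-transform (4)
    w_star, lam = update_weights_eq6(w_star, lam, y_hat, y_ref[k])   # Step 4: (6)
    w_tilde = update_ns_weights_eq9(w_tilde, mu, y_ref[k])           # Step 5: (9)
      </preformat>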
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>A fuzzy nonlinear online bagging procedure is proposed. It provides an optimal combination of
the results of an ensemble of computational intelligence systems for solving Data Stream Mining
problems, where data arrive for processing in real time and are non-stationary in nature. The
proposed approach has a simple numerical implementation and a high processing rate.</p>
      <p>Simulations confirm the theoretical results. The optimal linear combination provides an error
lower than the lowest error among the ensemble member models. The nonlinear F-transform provides an
additional decrease of the error, overall by a factor of 1.23 in comparison to the best model in
the ensemble.</p>
      <p>Future research on the topic will focus on fine-tuning of the nonlinear synapse parameters
(membership function types, their center initialization and adaptation, etc.) and on including
other types of nonlinearities in the metamodel.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          , Bagging predictors,
          <source>Machine Learning</source>
          <volume>24</volume>
          (
          <year>1996</year>
          )
          <fpage>126</fpage>
          -
          <lpage>140</lpage>
          . https://doi.org/10.1007/BF00058655.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Friedman</surname>
          </string-name>
          , P. Hall,
          <article-title>On bagging and nonlinear estimation</article-title>
          ,
          <source>Journal of Statistical Planning and Inference 137.3</source>
          (
          <year>2007</year>
          )
          <fpage>669</fpage>
          -
          <lpage>683</lpage>
          . https://doi.org/10.1016/j.jspi.
          <year>2006</year>
          .
          <volume>06</volume>
          .002.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Z.-H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <article-title>Ensembling neural networks: Many could be better than all</article-title>
          ,
          <source>Artificial Intelligence 137</source>
          .
          <fpage>1</fpage>
          -
          <lpage>2</lpage>
          (
          <year>2002</year>
          )
          <fpage>239</fpage>
          -
          <lpage>263</lpage>
          . https://doi.org/10.1016/S0004-
          <volume>3702</volume>
          (
          <issue>02</issue>
          )
          <fpage>00190</fpage>
          -
          <lpage>X</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Ye</surname>
            . Bodyanskiy,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Otto</surname>
            , I. Pliss,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Popov</surname>
          </string-name>
          ,
          <article-title>An Optimal Algorithm for Combining Multivariate Forecasts in Hybrid Systems</article-title>
          , in: V.
          <string-name>
            <surname>Palade</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          <string-name>
            <surname>Howlett</surname>
          </string-name>
          , L. Jain (Eds)
          <article-title>Knowledge-Based Intelligent Information and Engineering Systems</article-title>
          .
          <source>KES</source>
          <year>2003</year>
          , volume
          <volume>2774</volume>
          of Lecture Notes in Computer Science, Springer, Berlin, Heidelberg,
          <year>2003</year>
          . https://doi.org/10.1007/978-3-
          <fpage>540</fpage>
          -45226- 3_
          <fpage>132</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Ye. Bodyanskiy</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Popov</surname>
          </string-name>
          ,
          <article-title>Fuzzy Selection Mechanism for Multimodel Prediction</article-title>
          . in M.G. Negoita,
          <string-name>
            <given-names>R.J.</given-names>
            <surname>Howlett</surname>
          </string-name>
          , L.C. Jain (Eds),
          <source>Knowledge-Based Intelligent Information and Engineering</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>