=Paper= {{Paper |id=Vol-3777/short10 |storemode=property |title=Fuzzy Online Bagging Using Adaptive F-transform |pdfUrl=https://ceur-ws.org/Vol-3777/short10.pdf |volume=Vol-3777 |authors=Sergiy Popov,Iryna Pliss,Olexii Holovin,Oleh Zolotukhin,Larysa Chala |dblpUrl=https://dblp.org/rec/conf/profitai/PopovPHZC24 }} ==Fuzzy Online Bagging Using Adaptive F-transform== https://ceur-ws.org/Vol-3777/short10.pdf
Fuzzy Online Bagging Using Adaptive F-transform
Sergiy Popov¹, Iryna Pliss¹, Olexii Holovin², Oleh Zolotukhin¹ and Larysa Chala¹
¹ Kharkiv National University of Radio Electronics, Nauky av., 14, Kharkiv, 61166, Ukraine
² Central Scientific Research Institute of Armament and Military Equipment of the Armed Forces of Ukraine, Kyiv, 03049, Ukraine

                                                                    Abstract
The ensemble multi-model bagging approach is considered. We propose a nonlinear adaptive bagging
procedure that applies the F-transform in its adaptive form to the output of traditional linear weighting.
This further decreases ensemble errors at low additional computational and time cost. The metamodel
architecture and the corresponding optimal learning algorithms are presented in detail. A simulation on a
short-term electric load forecasting problem confirms the theoretical results and shows a significant
decrease of the forecasting error in comparison with a linear approach.

                                                                    Keywords
Nonlinear bagging, adaptive ensemble, optimal learning, F-transform


                                1. Introduction
Currently, Computational Intelligence (CI) systems such as Artificial Neural Networks (ANNs), both
deep (DNNs) and traditional shallow (SNNs), Neuro-Fuzzy Systems (NFS), Neo-Fuzzy Systems, etc.
are widely used to solve many Data Mining problems. This success is explained by their
universal approximating and extrapolating capabilities and their ability to learn, i.e. to adjust their
parameters based on data obtained from the observed object during its operation. At the same
time, the problem of choosing a specific system or network that can best cope with the task at hand
arises quite often. DNNs provide high-quality solutions, but require very large training samples and
a lot of training time. SNNs such as Radial Basis Function Neural Networks (RBFNs) are inferior to
DNNs in accuracy, but are able to learn online, i.e. to solve Data Stream Mining problems.
Neuro-Fuzzy and Neo-Fuzzy Systems can effectively process non-stationary signals, etc. Therefore,
choosing a specific system is a non-trivial task that usually requires considerable experience on the
part of the researcher.
    To overcome the problem of choosing a specific system for a specific task, the ensemble
multi-model bagging approach [1-10] is quite often used, in which the task is solved concurrently by
an ensemble of systems functioning in parallel. Their output signals are combined by a so-called
metamodel, which forms the optimal result. Usually, weighted averaging is used, where the
weights are calculated by the metamodel itself. As a rule, these are batch procedures working in
offline mode, although adaptive linear online approaches for solving Data Stream Mining problems
are known [4-6, 9, 10]. Nonlinear bagging procedures are practically nonexistent, and the few
exceptions still work in offline mode [2].
    Therefore, it is expedient to develop an adaptive nonlinear bagging metamodel that combines
and generalizes the ensemble members' processing results in online mode with high speed
and accuracy.




                                ProfIT AI 2024: 4th International Workshop of IT-professionals on Artificial Intelligence (ProfIT AI 2024), September 25–27,
                                2024, Cambridge, MA, USA
serhii.popov@nure.ua (S. Popov); iryna.pliss@nure.ua (I. Pliss); a_a_golovin@ukr.net (O. Holovin); oleg.zolotukhin@nure.ua (O. Zolotukhin); larysa.chala@nure.ua (L. Chala)
ORCID: 0000-0002-1274-5830 (S. Popov); 0000-0001-7918-7362 (I. Pliss); 0000-0003-4662-4559 (O. Holovin); 0000-0002-0152-7600 (O. Zolotukhin); 0000-0002-9890-4790 (L. Chala)
© 2024 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                                               CEUR Workshop Proceedings (CEUR-WS.org)

2. Architecture of the adaptive nonlinear bagging metamodel
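The architecture of the adaptive nonlinear bagging metamodel is shown in Figure 1.

[Figure 1 diagram: the inputs $\hat{y}_1(k), \hat{y}_2(k), \ldots, \hat{y}_q(k)$ are weighted by $w_1^*, w_2^*, \ldots, w_q^*$ and summed into $\hat{y}^*(k)$, which passes through the nonlinear synapse NS (membership functions $\mu_1, \mu_2, \ldots, \mu_n$ with weights $\tilde{w}_1, \tilde{w}_2, \ldots, \tilde{w}_n$) and a second adder to give the output $\hat{y}^F(k)$.]

Figure 1: Adaptive nonlinear bagging metamodel architecture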

    It is readily seen that the metamodel's architecture is similar to F. Rosenblatt's elementary
perceptron, but instead of a traditional activation function it contains a Nonlinear Synapse (NS),
which is the main building block of a neo-fuzzy neuron [11-13] and implements the F-transform [14]
in its adaptive version [15], i.e. it is essentially a universal approximator [16].
    Output signals $\hat{y}_1(k), \ldots, \hat{y}_p(k), \ldots, \hat{y}_q(k)$ from $q$ ensemble members solving the same problem
(or, in vector form, $\hat{y}(k) = [\hat{y}_1(k), \ldots, \hat{y}_p(k), \ldots, \hat{y}_q(k)]^T$, where $k = 1, 2, \ldots, N$ is the current
discrete time index) are fed to the metamodel's inputs, passed through adjustable synaptic weights
$w_1^*, \ldots, w_p^*, \ldots, w_q^*$, and finally combined in the adder, forming the metamodel's intermediate
output signal $\hat{y}^*(k)$ in the form

$$\hat{y}^*(k) = \sum_{p=1}^{q} w_p^* \hat{y}_p(k) \tag{1}$$

or in vector form

$$\hat{y}^*(k) = \hat{y}^T(k) w^*, \tag{2}$$

where $w^* = (w_1^*, \ldots, w_p^*, \ldots, w_q^*)^T$.
  The unbiasedness constraint is additionally imposed on the synaptic weights $w^*$:

$$\sum_{p=1}^{q} w_p^* = I_q^T w^* = 1$$

(here $I_q$ is a $(q \times 1)$ vector of ones). If we add the inequality constraints
$0 \le w_p^* \le 1, \ \forall p$, on the synaptic weights, these weights can be given the meaning of the
degrees of membership of each of the signals $\hat{y}_p(k)$ to the optimal result.
   Technically, the signal $\hat{y}^*(k)$ is already a solution of the optimization problem; however, it can
be improved by processing it in an adaptive Nonlinear Synapse (NS). The NS is formed by $n$ nonlinear
membership functions $\mu_l(\hat{y}^*(k))$, $l = 1, 2, \ldots, n$. Traditional triangular constructions that satisfy the
Ruspini partition of unity conditions are usually used, although it is possible to use more complex
variants, e.g. B-splines, Gaussians, Epanechnikov kernels, etc. Each membership function output
$\mu_l(\hat{y}^*(k))$ is multiplied by the corresponding adjustable weight $\tilde{w}_l$, and the products are summed
in the second adder of the metamodel, forming the output signal

$$\hat{y}^F(k) = \sum_{l=1}^{n} \tilde{w}_l \mu_l(\hat{y}^*(k)), \tag{3}$$

or in vector form

$$\hat{y}^F(k) = \mu^T(\hat{y}^*(k)) \tilde{w}, \tag{4}$$

where $\mu(\hat{y}^*(k)) = [\mu_1(\hat{y}^*(k)), \ldots, \mu_l(\hat{y}^*(k)), \ldots, \mu_n(\hat{y}^*(k))]^T$ and $\tilde{w} = (\tilde{w}_1, \ldots, \tilde{w}_l, \ldots, \tilde{w}_n)^T$.
     Combining (1)-(4), we can finally write

$$\hat{y}^F(k) = \sum_{l=1}^{n} \tilde{w}_l \mu_l\Big(\sum_{p=1}^{q} w_p^* \hat{y}_p(k)\Big),$$

or

$$\hat{y}^F(k) = \mu^T\big(\hat{y}^T(k) w^*\big) \tilde{w},$$

where the adjustable synaptic weight vectors $w^*$ and $\tilde{w}$ are subject to online learning.
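
The forward pass of equations (1)-(4) can be illustrated with a minimal Python sketch. It assumes triangular membership functions on a sorted grid of centers; the function names, the clamping of out-of-range inputs to the grid edges, and the initialization of $\tilde{w}$ with the center values (which makes the NS reproduce its input) are illustrative assumptions, not part of the paper.

```python
def triangular_memberships(x, centers):
    """Membership degrees mu_l(x) for triangular functions on a sorted grid."""
    # Assumption: inputs outside the grid are clamped to its edges so that
    # the memberships still sum to one (Ruspini partition of unity).
    x = min(max(x, centers[0]), centers[-1])
    mu = [0.0] * len(centers)
    for l in range(len(centers) - 1):
        left, right = centers[l], centers[l + 1]
        if left <= x <= right:
            mu[l + 1] = (x - left) / (right - left)
            mu[l] = 1.0 - mu[l + 1]
            break
    return mu


def metamodel_output(y_hat, w_star, w_tilde, centers):
    """Return the intermediate output (Eq. 1) and the NS output (Eq. 3)."""
    y_star = sum(w * y for w, y in zip(w_star, y_hat))   # linear combination
    mu = triangular_memberships(y_star, centers)
    y_f = sum(w * m for w, m in zip(w_tilde, mu))        # nonlinear synapse
    return y_star, y_f


# Hypothetical numbers: three ensemble members, five membership functions.
y_hat = [10.0, 12.0, 11.0]
w_star = [0.5, 0.25, 0.25]            # sums to one (unbiasedness constraint)
centers = [0.0, 5.0, 10.0, 15.0, 20.0]
w_tilde = list(centers)               # assumed identity-like initialization
y_star, y_f = metamodel_output(y_hat, w_star, w_tilde, centers)
print(y_star, y_f)
```

With $\tilde{w}$ equal to the centers, the synapse performs piecewise-linear interpolation of the identity, so both outputs coincide until $\tilde{w}$ is trained.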

3. Adaptive nonlinear bagging metamodel learning
The synaptic weight vector $w^*$ can be adjusted by gradient optimization of the learning criterion

$$E(k) = \frac{1}{2}\Big(y(k) - \sum_{p=1}^{q} w_p^* \hat{y}_p(k)\Big)^2 = \frac{1}{2}\big(y(k) - \hat{y}^T(k) w^*\big)^2$$

(here $y(k)$ is the external reference signal) subject to the constraint $I_q^T w^* = 1$. This is achieved by
searching for the saddle point of the Lagrange function

$$L(w^*, \lambda) = \frac{1}{2}\big(y(k) - \hat{y}^T(k) w^*\big)^2 + \lambda\big(I_q^T w^* - 1\big),$$

where $\lambda$ is an undetermined Lagrange multiplier.
  The Arrow-Hurwicz procedure can be used to find the saddle point in the following form:

$$\begin{cases} w^*(k) = w^*(k-1) - \eta_w(k) \nabla_{w^*} L(w^*, \lambda), \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k)\, \partial L(w^*, \lambda) / \partial \lambda, \end{cases}$$

or

$$\begin{cases} w^*(k) = w^*(k-1) + \eta_w(k)\big(e(k)\hat{y}(k) - \lambda(k-1) I_q\big), \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k)\big(I_q^T w^*(k) - 1\big), \end{cases} \tag{5}$$

where $\eta_w(k)$, $\eta_\lambda(k)$ are learning step parameters and $e(k) = y(k) - \hat{y}^T(k) w^*(k-1)$ is the learning error.
   It is mathematically proven that the signal $\hat{y}^*(k) = \hat{y}^T(k) w^*(k)$ is not inferior in accuracy
to any of the signals $\hat{y}_p(k)$, $p = 1, 2, \ldots, q$, at the metamodel input.
   The learning process can be optimized for speed by an appropriate choice of the
learning step parameter $\eta_w(k)$. If it is chosen as

$$\eta_w(k) = \frac{e(k)}{e(k)\|\hat{y}(k)\|^2 - \lambda(k-1) I_q^T \hat{y}(k)},$$

the learning algorithm (5) can be written in the form

$$\begin{cases} w^*(k) = w^*(k-1) + \dfrac{e(k)\big(e(k)\hat{y}(k) - \lambda(k-1) I_q\big)}{e(k)\|\hat{y}(k)\|^2 - \lambda(k-1) I_q^T \hat{y}(k)}, \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k)\big(I_q^T w^*(k) - 1\big). \end{cases} \tag{6}$$

When $\lambda(k) = 0$, it completely coincides with the adaptive speed-optimal Kaczmarz-Widrow-Hoff
algorithm [17-20].
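
A single step of algorithm (6) can be sketched in Python as follows. The step size in (6) is the one that zeroes the a posteriori error, since $y(k) - \hat{y}^T(k) w^*(k) = e(k) - \eta_w(k)\big(e(k)\|\hat{y}(k)\|^2 - \lambda(k-1) I_q^T \hat{y}(k)\big)$. The function name, the constant value of $\eta_\lambda$ and the guard that skips the weight update when the denominator vanishes are assumptions made for this sketch only.

```python
def update_linear_weights(w, lam, y_hat, y, eta_lam=0.01, eps=1e-12):
    """One step of algorithm (6) for the weights w* and the multiplier lambda."""
    e = y - sum(wi * yi for wi, yi in zip(w, y_hat))      # a priori error e(k)
    direction = [e * yi - lam for yi in y_hat]            # e(k) y_hat(k) - lambda(k-1) I_q
    denom = e * sum(yi * yi for yi in y_hat) - lam * sum(y_hat)
    if abs(denom) > eps:                                  # assumed guard: skip an undefined step
        eta_w = e / denom                                 # speed-optimal step
        w = [wi + eta_w * di for wi, di in zip(w, direction)]
    lam = lam + eta_lam * (sum(w) - 1.0)                  # penalize violation of sum(w) = 1
    return w, lam, e


# Hypothetical data: three forecasts of a true value of 11.5.
y_hat = [10.0, 12.0, 11.0]
y = 11.5
w0 = [1.0 / 3.0] * 3
w1, lam1, e1 = update_linear_weights(w0, 0.0, y_hat, y)
posterior_error = y - sum(wi * yi for wi, yi in zip(w1, y_hat))
print(e1, posterior_error, lam1)
```

Because the multiplier starts at zero in this example, the weight update is exactly the Kaczmarz-Widrow-Hoff step $w^*(k-1) + e(k)\hat{y}(k)/\|\hat{y}(k)\|^2$.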
    As noted above, the synaptic weights at the inputs of the metamodel can be given the meaning
of the degrees of membership of each of the input signals $\hat{y}_p(k)$ to the optimal signal, which should
theoretically coincide with the reference signal $y(k)$. In this case, the learning task consists in the
minimization of the criterion

$$E(k) = \frac{1}{2}\Big(y(k) - \sum_{p=1}^{q} w_p^\mu \hat{y}_p(k)\Big)^2 = \frac{1}{2}\big(y(k) - \hat{y}^T(k) w^\mu\big)^2$$

subject to the constraints

$$I_q^T w^\mu = 1, \qquad 0 \le w_p^\mu \le 1, \ \forall p.$$

   Introducing the Lagrange function

$$L(w^\mu, \lambda, \rho) = \frac{1}{2}\big(y(k) - \hat{y}^T(k) w^\mu\big)^2 + \lambda\big(I_q^T w^\mu - 1\big) - \rho^T w^\mu$$

(here $\rho$ is a vector of non-negative undetermined Lagrange multipliers) and the Kuhn-Tucker system

$$\begin{cases} \nabla_{w^\mu} L(w^\mu, \lambda, \rho) = 0_q, \\ \partial L(w^\mu, \lambda, \rho) / \partial \lambda = 0, \\ \rho_p \ge 0, \ p = 1, 2, \ldots, q, \end{cases}$$

it is easy to write the Arrow-Hurwicz-Uzawa gradient procedure for finding the saddle point of the
Lagrangian in the form

$$\begin{cases} w^\mu(k) = w^\mu(k-1) - \eta_w(k) \nabla_{w^\mu} L(w^\mu, \lambda, \rho), \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k)\, \partial L(w^\mu, \lambda, \rho) / \partial \lambda, \\ \rho(k) = \big[\rho(k-1) + \eta_\rho(k) \nabla_\rho L(w^\mu, \lambda, \rho)\big]_+ \end{cases}$$

(here $[\cdot]_+$ is the projector onto the positive orthant). Since
$\nabla_{w^\mu} L = -e(k)\hat{y}(k) + \lambda I_q - \rho$, $\partial L / \partial \lambda = I_q^T w^\mu - 1$ and $\nabla_\rho L = -w^\mu$, this gives

$$\begin{cases} w^\mu(k) = w^\mu(k-1) + \eta_w(k)\big[e(k)\hat{y}(k) - \lambda(k-1) I_q + \rho(k-1)\big], \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k)\big(I_q^T w^\mu(k) - 1\big), \\ \rho(k) = \big[\rho(k-1) - \eta_\rho(k) w^\mu(k)\big]_+. \end{cases} \tag{7}$$

   Similarly to (5) and (6), the learning algorithm (7) can also be optimized for speed. The optimized
procedure has the following final form:

$$\begin{cases} w^\mu(k) = w^\mu(k-1) + \dfrac{e(k)\big[e(k)\hat{y}(k) - \lambda(k-1) I_q + \rho(k-1)\big]}{e(k)\|\hat{y}(k)\|^2 - \lambda(k-1) I_q^T \hat{y}(k) + \rho^T(k-1)\hat{y}(k)}, \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k)\big(I_q^T w^\mu(k) - 1\big), \\ \rho(k) = \big[\rho(k-1) - \eta_\rho(k) w^\mu(k)\big]_+. \end{cases} \tag{8}$$

   Thus, algorithms (6) and (8) are designed for online adjustment of the metamodel parameters $w^*$ or $w^\mu$
and ensure high accuracy of the obtained results.
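
The constrained variant can be sketched in the same way. The code below is a minimal illustration, not the authors' implementation: it takes the update direction as the negative gradient of the Lagrangian with the $-\rho^T w^\mu$ term, i.e. $e(k)\hat{y}(k) - \lambda(k-1) I_q + \rho(k-1)$, and uses the step that zeroes the a posteriori error. The function name, the constant values of $\eta_\lambda$ and $\eta_\rho$, and the zero-denominator guard are assumptions.

```python
def update_membership_weights(w, lam, rho, y_hat, y,
                              eta_lam=0.01, eta_rho=0.1, eps=1e-12):
    """One speed-optimized Arrow-Hurwicz-Uzawa step for w^mu, lambda and rho."""
    e = y - sum(wi * yi for wi, yi in zip(w, y_hat))            # a priori error
    # Negative gradient of the Lagrangian with respect to the weights.
    direction = [e * yi - lam + ri for yi, ri in zip(y_hat, rho)]
    # Equals e*||y_hat||^2 - lam*sum(y_hat) + rho^T y_hat.
    denom = sum(yi * di for yi, di in zip(y_hat, direction))
    if abs(denom) > eps:                                        # assumed guard
        eta_w = e / denom
        w = [wi + eta_w * di for wi, di in zip(w, direction)]
    lam = lam + eta_lam * (sum(w) - 1.0)
    # Projection onto the positive orthant keeps the multipliers non-negative.
    rho = [max(0.0, ri - eta_rho * wi) for ri, wi in zip(rho, w)]
    return w, lam, rho


# Hypothetical start with one negative weight, which violates w >= 0.
y_hat = [10.0, 12.0]
y = 11.0
w1, lam1, rho1 = update_membership_weights([1.2, -0.2], 0.0, [0.0, 0.0], y_hat, y)
posterior_error = y - sum(wi * yi for wi, yi in zip(w1, y_hat))
print(w1, lam1, rho1, posterior_error)
```

In this example only the multiplier of the negative weight becomes positive, which is how the procedure starts pushing that weight back towards the feasible region on later steps.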
   As already mentioned, triangular functions are usually used as membership functions in
the nonlinear synapse NS:

$$\mu_l(\hat{y}^*(k)) = \begin{cases} \dfrac{\hat{y}^*(k) - c_{l-1}}{c_l - c_{l-1}}, & \text{if } \hat{y}^*(k) \in [c_{l-1}, c_l], \\[2mm] \dfrac{c_{l+1} - \hat{y}^*(k)}{c_{l+1} - c_l}, & \text{if } \hat{y}^*(k) \in [c_l, c_{l+1}], \\[2mm] 0 & \text{otherwise}, \end{cases}$$

where $c_{l-1}$, $c_l$, $c_{l+1}$ are the centers of adjacent membership functions, which are usually
either uniformly distributed over the abscissa axis or found using clustering procedures [20, 21].
   The main advantage of such functions is that at each learning cycle only two adjacent functions
are activated, and accordingly only two synaptic weights, $\tilde{w}_{l-1}, \tilde{w}_l$ or $\tilde{w}_l, \tilde{w}_{l+1}$, are adjusted, which
simplifies and speeds up the nonlinear synapse tuning process.
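
As a small worked example with hypothetical numbers, take adjacent centers $c_{l-1} = 2$, $c_l = 3$, $c_{l+1} = 4$ and an input $\hat{y}^*(k) = 3.4$. The input lies in $[c_l, c_{l+1}]$, so only $\mu_l$ and $\mu_{l+1}$ are non-zero:

```latex
\mu_l(3.4) = \frac{c_{l+1} - 3.4}{c_{l+1} - c_l} = \frac{4 - 3.4}{4 - 3} = 0.6, \qquad
\mu_{l+1}(3.4) = \frac{3.4 - c_l}{c_{l+1} - c_l} = \frac{3.4 - 3}{4 - 3} = 0.4, \qquad
\mu_l(3.4) + \mu_{l+1}(3.4) = 1 .
```

The synapse output then reduces to $0.6\,\tilde{w}_l + 0.4\,\tilde{w}_{l+1}$, and only these two weights receive a non-zero correction at this step.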
   A standard quadratic criterion can be used to tune the nonlinear synapse:

$$\tilde{E}(k) = \frac{1}{2}\Big(y(k) - \sum_{l=1}^{n} \tilde{w}_l \mu_l(\hat{y}^*(k))\Big)^2 = \frac{1}{2}\big(y(k) - \mu^T(\hat{y}^*(k)) \tilde{w}\big)^2,$$

whose gradient with respect to $\tilde{w}$ is $-\big(y(k) - \mu^T(\hat{y}^*(k)) \tilde{w}\big)\mu(\hat{y}^*(k))$, so it is minimized by the gradient procedure

$$\tilde{w}(k) = \tilde{w}(k-1) + \eta_{\tilde{w}}(k)\big[y(k) - \mu^T(\hat{y}^*(k)) \tilde{w}(k-1)\big]\mu(\hat{y}^*(k)), \tag{9}$$

where the step parameter $\eta_{\tilde{w}}(k)$ is chosen either using the Kaczmarz-Widrow-Hoff procedure [17,
19] or using other approaches [22, 23] which provide additional filtering properties of the learning
process.
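
A gradient-descent step for the nonlinear synapse can be sketched as below. The sketch assumes the normalized step $\eta_{\tilde{w}}(k) = 1 / \|\mu(\hat{y}^*(k))\|^2$, which is one common form of the Kaczmarz-Widrow-Hoff choice; the function name and the guard for an all-zero membership vector are illustrative assumptions.

```python
def update_ns_weights(w_tilde, mu, y, eps=1e-12):
    """One normalized gradient-descent step on the quadratic NS criterion."""
    y_f = sum(w * m for w, m in zip(w_tilde, mu))   # current NS output
    e = y - y_f                                     # NS learning error
    norm2 = sum(m * m for m in mu)
    if norm2 < eps:                                 # assumed guard: no active function
        return list(w_tilde), e
    eta = 1.0 / norm2                               # assumed Kaczmarz-type step
    return [w + eta * e * m for w, m in zip(w_tilde, mu)], e


# Hypothetical state: four membership functions, two of them active.
w_old = [0.0, 5.0, 10.0, 15.0]
mu = [0.0, 0.6, 0.4, 0.0]
y = 8.0
w_new, e = update_ns_weights(w_old, mu, y)
y_f_new = sum(w * m for w, m in zip(w_new, mu))
print(w_new, e, y_f_new)
```

Only the two weights attached to the active membership functions change, and with this step size the updated synapse reproduces the reference value for the current input.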

4. Simulation results
As a test case, we apply the proposed bagging approach to the short-term electric load forecasting
(STLF) problem, specifically 1-step ahead forecasting of the daily electric load of one of the regional
power systems of Ukraine. We have the original series with $N = 337$ samples and $q = 6$ forecast series
(337 samples each) generated by 6 different independent computational intelligence models. We treat
the time series as a data stream, i.e. forecasting and metamodel operation are performed in online
mode; therefore the whole dataset is processed only once (sample by sample, $k = 1, 2, \ldots, N$) and
there is no need to divide it into training, validation and test sets.
   The original series (Figure 2) has several trends corresponding to different seasons, periodic
(mostly weekly) patterns, sudden changes and outliers. There is also a strong random
component, because electric load in large systems depends on many external factors, some of which
are truly random or chaotic in nature, e.g. weather conditions [24]. The time series is therefore
nonstationary and noisy, which makes its forecasting quite challenging: each forecasting
model/method usually performs better than the others on some parts of the series and worse on
other parts, and one model/method is rarely better than all others on the whole series. This is exactly
the case where bagging methods come into play and can improve overall forecasting accuracy by
attempting to take the best from all models/methods in the ensemble.




Figure 2: Daily electric load time series

   We employ 6 specialized STLF models in the ensemble that have different inputs and structures.
This diversity is aimed at capturing different properties of different parts of the series under
consideration. Figure 3 shows the last 30 days of the time series with the corresponding forecasts. We
can see that long-term trends are captured reasonably well by all models, but short-term changes
pose a problem for all of them, so that no single model is significantly better than the others.




Figure 3: Forecasting results: true electric load (solid black line), 6 independent 1-day ahead forecasts
(color lines), and two metamodel forecasts: $\hat{y}^*(k)$ (dotted black line) and $\hat{y}^F(k)$ (dashed black line).

      We apply model (2) with algorithm (6) to obtain an optimal linear combination $\hat{y}^*(k)$ of the 6
forecasts from the ensemble member models. Visual inspection of the plot suggests that
$\hat{y}^*(k)$ is generally closer to the true series $y(k)$, which is confirmed by the error
comparison in Table 1. We employ the Mean Absolute Percentage Error (MAPE) criterion, which is widely
used in short-term electric load forecasting research and has a clear physical meaning.
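
For reference, MAPE in its usual definition (the mean of $|y(k) - \hat{y}(k)| / |y(k)|$, expressed in percent) can be computed as in the short sketch below. The function name and the sample values are hypothetical, and the sketch assumes that no actual value is zero.

```python
def mape(actual, forecast):
    """Mean absolute percentage error in percent; assumes non-zero actual values."""
    terms = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical values: relative errors of 10%, 5% and 0%.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
error = mape(actual, forecast)
print(error)
```

The three relative errors average to about 5%, which is the value the function returns up to floating-point rounding.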
Table 1
1-day ahead forecasting errors (MAPE) for all ensemble member models and the metamodel outputs
 Model   #1        #2        #3        #4        #5        #6        $\hat{y}^*(k)$   $\hat{y}^F(k)$
 MAPE    6.8171%   7.5066%   4.8580%   5.0311%   4.8827%   5.1015%   4.1440%          3.9595%

    The best of the ensemble member models provides a MAPE of 4.858%, which is reduced to 4.144% by
the linear bagging procedure. We then additionally apply the adaptive F-transform (4) to $\hat{y}^*(k)$ in
order to exploit any remaining nonlinearities that cannot be approximated by the linear
model (2). In this simple test case, the nonlinear synapse has 10 triangular membership functions
$\mu_l(\hat{y}^*(k))$ whose centers $c_l$ are uniformly distributed between the minimum and maximum values
of the time series $y(k)$. The NS parameters $\tilde{w}(k)$ are tuned by procedure (9). This additional nonlinear
processing step further reduces the bagging error to 3.9595%, which is 1.23 times lower than the lowest
error provided by the best ensemble member alone.
    The aforementioned processing steps can be summarized as the pseudo-code below.
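
The improvement factors follow directly from the MAPE values in Table 1 (best single model #3, linear combination, and nonlinear output); the ratios below are rounded to three decimals:

```latex
\frac{4.8580}{4.1440} \approx 1.172, \qquad
\frac{4.1440}{3.9595} \approx 1.047, \qquad
\frac{4.8580}{3.9595} \approx 1.227 \approx 1.23 .
```

So most of the gain comes from the linear combination, and the nonlinear synapse contributes a further reduction of roughly 4.5% of the linear bagging error.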

Algorithm 1
Adaptive nonlinear bagging procedure performed at each time step $k$
 Step 1. Receive input signals $\hat{y}_1(k), \ldots, \hat{y}_q(k)$ from $q$ ensemble members.
 Step 2. Calculate the intermediate output signal $\hat{y}^*(k)$ as a linear combination of the inputs (2).
 Step 3. Apply the adaptive F-transform (4) to obtain the output signal $\hat{y}^F(k)$.
 Step 4. Update the weights $w^*(k)$ using learning algorithm (6).
 Step 5. Update the NS parameters $\tilde{w}(k)$ with procedure (9).
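
The five steps can be put together in a compact Python sketch that runs on synthetic data. Everything specific here is an assumption made for illustration: the synthetic weekly-periodic series, the three noisy "ensemble members", the center grid, the constant $\eta_\lambda$, the initializations, and the use of the a priori outputs (computed before the weights see $y(k)$) as the forecasts. The NS update uses the descent form of (9) with a normalized step.

```python
import math
import random


def memberships(x, centers):
    """Triangular membership degrees on a sorted grid (inputs clamped to the grid)."""
    x = min(max(x, centers[0]), centers[-1])
    mu = [0.0] * len(centers)
    for l in range(len(centers) - 1):
        if centers[l] <= x <= centers[l + 1]:
            mu[l + 1] = (x - centers[l]) / (centers[l + 1] - centers[l])
            mu[l] = 1.0 - mu[l + 1]
            break
    return mu


def run_bagging(forecasts, targets, centers, eta_lam=0.01):
    q = len(forecasts[0])
    w = [1.0 / q] * q            # assumed start: plain averaging
    lam = 0.0
    w_tilde = list(centers)      # assumed start: NS reproduces its input
    outputs = []
    for y_hat, y in zip(forecasts, targets):              # Step 1
        y_star = sum(wi * yi for wi, yi in zip(w, y_hat))  # Step 2, Eq. (2)
        mu = memberships(y_star, centers)
        y_f = sum(wt * m for wt, m in zip(w_tilde, mu))    # Step 3, Eq. (4)
        outputs.append((y_star, y_f))
        e = y - y_star                                     # Step 4, algorithm (6)
        denom = e * sum(yi * yi for yi in y_hat) - lam * sum(y_hat)
        if abs(denom) > 1e-12:
            eta = e / denom
            w = [wi + eta * (e * yi - lam) for wi, yi in zip(w, y_hat)]
        lam += eta_lam * (sum(w) - 1.0)
        norm2 = sum(m * m for m in mu)                     # Step 5, procedure (9)
        w_tilde = [wt + (y - y_f) * m / norm2 for wt, m in zip(w_tilde, mu)]
    return outputs, w, w_tilde


def mape(actual, forecast):
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


random.seed(0)
N, q = 200, 3
targets = [100.0 + 10.0 * math.sin(2.0 * math.pi * k / 7.0) for k in range(N)]
biases = [3.0, -2.0, 1.0]        # hypothetical systematic errors of the members
forecasts = [[y + b + random.gauss(0.0, 2.0) for b in biases] for y in targets]
centers = [80.0 + 5.0 * i for i in range(10)]   # 10 uniformly spaced centers
outputs, w, w_tilde = run_bagging(forecasts, targets, centers)

for p in range(q):
    print("member", p + 1, mape(targets, [f[p] for f in forecasts]))
print("linear", mape(targets, [o[0] for o in outputs]))
print("nonlinear", mape(targets, [o[1] for o in outputs]))
```

The printed errors depend on the synthetic data and say nothing about the results in Table 1; the sketch only shows how the steps interleave prediction and learning in a single pass over the stream.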

5. Conclusions
A fuzzy nonlinear online bagging procedure is proposed. It optimally combines the outputs of an
ensemble of computational intelligence systems for solving Data Stream Mining problems in which the
data arrive for processing in real time and are non-stationary in nature. The proposed approach has a
simple numerical implementation and a high processing rate.
   Simulations confirm the theoretical results. The optimal linear combination provides errors lower than
the lowest error among the ensemble member models. The nonlinear F-transform provides an additional
decrease of the error, overall by a factor of 1.23 in comparison with the best model in the ensemble.
   Future research on the topic will focus on fine-tuning the nonlinear synapse parameters
(membership function types, initialization and adaptation of their centers, etc.) and on including other
types of nonlinearities in the metamodel.

References
[1] L. Breiman, Bagging predictors, Machine Learning 24 (1996) 123–140.
    https://doi.org/10.1007/BF00058655.
[2] J. H. Friedman, P. Hall, On bagging and nonlinear estimation, Journal of Statistical Planning and
    Inference 137.3 (2007) 669–683. https://doi.org/10.1016/j.jspi.2006.06.002.
[3] Z.-H. Zhou, J. Wu, W. Tang, Ensembling neural networks: Many could be better than all,
    Artificial Intelligence 137.1-2 (2002) 239–263. https://doi.org/10.1016/S0004-3702(02)00190-X.
[4] Ye. Bodyanskiy, P. Otto, I. Pliss, S. Popov, An Optimal Algorithm for Combining Multivariate
    Forecasts in Hybrid Systems, in: V. Palade, R.J. Howlett, L. Jain (Eds), Knowledge-Based
    Intelligent Information and Engineering Systems. KES 2003, volume 2774 of Lecture Notes in
    Computer Science, Springer, Berlin, Heidelberg, 2003.
    https://doi.org/10.1007/978-3-540-45226-3_132.
[5] Ye. Bodyanskiy, S. Popov, Fuzzy Selection Mechanism for Multimodel Prediction, in: M.G.
    Negoita, R.J. Howlett, L.C. Jain (Eds), Knowledge-Based Intelligent Information and Engineering
    Systems. KES 2004, volume 3214 of Lecture Notes in Computer Science, Springer, Berlin,
    Heidelberg, 2004, pp. 772–778. https://doi.org/10.1007/978-3-540-30133-2_101.
[6] A. Bifet, G. Holmes, B. Pfahringer, R. Gavaldà, Improving Adaptive Bagging Methods for
    Evolving Data Streams, in: Z.-H. Zhou, T. Washio (Eds), Advances in Machine Learning. ACML,
    volume 5828 of Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 2009,
    pp. 23–37. https://doi.org/10.1007/978-3-642-05224-8_4.
[7] C. Zhang, Y. Ma (Eds) Ensemble Machine Learning. Methods and Applications, Springer New
     York, NY, 2012. https://doi.org/10.1007/978-1-4419-9326-7.
[8] D. Sarkar, V. Natarajan, Ensemble Machine Learning Cookbook, Packt Publishing, Birmingham,
     2019.
[9] N. C. Oza, Online bagging and boosting, in Proc. 2005 IEEE International Conference on
     Systems, Man and Cybernetics, Waikoloa, HI, USA, 2005, Vol. 3, pp. 2340-2345.
     https://doi.org/10.1109/ICSMC.2005.1571498.
[10] E. Lughofer, M. Pratama, I. Skrjanc, Online bagging of evolving fuzzy systems, Information
     Sciences 570 (2021) 16–33. https://doi.org/10.1016/j.ins.2021.04.041.
[11] T. Yamakawa, E. Uchino, T. Miki, H. Kusanagi, A neo fuzzy neuron and its applications to system
     identification and prediction of the system behavior, in Proc. 2nd Int. Conf. on Fuzzy Logic and
     Neural Networks, 1992, pp. 477-483.
[12] E. Uchino, T. Yamakawa, Neo-fuzzy neuron based new approach to system modeling with
     application to actual system, in Proc. Sixth International Conference on Tools with Artificial
     Intelligence, New Orleans, LA, USA, 1994, pp. 564-570.
[13] T. Miki, T. Yamakawa, Analog implementation of neo-fuzzy neuron and its on-board learning,
     in Computational Intelligence and Applications, WSES Press, Piraeus, 1999, pp. 144-149.
[14] I. Perfilieva, Fuzzy transforms: Theory and applications, Fuzzy Sets and Systems 157.8 (2006)
     993-1023. https://doi.org/10.1016/j.fss.2005.11.012.
[15] Ye. Bodyanskiy, N. Teslenko, Adaptyvne F-peretvorennya na osnovi uzagal'nenoi regresiynoi
     neyronnoi merezhi, Adaptyvni systemy avtomatychnogo upravlinnya 10 (2007) 25–31. (In
     Ukrainian)
[16] Ye. Bodyanskiy, S. Kostiuk, Neuron based on an adaptive fuzzy transform for modern artificial
     neural network models, International Scientific Technical Journal "Problems of Control and
     Informatics" 68.6 (2023) 94–105. https://doi.org/10.34229/1028-0979-2023-6-7.
[17] S. Kaczmarz, Angenäherte Auflösung von Systemen linearer Gleichungen, Bulletin International
     de l'Académie Polonaise des Sciences et des Lettres 35 (1937) 355–357. (In German)
[18] S. Kaczmarz, Approximate solution of systems of linear equations, International Journal of
     Control 57.6 (1993) 1269–1271. https://doi.org/10.1080/00207179308934446.
[19] B. Widrow, M.E. Hoff, Adaptive switching circuits, in: 1960 IRE WESCON Convention Record,
     Part 4, IRE, New York, 1960, pp. 96–104.
[20] T. Kohonen, Self-Organizing Maps, Springer-Verlag, Berlin, 1995.
[21] Ye. Bodyanskiy, E. Vynokurova, A. Dolotov, Self-Learning Cascade Spiking Neural Network for
     Fuzzy Clustering Based on Group Method of Data Handling, Journal of Automation and
     Information Sciences 45.3 (2013) 23–33. https://doi.org/10.1615/JAutomatInfScien.v45.i3.30.
[22] P. Otto, Ye. Bodyanskiy, V. Kolodyazhniy, A new learning algorithm for a forecasting
     neuro-fuzzy network, Integrated Computer Aided Engineering 10 (2003) 399-409.
     https://doi.org/10.3233/ICA-2003-10409.
[23] Ye. Bodyanskiy, O. Vynokurova, I. Pliss, G. Setlak, P. Mulesa, Fast learning algorithm for deep
     evolving GMDH-SVM neural network in data stream mining tasks, in Proc. 2016 IEEE First
     International Conference on Data Stream Mining & Processing (DSMP), Lviv, Ukraine, 2016, pp.
     257–262. https://doi.org/10.1109/DSMP.2016.7583555.
[24] Ye. Bodyanskiy, S. Popov, T. Rybalchenko, Feedforward neural network with a specialized
     architecture for estimation of the temperature influence on the electric load, in Proc. 2008 4th
     International IEEE Conference Intelligent Systems, Varna, Bulgaria, 2008, pp. 7-14–7-18.
     https://doi.org/10.1109/IS.2008.4670444.