<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>International Scientific Technical Journal "Problems of Control and Informatics"</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Fuzzy Online Bagging Using Adaptive F-transform</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergiy Popov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iryna Pliss</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olexii Holovin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleh Zolotukhin</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Larysa Chala</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Central Scientific Research Institute of Armament and Military Equipment of the Armed Forces of Ukraine</institution>
          ,
          <addr-line>Kyiv, 03049</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kharkiv National University of Radio Electronics</institution>
          ,
          <addr-line>Nauky av., 14, Kharkiv, 61166</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>3214</volume>
      <fpage>25</fpage>
      <lpage>27</lpage>
      <abstract>
        <p>The ensemble multi-model bagging approach is considered. We propose a nonlinear adaptive bagging procedure which applies the F-transform, in its adaptive form, to the results of traditional weighting. This leads to a further decrease of ensemble errors at low extra computational and time cost. The metamodel architecture and the corresponding optimal learning algorithms are presented in detail. A simulation based on the short-term electric load forecasting problem confirms the theoretical results and shows a significant decrease of the forecasting error in comparison to a linear approach.</p>
      </abstract>
      <kwd-group>
        <kwd>Nonlinear bagging</kwd>
        <kwd>adaptive ensemble</kwd>
        <kwd>optimal learning</kwd>
        <kwd>F-transform</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Architecture of the adaptive nonlinear bagging metamodel</title>
      <p>The architecture of the adaptive nonlinear bagging metamodel is shown in Figure 1.</p>
      <p>[Figure 1: architecture of the adaptive nonlinear bagging metamodel, with ensemble outputs $\hat{y}_1(k), \dots, \hat{y}_n(k)$, synaptic weights $w_1^*, \dots, w_n^*$, an adder, and a nonlinear synapse.]</p>
      <p>It is readily seen that the metamodel’s architecture is similar to F. Rosenblatt's elementary
perceptron, but instead of a traditional activation function it contains a Nonlinear Synapse (NS)
which is the main building block of a neo-fuzzy neuron [11-13] and implements the F-transform [14]
in its adaptive version [15], i.e. it is essentially a universal approximator [16].</p>
      <p>Output signals from $n$ ensemble members which are solving the same problem,
$\hat{y}_1(k), \dots, \hat{y}_j(k), \dots, \hat{y}_n(k)$ (or in vector form $\hat{y}(k) =
(\hat{y}_1(k), \dots, \hat{y}_j(k), \dots, \hat{y}_n(k))^T$, where $k = 1, 2, \dots, N$ is the
current discrete time index), are fed to the metamodel's inputs, then passed through the
adjustable synaptic weights $w_1^*, \dots, w_j^*, \dots, w_n^*$, and finally combined in the adder,
forming the metamodel's intermediate output signal $\hat{y}^*(k)$ in the form
$$\hat{y}^*(k) = \sum_{j=1}^{n} w_j^* \hat{y}_j(k) \tag{1}$$
or in vector form
$$\hat{y}^*(k) = \hat{y}^T(k)\, w^*, \tag{2}$$
where $w^* = (w_1^*, \dots, w_j^*, \dots, w_n^*)^T$.</p>
      <p>The unbiasedness constraint is additionally imposed on the synaptic weights $w^*$:
$$\sum_{j=1}^{n} w_j^* = E^T w^* = 1 \tag{3}$$
(here $E$ is an $(n \times 1)$ vector of ones). If we append inequality constraints on the
non-negativity of the synaptic weights, $0 \le w_j^* \le 1\ \forall j$, these synaptic weights
can be given the meaning of the degrees of membership of each of the signals $\hat{y}_j(k)$ to
the optimal result.</p>
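      <p>For concreteness, the linear stage (1)-(3) can be sketched in a few lines of NumPy; this is
a minimal sketch rather than the authors' implementation, and the weight values below are
hypothetical, chosen only to satisfy the unbiasedness and non-negativity constraints.</p>
      <preformat>
import numpy as np

# Outputs of n = 6 ensemble members at the current step k (hypothetical values)
y_hat = np.array([512.3, 498.7, 505.1, 520.4, 501.9, 509.6])

# Synaptic weights w*: non-negative and summing to one, as required by (3)
w_star = np.array([0.25, 0.10, 0.20, 0.05, 0.25, 0.15])
assert w_star.min() &gt;= 0.0 and np.isclose(w_star.sum(), 1.0)

# Intermediate output (2): y*(k) = y^T(k) w*
y_star = y_hat @ w_star
      </preformat>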
      <p>Technically, the signal $\hat{y}^*(k)$ is already the solution of the optimization problem;
however, it can be improved by processing it in an adaptive Nonlinear Synapse (NS). The NS is
formed by $h$ nonlinear membership functions $\mu_l(\hat{y}^*(k))$, $l = 1, 2, \dots, h$.
Traditional triangular constructions that satisfy the Ruspini partition of unity conditions are
usually used, although it is possible to use more complex variants, e.g. B-splines, Gaussians,
Epanechnikov kernels, etc. Each membership function output $\mu_l(\hat{y}^*(k))$ is multiplied by
the corresponding adjustable weight $\tilde{w}_l$ and then summed in the second adder of the
metamodel, forming the output signal
$$\hat{y}^{**}(k) = \sum_{l=1}^{h} \tilde{w}_l\, \mu_l(\hat{y}^*(k)) = \mu^T(\hat{y}^*(k))\, \tilde{w}, \tag{4}$$
where $\mu(\hat{y}^*(k)) = (\mu_1(\hat{y}^*(k)), \dots, \mu_l(\hat{y}^*(k)), \dots, \mu_h(\hat{y}^*(k)))^T$
and $\tilde{w} = (\tilde{w}_1, \dots, \tilde{w}_l, \dots, \tilde{w}_h)^T$.</p>
      <p>Combining (1)-(4), finally we can write
$$\hat{y}^{**}(k) = \sum_{l=1}^{h} \tilde{w}_l\, \mu_l\!\left(\sum_{j=1}^{n} w_j^* \hat{y}_j(k)\right)$$
or, in vector form,
$$\hat{y}^{**}(k) = \mu^T\big(\hat{y}^T(k)\, w^*\big)\, \tilde{w},$$
where the adjustable synaptic weight vectors $w^*$ and $\tilde{w}$ are subject to online learning.</p>
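      <p>The following sketch implements the forward pass (4) with a triangular Ruspini partition; the
function and variable names are ours, and the centers form a hypothetical uniform grid.</p>
      <preformat>
import numpy as np

def triangular_memberships(y_star, centers):
    """Triangular Ruspini partition: at most two adjacent functions are
    nonzero for any input, and their values sum to one."""
    h = len(centers)
    mu = np.zeros(h)
    # Clip to the covered interval so the partition of unity holds at the edges
    y = float(np.clip(y_star, centers[0], centers[-1]))
    r = int(np.searchsorted(centers, y))  # index of the nearest center from the right
    if r == 0:
        mu[0] = 1.0
    else:
        left, right = centers[r - 1], centers[r]
        mu[r] = (y - left) / (right - left)   # rising slope of the right-hand function
        mu[r - 1] = 1.0 - mu[r]               # falling slope of the left-hand function
    return mu

# Hypothetical example: h = 10 centers uniformly spread over the signal range
centers = np.linspace(400.0, 600.0, 10)
w_tilde = np.zeros(10)                        # NS weights, tuned online by (9)
y_out = triangular_memberships(505.4, centers) @ w_tilde   # output signal (4)
      </preformat>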
    </sec>
    <sec id="sec-3">
      <title>3. Adaptive nonlinear bagging metamodel learning</title>
      <p>Synaptic weights vector $w^*$ can be adjusted by gradient optimization of the learning criterion
$$E(k) = \frac{1}{2}\, e^2(k) = \frac{1}{2} \big(y(k) - \hat{y}^T(k)\, w^*\big)^2$$
(here $y(k)$ is the external reference signal) subject to the constraint $E^T w^* = 1$. This is
achieved by searching for the saddle point of the Lagrange function
$$L(w^*, \lambda) = \frac{1}{2} \big(y(k) - \hat{y}^T(k)\, w^*\big)^2 + \lambda \big(E^T w^* - 1\big),$$
where $\lambda$ is an undetermined Lagrange multiplier.</p>
      <p>The Arrow-Hurwicz procedure can be used to find the saddle point in the following form:
$$\begin{cases} w^*(k) = w^*(k-1) - \eta_w(k)\, \nabla_{w^*} L(w^*, \lambda), \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k)\, \partial L(w^*, \lambda)/\partial \lambda, \end{cases}$$
or
$$\begin{cases} w^*(k) = w^*(k-1) + \eta_w(k) \big(e(k)\, \hat{y}(k) - \lambda(k-1)\, E\big), \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k) \big(E^T w^*(k) - 1\big), \end{cases} \tag{5}$$
where $\eta_w(k), \eta_\lambda(k)$ are learning step parameters and $e(k) = y(k) - \hat{y}^T(k)\, w^*(k-1)$
is the learning error.</p>
      <p>It is mathematically proven that the signal $\hat{y}^*(k) = \hat{y}^T(k)\, w^*(k)$ is, in terms
of accuracy, not inferior to any of the $\hat{y}_j(k)$, $j = 1, 2, \dots, n$, at the metamodel input.</p>
      <p>The learning process can be optimized in terms of speed by the appropriate selection of the
learning step parameter $\eta_w(k)$. If the following option is chosen,
$$\eta_w(k) = \frac{e(k)}{e(k)\, \|\hat{y}(k)\|^2 - \lambda(k-1)\, E^T \hat{y}(k)},$$
the learning algorithm (5) can be written in the form
$$\begin{cases} w^*(k) = w^*(k-1) + \dfrac{e(k) \big(e(k)\, \hat{y}(k) - \lambda(k-1)\, E\big)}{e(k)\, \|\hat{y}(k)\|^2 - \lambda(k-1)\, E^T \hat{y}(k)}, \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k) \big(E^T w^*(k) - 1\big). \end{cases} \tag{6}$$</p>
      <p>When $\lambda(k) = 0$, it completely coincides with the adaptive speed-optimal
Kaczmarz-Widrow-Hoff algorithm [17-20].</p>
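      <p>A minimal NumPy sketch of one step of the speed-optimal algorithm (6), written in the
reconstructed notation above (the function name and the guard against a vanishing denominator are
our additions):</p>
      <preformat>
import numpy as np

def update_weights_eq6(w_star, lam, y_hat, y, eta_lambda=0.01):
    """One step of the speed-optimal constrained learning algorithm (6)."""
    E = np.ones_like(w_star)
    e = y - y_hat @ w_star                        # learning error e(k)
    denom = e * (y_hat @ y_hat) - lam * (E @ y_hat)
    if not np.isclose(denom, 0.0):
        w_star = w_star + e * (e * y_hat - lam * E) / denom
    lam = lam + eta_lambda * (E @ w_star - 1.0)   # multiplier update
    return w_star, lam
      </preformat>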
      <p>As noted above, the synaptic weights at the inputs of the metamodel can be given the meaning
of the degrees of membership of each of the input signals $\hat{y}_j(k)$ to the optimal signal,
which should theoretically coincide with the reference signal $y(k)$. In this case, the learning
task consists in the minimization of the same criterion $E(k)$ under both the equality constraint
$E^T w = 1$ and the non-negativity constraints $w_j \ge 0$. Introducing the Lagrange function
$$L(w, \lambda, \rho) = \frac{1}{2} \big(y(k) - \hat{y}^T(k)\, w\big)^2 + \lambda \big(E^T w - 1\big) - \rho^T w$$
(here $\rho$ is a vector of non-negative indefinite Lagrange multipliers) and the Kuhn-Tucker system
$$\nabla_w L(w, \lambda, \rho) = 0, \quad \partial L(w, \lambda, \rho)/\partial \lambda = 0, \quad \rho_j \ge 0,\ j = 1, 2, \dots, n,$$
it is easy to write the Arrow-Hurwicz-Uzawa gradient procedure for finding the saddle point of the
Lagrangian in the form
$$\begin{cases} w(k) = w(k-1) - \eta_w(k)\, \nabla_w L(w, \lambda, \rho), \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k)\, \partial L(w, \lambda, \rho)/\partial \lambda, \\ \rho(k) = \big[\rho(k-1) - \eta_\rho(k)\, \nabla_\rho L(w, \lambda, \rho)\big]_+ \end{cases}$$
(here $[\cdot]_+$ is the projector onto the positive orthant), or
$$\begin{cases} w(k) = w(k-1) + \eta_w(k) \big(e(k)\, \hat{y}(k) - \lambda(k-1)\, E - \rho(k-1)\big), \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k) \big(E^T w(k) - 1\big), \\ \rho(k) = \big[\rho(k-1) - \eta_\rho(k)\, w(k)\big]_+. \end{cases} \tag{7}$$</p>
      <p>Similarly to (5) and (6), the learning algorithm (7) can also be optimized for speed. The
optimized procedure has the following final form:
$$\begin{cases} w(k) = w(k-1) + \dfrac{e(k) \big(e(k)\, \hat{y}(k) - \lambda(k-1)\, E - \rho(k-1)\big)}{e(k)\, \|\hat{y}(k)\|^2 - \lambda(k-1)\, E^T \hat{y}(k) + \rho^T(k-1)\, \hat{y}(k)}, \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k) \big(E^T w(k) - 1\big), \\ \rho(k) = \big[\rho(k-1) - \eta_\rho(k)\, w(k)\big]_+. \end{cases} \tag{8}$$</p>
      <p>Thus, algorithms (6), (8) are designed for the online adjustment of the metamodel parameters
$w^*$ or $w$ and ensure high accuracy of the obtained results.</p>
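      <p>For the inequality-constrained variant, here is a sketch of one step of the speed-optimized
procedure (8) under the same assumptions; note the projection onto the positive orthant.</p>
      <preformat>
import numpy as np

def update_weights_eq8(w, lam, rho, y_hat, y, eta_lambda=0.01, eta_rho=0.01):
    """One step of the speed-optimized Arrow-Hurwicz-Uzawa procedure (8)."""
    E = np.ones_like(w)
    e = y - y_hat @ w                             # learning error e(k)
    denom = e * (y_hat @ y_hat) - lam * (E @ y_hat) + rho @ y_hat
    if not np.isclose(denom, 0.0):
        w = w + e * (e * y_hat - lam * E - rho) / denom
    lam = lam + eta_lambda * (E @ w - 1.0)
    rho = np.maximum(rho - eta_rho * w, 0.0)      # projector [.]_+ onto the positive orthant
    return w, lam, rho
      </preformat>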
      <p>As already mentioned, triangular-shaped functions are usually used as membership functions in
the nonlinear synapse NS:
$$\mu_l(\hat{y}^*(k)) = \begin{cases} \dfrac{\hat{y}^*(k) - x_{l-1}}{x_l - x_{l-1}}, &amp; \text{if } \hat{y}^*(k) \in [x_{l-1}, x_l], \\ \dfrac{x_{l+1} - \hat{y}^*(k)}{x_{l+1} - x_l}, &amp; \text{if } \hat{y}^*(k) \in [x_l, x_{l+1}], \\ 0 &amp; \text{otherwise}, \end{cases}$$
where $x_{l-1}, x_l, x_{l+1}$ are the centers of adjacent membership functions, which are usually
either uniformly distributed over the abscissa axis or can be found using clustering procedures
[20, 21].</p>
      <p>The main advantage of such functions is that at each learning cycle only two adjacent functions
are activated, and accordingly only two synaptic weights, $\tilde{w}_{l-1}, \tilde{w}_l$ or
$\tilde{w}_l, \tilde{w}_{l+1}$, are adjusted, which simplifies and speeds up the nonlinear synapse
tuning process.</p>
      <p>A standard quadratic criterion can be used to tune the nonlinear synapse,
$$E_{\tilde{w}}(k) = \frac{1}{2} \big(y(k) - \mu^T(\hat{y}^*(k))\, \tilde{w}\big)^2,$$
which is minimized using the gradient procedure
$$\tilde{w}(k) = \tilde{w}(k-1) + \eta_{\tilde{w}}(k) \big(y(k) - \mu^T(\hat{y}^*(k))\, \tilde{w}(k-1)\big)\, \mu(\hat{y}^*(k)), \tag{9}$$
where the step parameter $\eta_{\tilde{w}}(k)$ is chosen either using the Kaczmarz-Widrow-Hoff
procedure [17, 19] or using other approaches [22, 23] which provide additional filtering properties
of the learning process.</p>
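      <p>One step of procedure (9) can be sketched as follows; here the step parameter is taken in the
Kaczmarz-Widrow-Hoff form $\eta_{\tilde{w}}(k) = \|\mu(\hat{y}^*(k))\|^{-2}$, which is one of the
options mentioned above.</p>
      <preformat>
import numpy as np

def update_ns_weights_eq9(w_tilde, mu, y):
    """One step of the NS tuning procedure (9) with a Kaczmarz-Widrow-Hoff step."""
    e = y - mu @ w_tilde       # output error of the nonlinear synapse
    norm_sq = mu @ mu          # for a Ruspini partition this lies in (0, 1], never zero
    return w_tilde + (e / norm_sq) * mu   # only the two active weights change
      </preformat>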
    </sec>
    <sec id="sec-4">
      <title>4. Simulation results</title>
      <p>As a test case, we apply the proposed bagging approach to the short-term electric load
forecasting (STLF) problem, specifically 1-step-ahead forecasting of the daily electric load of one
of the regional power systems of Ukraine. We have the original series with $N = 337$ samples and
$n = 6$ forecast series (337 samples each) generated by 6 different independent computational
intelligence models. We treat the time series as a data stream, i.e. forecasting and metamodel
operation are performed in online mode; therefore, the whole dataset is processed only once (sample
by sample, $k = 1, 2, \dots, N$) and there is no need to divide it into training, validation, and
test sets.</p>
      <p>The original series (Figure 2) has several trends corresponding to different seasons, periodic
(mostly weekly) patterns, sudden changes, and outliers. Obviously, there is a strong random
component, because electric load in large systems depends on many external factors, some of which
have a truly random or chaotic nature, e.g. weather conditions [24]. The time series is thus
nonstationary and noisy by its nature, hence its forecasting is quite challenging; usually,
different forecasting models/methods perform better on particular parts of the series and are
inferior on others, and one model/method is rarely better than all others over the whole series.
This is exactly the case where bagging methods come into play: they can improve overall forecasting
accuracy by attempting to take the best from all models/methods in the ensemble.</p>
      <p>We employ 6 specialized STLF models in the ensemble that have different inputs and structures.
Such diversity is aimed at capturing different properties of different parts of the series under
consideration. Figure 3 shows the last 30 days of the time series with the corresponding forecasts.
We can see that long-term trends are more or less well captured by all models, but short-term
changes pose a problem for all of them, so that no single model is significantly better than the
others.</p>
      <p>We apply model (2) with algorithm (6) to obtain an optimal linear combination $\hat{y}^*(k)$
of the 6 forecasts from the ensemble member models. Just by visual inspection of the plot it is
obvious that $\hat{y}^*(k)$ is generally closer to the true series $y(k)$, which is also confirmed
by the corresponding error comparison in Table 1. We employ the Mean Absolute Percentage Error
(MAPE) criterion, which is widely used in short-term electric load forecasting research and has a
clear physical sense.</p>
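      <p>For reference, $\mathrm{MAPE} = \frac{100\%}{N} \sum_{k=1}^{N} |y(k) - \hat{y}(k)| / |y(k)|$;
a one-function NumPy equivalent:</p>
      <preformat>
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / np.abs(y_true))
      </preformat>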
      <p>The best of the ensemble member models provides a MAPE of 4.858%, which is reduced to 4.144%
by the linear bagging procedure. Then we additionally apply the adaptive F-transform (4) to
$\hat{y}^*(k)$ in order to exploit any possible remaining nonlinearities which cannot be
approximated by the linear model (2). In this simple test case, the nonlinear synapse has 10
triangular membership functions $\mu_l(\hat{y}^*(k))$ whose centers $x_l$ are uniformly distributed
between the minimum and maximum values of the time series $y(k)$. The NS parameters $\tilde{w}(k)$
are tuned by procedure (9). This additional nonlinear processing step further reduces the bagging
error to 3.9595%, which is 1.23 times less than the lowest error provided by the best ensemble
member alone.</p>
      <p>The aforementioned processing steps can be summarized as the pseudo-code below.</p>
      <p>Algorithm 1. Adaptive nonlinear bagging procedure, performed at each time step $k$.</p>
      <p>Step 1. Receive the input signals from the $n$ ensemble members, $\hat{y}_1(k), \dots, \hat{y}_n(k)$.</p>
      <p>Step 2. Calculate the intermediate output signal $\hat{y}^*(k)$ as a linear combination (2) of the inputs.</p>
      <p>Step 3. Apply the adaptive F-transform (4) to obtain the output signal $\hat{y}^{**}(k)$.</p>
      <p>Step 4. Update the weights $w^*(k)$ using learning algorithm (6).</p>
      <p>Step 5. Update the NS parameters $\tilde{w}(k)$ with procedure (9).</p>
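      <p>A minimal end-to-end sketch of Algorithm 1, assuming the helper functions sketched above
(triangular_memberships, update_weights_eq6, update_ns_weights_eq9); the data here are synthetic
stand-ins, not the real load series.</p>
      <preformat>
import numpy as np

# Synthetic stand-ins: Y_members[k] holds the n ensemble forecasts at step k,
# y_ref[k] is the reference (actual) load value that becomes known afterwards.
N, n, h = 337, 6, 10
rng = np.random.default_rng(0)
y_ref = 500.0 + 50.0 * np.sin(np.arange(N) / 7.0) + rng.normal(0.0, 5.0, N)
Y_members = y_ref[:, None] + rng.normal(0.0, 20.0, (N, n))

centers = np.linspace(y_ref.min(), y_ref.max(), h)   # uniform NS centers
w_star, lam = np.full(n, 1.0 / n), 0.0               # start from equal weights
w_tilde = np.full(h, y_ref.mean())                   # rough NS initialization

outputs = np.zeros(N)
for k in range(N):
    y_hat = Y_members[k]                             # Step 1: receive member forecasts
    y_star = y_hat @ w_star                          # Step 2: linear combination (2)
    mu = triangular_memberships(y_star, centers)
    outputs[k] = mu @ w_tilde                        # Step 3: adaptive F-transform (4)
    w_star, lam = update_weights_eq6(w_star, lam, y_hat, y_ref[k])   # Step 4: (6)
    w_tilde = update_ns_weights_eq9(w_tilde, mu, y_ref[k])           # Step 5: (9)
      </preformat>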
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>A fuzzy nonlinear online bagging procedure is proposed. It provides an optimal combination of
the results of an ensemble of computational intelligence systems for solving Data Stream Mining
problems, where data arrive for processing in real time and are non-stationary in nature. The
proposed approach has a simple numerical implementation and a high processing rate.</p>
      <p>Simulations confirm the theoretical results. The optimal linear combination provides an error
lower than the lowest error among the ensemble member models. The nonlinear F-transform provides an
additional decrease of the error, overall by a factor of 1.23 in comparison to the best model in
the ensemble.</p>
      <p>Future research on the topic will focus on fine-tuning of the nonlinear synapse parameters
(membership function types, their center initialization and adaptation, etc.) and on including
other types of nonlinearities in the metamodel.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          , Bagging predictors,
          <source>Machine Learning</source>
          <volume>24</volume>
          (
          <year>1996</year>
          )
          <fpage>126</fpage>
          -
          <lpage>140</lpage>
          . https://doi.org/10.1007/BF00058655.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Friedman</surname>
          </string-name>
          , P. Hall,
          <article-title>On bagging and nonlinear estimation</article-title>
          ,
          <source>Journal of Statistical Planning and Inference 137.3</source>
          (
          <year>2007</year>
          )
          <fpage>669</fpage>
          -
          <lpage>683</lpage>
          . https://doi.org/10.1016/j.jspi.
          <year>2006</year>
          .
          <volume>06</volume>
          .002.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Z.-H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <article-title>Ensembling neural networks: Many could be better than all</article-title>
          ,
          <source>Artificial Intelligence 137</source>
          .
          <fpage>1</fpage>
          -
          <lpage>2</lpage>
          (
          <year>2002</year>
          )
          <fpage>239</fpage>
          -
          <lpage>263</lpage>
          . https://doi.org/10.1016/S0004-
          <volume>3702</volume>
          (
          <issue>02</issue>
          )
          <fpage>00190</fpage>
          -
          <lpage>X</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Ye</surname>
            . Bodyanskiy,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Otto</surname>
            , I. Pliss,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Popov</surname>
          </string-name>
          ,
          <article-title>An Optimal Algorithm for Combining Multivariate Forecasts in Hybrid Systems</article-title>
          , in: V.
          <string-name>
            <surname>Palade</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          <string-name>
            <surname>Howlett</surname>
          </string-name>
          , L. Jain (Eds)
          <article-title>Knowledge-Based Intelligent Information and Engineering Systems</article-title>
          .
          <source>KES</source>
          <year>2003</year>
          , volume
          <volume>2774</volume>
          of Lecture Notes in Computer Science, Springer, Berlin, Heidelberg,
          <year>2003</year>
          . https://doi.org/10.1007/978-3-
          <fpage>540</fpage>
          -45226- 3_
          <fpage>132</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Ye. Bodyanskiy</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Popov</surname>
          </string-name>
          ,
          <article-title>Fuzzy Selection Mechanism for Multimodel Prediction</article-title>
          . in M.G. Negoita,
          <string-name>
            <given-names>R.J.</given-names>
            <surname>Howlett</surname>
          </string-name>
          , L.C. Jain (Eds),
          <source>Knowledge-Based Intelligent Information and Engineering</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>