Fuzzy Online Bagging Using Adaptive F-transform

Sergiy Popov1, Iryna Pliss1, Olexii Holovin2, Oleh Zolotukhin1 and Larysa Chala1

1 Kharkiv National University of Radio Electronics, Nauky av., 14, Kharkiv, 61166, Ukraine
2 Central Scientific Research Institute of Armament and Military Equipment of the Armed Forces of Ukraine, Kyiv, 03049, Ukraine

Abstract
The ensemble multi-model bagging approach is considered. We propose a nonlinear adaptive bagging procedure which applies the F-transform in its adaptive form to the results of traditional weighted averaging. This leads to a further decrease of ensemble errors at low extra computational and time cost. The metamodel architecture and the corresponding optimal learning algorithms are presented in detail. A simulation based on the short-term electric load forecasting problem confirms the theoretical results and shows a significant decrease of the forecasting error in comparison to a linear approach.

Keywords
Nonlinear bagging, adaptive ensemble, optimal learning, F-transform

1. Introduction

Currently, Computational Intelligence (CI) systems such as Artificial Neural Networks (ANNs), both deep (DNNs) and traditional shallow ones (SNNs), Neuro-Fuzzy Systems (NFS), Neo-Fuzzy Systems, etc., are widely used to solve many Data Mining problems. This success is explained by their universal approximating and extrapolating capabilities and their ability to learn, i.e. to adjust their parameters based on the data obtained from the observed object during its operation. At the same time, quite often there is a problem of choosing a specific system or network that can best cope with the problem being solved. DNNs provide high-quality solutions, but require very large volumes of training samples and a lot of time for their training. SNNs such as Radial Basis Function Neural Networks (RBFNs) are inferior to DNNs in terms of accuracy, but are able to learn online, i.e. to solve Data Stream Mining problems.
Neuro-Fuzzy and Neo-Fuzzy Systems can effectively process non-stationary signals, etc. Therefore, choosing a specific system is a non-trivial task and usually requires considerable experience of the researcher. To overcome the problem of choosing a specific system for a specific task, the ensemble multi-model bagging approach [1-10] is quite often used, where the task is concurrently solved by an ensemble of systems functioning in parallel. Their output signals are combined by a so-called metamodel which forms the optimal result. Usually, weighted averaging is used, where the weights are calculated by the metamodel itself. As a rule, these are batch procedures working in offline mode, although adaptive linear online approaches for solving Data Stream Mining problems are known [4-6, 9, 10]. Nonlinear bagging procedures are practically nonexistent, with a few offline-mode exceptions [2]. Therefore, it is expedient to develop an adaptive nonlinear bagging metamodel that would combine and generalize the ensemble members' processing results in online mode with high speed and accuracy.

ProfIT AI 2024: 4th International Workshop of IT-professionals on Artificial Intelligence (ProfIT AI 2024), September 25–27, 2024, Cambridge, MA, USA
serhii.popov@nure.ua (S. Popov); iryna.pliss@nure.ua (I. Pliss); a_a_golovin@ukr.net (O. Holovin); oleg.zolotukhin@nure.ua (O. Zolotukhin); larysa.chala@nure.ua (L. Chala)
0000-0002-1274-5830 (S. Popov); 0000-0001-7918-7362 (I. Pliss); 0000-0003-4662-4559 (O. Holovin); 0000-0002-0152-7600 (O. Zolotukhin); 0000-0002-9890-4790 (L. Chala)
Β© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073
2. Architecture of the adaptive nonlinear bagging metamodel

The architecture of the adaptive nonlinear bagging metamodel is shown in Figure 1.

Figure 1: Adaptive nonlinear bagging metamodel architecture (ensemble outputs $\hat{y}_1(k), \ldots, \hat{y}_q(k)$ are weighted by $w_1^*, \ldots, w_q^*$ and summed to the intermediate signal $\hat{y}^*(k)$, which is passed through the nonlinear synapse NS with membership functions $\mu_1, \ldots, \mu_n$ and weights $\tilde{w}_1, \ldots, \tilde{w}_n$ to form the output $\tilde{y}(k)$)

It is readily seen that the metamodel's architecture is similar to F. Rosenblatt's elementary perceptron, but instead of a traditional activation function it contains a Nonlinear Synapse (NS), which is the main building block of a neo-fuzzy neuron [11-13] and implements the F-transform [14] in its adaptive version [15], i.e. it is essentially a universal approximator [16]. Output signals from $q$ ensemble members which are solving the same problem, $\hat{y}_1(k), \ldots, \hat{y}_p(k), \ldots, \hat{y}_q(k)$ (or in vector form $\hat{y}(k) = (\hat{y}_1(k), \ldots, \hat{y}_p(k), \ldots, \hat{y}_q(k))^T$, where $k = 1, 2, \ldots, N$ is the current discrete time index), are fed to the metamodel's inputs, then passed through adjustable synaptic weights $w_1^*, \ldots, w_p^*, \ldots, w_q^*$, and finally combined in the adder, forming the metamodel's intermediate output signal

$$\hat{y}^*(k) = \sum_{p=1}^{q} w_p^* \hat{y}_p(k) \quad (1)$$

or in vector form

$$\hat{y}^*(k) = \hat{y}^T(k) w^*, \quad (2)$$

where $w^* = (w_1^*, \ldots, w_p^*, \ldots, w_q^*)^T$. The unbiasedness constraint is additionally imposed on the synaptic weights $w^*$:

$$\sum_{p=1}^{q} w_p^* = I_q^T w^* = 1$$

(here $I_q$ is a $(q \times 1)$ vector of ones). If we append inequality constraints on the non-negativity of the synaptic weights, $0 \le w_p^* \le 1\ \forall p$, these synaptic weights can be given the meaning of degrees of membership of each of the signals $\hat{y}_p(k)$ in the optimal result. Technically, the signal $\hat{y}^*(k)$ is already a solution of the optimization problem; however, it can be improved by processing it in an adaptive Nonlinear Synapse (NS). The NS is formed by $n$ nonlinear membership functions $\mu_l(\hat{y}^*(k))$, $l = 1, 2, \ldots, n$.
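As a minimal sketch of the linear combining stage (1)-(2), assuming NumPy; all numeric values, names, and the choice of $q = 3$ are illustrative, not taken from the paper:

```python
import numpy as np

def combine_linear(y_hat, w_star):
    """Intermediate metamodel output (2): y*(k) = y^T(k) w*."""
    # Unbiasedness constraint I_q^T w* = 1 must hold for the weights
    assert abs(w_star.sum() - 1.0) < 1e-9, "weights must sum to 1"
    return float(y_hat @ w_star)

# q = 3 hypothetical ensemble forecasts at some time step k
y_hat = np.array([101.0, 98.0, 100.5])
w_star = np.array([0.5, 0.2, 0.3])       # non-negative, sums to 1
y_star = combine_linear(y_hat, w_star)   # 0.5*101 + 0.2*98 + 0.3*100.5 = 100.25
```

With non-negative weights summing to one, the combined signal always stays inside the convex hull of the ensemble forecasts.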
Traditional triangular constructions that satisfy the Ruspini partition of unity conditions are usually used, although it is possible to use more complex variants, e.g. B-splines, Gaussians, Epanechnikov kernels, etc. Each membership function output $\mu_l(\hat{y}^*(k))$ is multiplied by the corresponding adjustable weight $\tilde{w}_l$ and then summed in the second adder of the metamodel, forming the output signal

$$\tilde{y}(k) = \sum_{l=1}^{n} \tilde{w}_l \mu_l(\hat{y}^*(k)) \quad (3)$$

or in vector form

$$\tilde{y}(k) = \mu^T(\hat{y}^*(k)) \tilde{w}, \quad (4)$$

where $\mu(\hat{y}^*(k)) = (\mu_1(\hat{y}^*(k)), \ldots, \mu_l(\hat{y}^*(k)), \ldots, \mu_n(\hat{y}^*(k)))^T$ and $\tilde{w} = (\tilde{w}_1, \ldots, \tilde{w}_l, \ldots, \tilde{w}_n)^T$. Combining (1)-(4), we can finally write

$$\tilde{y}(k) = \sum_{l=1}^{n} \tilde{w}_l \mu_l\!\left(\sum_{p=1}^{q} w_p^* \hat{y}_p(k)\right),$$

or

$$\tilde{y}(k) = \mu^T(\hat{y}^T(k) w^*) \tilde{w},$$

where the adjustable synaptic weight vectors $w^*$ and $\tilde{w}$ are subject to online learning.

3. Adaptive nonlinear bagging metamodel learning

The synaptic weight vector $w^*$ can be adjusted by gradient optimization of the learning criterion

$$E(k) = \frac{1}{2}\left(y(k) - \sum_{p=1}^{q} w_p^* \hat{y}_p(k)\right)^2 = \frac{1}{2}\left(y(k) - \hat{y}^T(k) w^*\right)^2$$

(here $y(k)$ is the external reference signal) subject to the constraint $I_q^T w^* = 1$. This is achieved by searching for the saddle point of the Lagrange function

$$L(w^*, \lambda) = \frac{1}{2}\left(y(k) - \hat{y}^T(k) w^*\right)^2 + \lambda\left(I_q^T w^* - 1\right),$$

where $\lambda$ is an undetermined Lagrange multiplier. The Arrow-Hurwicz procedure can be used to find the saddle point in the following form:

$$\begin{cases} w^*(k) = w^*(k-1) - \eta_w(k) \nabla_{w^*} L(w^*, \lambda), \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k)\, \partial L(w^*, \lambda)/\partial \lambda, \end{cases}$$

or

$$\begin{cases} w^*(k) = w^*(k-1) + \eta_w(k)\left(e(k)\hat{y}(k) - \lambda(k-1) I_q\right), \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k)\left(I_q^T w^*(k) - 1\right), \end{cases} \quad (5)$$

where $\eta_w(k), \eta_\lambda(k)$ are learning step parameters and $e(k) = y(k) - \hat{y}^T(k) w^*(k-1)$ is the learning error.
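One iteration of the Arrow-Hurwicz procedure (5) can be sketched as follows; the step sizes, starting values, and data are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def arrow_hurwicz_step(w, lam, y_true, y_hat, eta_w=0.01, eta_lam=0.1):
    """One iteration of (5): gradient descent in w*, ascent in lambda.

    w:      (q,) current weight vector w*(k-1)
    lam:    current Lagrange multiplier lambda(k-1)
    y_true: reference signal y(k)
    y_hat:  (q,) ensemble member outputs at step k
    """
    ones = np.ones_like(w)                          # I_q
    e = y_true - y_hat @ w                          # learning error e(k)
    w_new = w + eta_w * (e * y_hat - lam * ones)    # descent step in w*
    lam_new = lam + eta_lam * (ones @ w_new - 1.0)  # ascent step in lambda
    return w_new, lam_new

# Single step from zero initial weights (toy data)
y_hat = np.array([1.0, 1.1, 0.9])
w1, lam1 = arrow_hurwicz_step(np.zeros(3), 0.0, y_true=1.0, y_hat=y_hat)
# e = 1, so w1 = 0.01 * y_hat and lam1 = 0.1 * (0.03 - 1) = -0.097
```

Iterating this step drives $I_q^T w^*$ toward 1 while reducing the squared error, which is exactly the saddle-point dynamics the text describes.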
It is mathematically proven that the signal $\hat{y}^*(k) = \hat{y}^T(k) w^*(k)$ is not inferior in accuracy to any of the $\hat{y}_p(k)$, $p = 1, 2, \ldots, q$, at the metamodel input. The learning process can be optimized for speed by an appropriate selection of the learning step parameter $\eta_w(k)$. If the following option is chosen,

$$\eta_w(k) = \frac{e(k)}{e(k)\|\hat{y}(k)\|^2 - \lambda(k-1) I_q^T \hat{y}(k)},$$

the learning algorithm (5) can be written in the form

$$\begin{cases} w^*(k) = w^*(k-1) + \dfrac{e(k)\left(e(k)\hat{y}(k) - \lambda(k-1) I_q\right)}{e(k)\|\hat{y}(k)\|^2 - \lambda(k-1) I_q^T \hat{y}(k)}, \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k)\left(I_q^T w^*(k) - 1\right). \end{cases} \quad (6)$$

When $\lambda(k) = 0$, it completely coincides with the adaptive speed-optimal Kaczmarz-Widrow-Hoff algorithm [17-20].

As noted above, the synaptic weights at the inputs of the metamodel can be given the meaning of degrees of membership of each of the input signals $\hat{y}_p(k)$ in the optimal signal, which should theoretically coincide with the reference signal $y(k)$. In this case, the learning task consists in the minimization of the criterion

$$E(k) = \frac{1}{2}\left(y(k) - \sum_{p=1}^{q} \bar{w}_p \hat{y}_p(k)\right)^2 = \frac{1}{2}\left(y(k) - \hat{y}^T(k) \bar{w}\right)^2$$

subject to the constraints

$$I_q^T \bar{w} = 1, \quad 0 \le \bar{w}_p \le 1\ \forall p.$$

Introducing the Lagrange function

$$L(\bar{w}, \lambda, \rho) = \frac{1}{2}\left(y(k) - \hat{y}^T(k) \bar{w}\right)^2 + \lambda\left(I_q^T \bar{w} - 1\right) - \rho^T \bar{w}$$

(here $\rho$ is a vector of non-negative indefinite Lagrange multipliers) and the Kuhn-Tucker system

$$\begin{cases} \nabla_{\bar{w}} L(\bar{w}, \lambda, \rho) = 0_q, \\ \partial L(\bar{w}, \lambda, \rho)/\partial \lambda = 0, \\ \rho_p \ge 0, \quad p = 1, 2, \ldots, q, \end{cases}$$

it is easy to write the Arrow-Hurwicz-Uzawa gradient procedure for finding the saddle point of the Lagrangian in the form

$$\begin{cases} \bar{w}(k) = \bar{w}(k-1) - \eta_w(k) \nabla_{\bar{w}} L(\bar{w}, \lambda, \rho), \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k)\, \partial L(\bar{w}, \lambda, \rho)/\partial \lambda, \\ \rho(k) = \left[\rho(k-1) - \eta_\rho(k) \nabla_\rho L(\bar{w}, \lambda, \rho)\right]_+ \end{cases}$$

(here $[\,\cdot\,]_+$ is the projector onto the positive orthant), or

$$\begin{cases} \bar{w}(k) = \bar{w}(k-1) + \eta_w(k)\left(e(k)\hat{y}(k) - \lambda(k-1) I_q + \rho(k-1)\right), \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k)\left(I_q^T \bar{w}(k) - 1\right), \\ \rho(k) = \left[\rho(k-1) - \eta_\rho(k) \bar{w}(k)\right]_+. \end{cases} \quad (7)$$

Similarly to (5) and (6), the learning algorithm (7) can also be optimized for speed. The optimized procedure has the following final form:

$$\begin{cases} \bar{w}(k) = \bar{w}(k-1) + \dfrac{e(k)\left(e(k)\hat{y}(k) - \lambda(k-1) I_q + \rho(k-1)\right)}{e(k)\|\hat{y}(k)\|^2 - \lambda(k-1) I_q^T \hat{y}(k) + \rho^T(k-1)\hat{y}(k)}, \\ \lambda(k) = \lambda(k-1) + \eta_\lambda(k)\left(I_q^T \bar{w}(k) - 1\right), \\ \rho(k) = \left[\rho(k-1) - \eta_\rho(k) \bar{w}(k)\right]_+. \end{cases} \quad (8)$$

Thus, algorithms (6) and (8) are designed for the online adjustment of the metamodel parameters $w^*$ or $\bar{w}$ and ensure high accuracy of the obtained results. As already mentioned, triangular-shaped functions are usually used as membership functions in the nonlinear synapse NS:

$$\mu_l(\hat{y}^*(k)) = \begin{cases} \dfrac{\hat{y}^*(k) - c_{l-1}}{c_l - c_{l-1}}, & \text{if } \hat{y}^*(k) \in [c_{l-1}, c_l], \\[2mm] \dfrac{c_{l+1} - \hat{y}^*(k)}{c_{l+1} - c_l}, & \text{if } \hat{y}^*(k) \in [c_l, c_{l+1}], \\[2mm] 0 & \text{otherwise}, \end{cases}$$

where $c_{l-1}, c_l, c_{l+1}$ are the centers of adjacent membership functions, which are usually either uniformly distributed over the abscissa axis or found using clustering procedures [20, 21]. The main advantage of such functions is that at each learning cycle only two adjacent functions are activated, so only two synaptic weights, $\tilde{w}_{l-1}, \tilde{w}_l$ or $\tilde{w}_l, \tilde{w}_{l+1}$, are adjusted, which simplifies and speeds up the nonlinear synapse tuning process. A standard quadratic criterion can be used to tune the nonlinear synapse:

$$\tilde{E}(k) = \frac{1}{2}\left(y(k) - \sum_{l=1}^{n} \tilde{w}_l \mu_l(\hat{y}^*(k))\right)^2 = \frac{1}{2}\left(y(k) - \mu^T(\hat{y}^*(k)) \tilde{w}\right)^2,$$
which is minimized using the gradient procedure

$$\tilde{w}(k) = \tilde{w}(k-1) + \eta_{\tilde{w}}(k)\left(y(k) - \mu^T(\hat{y}^*(k)) \tilde{w}(k-1)\right) \mu(\hat{y}^*(k)), \quad (9)$$

where the step parameter $\eta_{\tilde{w}}(k)$ is chosen either using the Kaczmarz-Widrow-Hoff procedure [17, 19] or using other approaches [22, 23] which provide additional filtering properties of the learning process.

4. Simulation results

As a test case, we apply the proposed bagging approach to the short-term electric load forecasting (STLF) problem, specifically 1-step-ahead forecasting of the daily electric load of one of the regional power systems of Ukraine. We have the original series with $N = 337$ samples and $q = 6$ forecast series (337 samples each) generated by 6 different independent computational intelligence models. We treat the time series as a data stream, i.e. forecasting and metamodel operation are performed in online mode; therefore, the whole dataset is processed only once (sample by sample, $k = 1, 2, \ldots, N$) and there is no need to divide it into training, validation and test sets. The original series (Figure 2) has several trends corresponding to different seasons, periodic (mostly weekly) patterns, sudden changes and outliers. Obviously, there is a strong random component, because electric load in large systems depends on many external factors, some of which have a truly random or chaotic nature, e.g. weather conditions [24]. Thus, the time series is nonstationary and noisy by its nature, hence its forecasting is quite challenging; usually different forecasting models/methods perform better on particular parts of the series and are inferior on other parts. One model/method is rarely better than all others on the whole series. This is exactly the case when bagging methods come into play and can improve overall forecasting accuracy by attempting to take the best from all models/methods in the ensemble.
Figure 2: Daily electric load time series

We employ 6 specialized STLF models in the ensemble that have different inputs and structures. Such diversity is aimed at capturing different properties of different parts of the series under consideration. Figure 3 shows the last 30 days of the time series with the corresponding forecasts. We can see that long-term trends are more or less well captured by all models, but short-term changes pose a problem to all of them, so that no single model is significantly better than the others.

Figure 3: Forecasting results: true electric load (solid black line), 6 independent 1-day-ahead forecasts (color lines), two metamodel forecasts: $\hat{y}^*(k)$ (dotted black line), $\tilde{y}(k)$ (dashed black line)

We apply model (2) with algorithm (6) to obtain an optimal linear combination $\hat{y}^*(k)$ of the 6 forecasts from the ensemble member models. Even by visual inspection of the plot it is obvious that $\hat{y}^*(k)$ is generally closer to the true series $y(k)$, which is also confirmed by the error comparison in Table 1. We employ the Mean Absolute Percentage Error (MAPE) criterion, which is widely used in short-term electric load forecasting research and has a clear physical sense.

Table 1
1-day-ahead forecasting errors for all models and the ensemble outputs

Model    #1        #2        #3        #4        #5        #6        $\hat{y}^*(k)$   $\tilde{y}(k)$
MAPE     6.8171%   7.5066%   4.8580%   5.0311%   4.8827%   5.1015%   4.1440%          3.9595%

The best of the ensemble member models provides a MAPE of 4.8580%, which is reduced to 4.1440% by the linear bagging procedure. Then we additionally apply the adaptive F-transform (4) to $\hat{y}^*(k)$ in order to exploit any possible remaining nonlinearities which cannot be approximated by the linear model (2). In this simple test case, the nonlinear synapse has 10 triangular membership functions $\mu_l(\hat{y}^*(k))$ whose centers $c_l$ are uniformly distributed between the minimum and maximum values of the time series $y(k)$.
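The nonlinear synapse described above can be sketched as follows, assuming uniformly spaced centers as in this experiment; the 0-to-9 range, the weight initialization, and all data values are invented for illustration:

```python
import numpy as np

def triangular_memberships(x, centers):
    """Ruspini partition of triangular functions over sorted centers.

    For x inside [centers[0], centers[-1]], at most two adjacent
    functions are non-zero and they sum to exactly 1.
    """
    mu = np.zeros(len(centers))
    if x <= centers[0]:
        mu[0] = 1.0
    elif x >= centers[-1]:
        mu[-1] = 1.0
    else:
        j = np.searchsorted(centers, x)   # centers[j-1] < x <= centers[j]
        mu[j] = (x - centers[j - 1]) / (centers[j] - centers[j - 1])
        mu[j - 1] = 1.0 - mu[j]
    return mu

def ns_output_and_update(y_star, y_true, w_tilde, centers, eta=0.1):
    """NS output (3)-(4) followed by one gradient update (9)."""
    mu = triangular_memberships(y_star, centers)
    y_out = float(mu @ w_tilde)                        # F-transform output
    w_tilde = w_tilde + eta * (y_true - y_out) * mu    # update (9)
    return y_out, w_tilde

# 10 uniformly spaced centers, as in the experiment; toy signal values
centers = np.linspace(0.0, 9.0, 10)
w_tilde = centers.copy()   # start from an identity-like mapping (assumption)
y_out, w_tilde = ns_output_and_update(y_star=4.5, y_true=4.7,
                                      w_tilde=w_tilde, centers=centers)
```

Because only the two active weights receive a non-zero update, each step touches a small, local part of the synapse, which is the speed advantage the text attributes to triangular functions.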
NS parameters $\tilde{w}(k)$ are tuned by procedure (9). This additional nonlinear processing step further reduces the bagging error to 3.9595%, which is 1.23 times less than the lowest error provided by the best ensemble member alone. The aforementioned processing steps can be summarized in the pseudo-code below.

Algorithm 1. Adaptive nonlinear bagging procedure performed on each time step k
Step 1. Receive input signals from the $q$ ensemble members $\hat{y}_1(k), \ldots, \hat{y}_q(k)$.
Step 2. Calculate the intermediate output signal $\hat{y}^*(k)$ as a linear combination of the inputs (2).
Step 3. Apply the adaptive F-transform (4) to obtain the output signal $\tilde{y}(k)$.
Step 4. Update the weights $w^*(k)$ using learning algorithm (6).
Step 5. Update the NS parameters $\tilde{w}(k)$ with procedure (9).

5. Conclusions

A fuzzy nonlinear online bagging procedure is proposed. It provides optimal results of an ensemble of computational intelligence systems for solving Data Stream Mining problems when the data are received for processing in real time and are non-stationary in nature. The proposed approach has a simple numerical implementation and a high processing rate. Simulations confirm the theoretical results. The optimal linear combination provides errors lower than the lowest error among the ensemble member models. The nonlinear F-transform provides an additional decrease of the error, overall by 1.23 times in comparison to the best model in the ensemble. Future research on the topic will focus on fine-tuning of the nonlinear synapse parameters (membership function types, their centers' initialization and adaptation, etc.) and including other types of nonlinearities in the metamodel.

References

[1] L. Breiman, Bagging predictors, Machine Learning 24 (1996) 123–140. https://doi.org/10.1007/BF00058655.
[2] J. H. Friedman, P. Hall, On bagging and nonlinear estimation, Journal of Statistical Planning and Inference 137.3 (2007) 669–683. https://doi.org/10.1016/j.jspi.2006.06.002.
[3] Z.-H. Zhou, J. Wu, W. Tang, Ensembling neural networks: Many could be better than all, Artificial Intelligence 137.1-2 (2002) 239–263. https://doi.org/10.1016/S0004-3702(02)00190-X.
[4] Ye. Bodyanskiy, P. Otto, I. Pliss, S. Popov, An Optimal Algorithm for Combining Multivariate Forecasts in Hybrid Systems, in: V. Palade, R. J. Howlett, L. Jain (Eds.), Knowledge-Based Intelligent Information and Engineering Systems. KES 2003, volume 2774 of Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 2003. https://doi.org/10.1007/978-3-540-45226-3_132.
[5] Ye. Bodyanskiy, S. Popov, Fuzzy Selection Mechanism for Multimodel Prediction, in: M. G. Negoita, R. J. Howlett, L. C. Jain (Eds.), Knowledge-Based Intelligent Information and Engineering Systems. KES 2004, volume 3214 of Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 2004, pp. 772–778. https://doi.org/10.1007/978-3-540-30133-2_101.
[6] A. Bifet, G. Holmes, B. Pfahringer, R. GavaldΓ , Improving Adaptive Bagging Methods for Evolving Data Streams, in: Z.-H. Zhou, T. Washio (Eds.), Advances in Machine Learning. ACML 2009, volume 5828 of Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 2009, pp. 23–37. https://doi.org/10.1007/978-3-642-05224-8_4.
[7] C. Zhang, Y. Ma (Eds.), Ensemble Machine Learning. Methods and Applications, Springer, New York, NY, 2012. https://doi.org/10.1007/978-1-4419-9326-7.
[8] D. Sarkar, V. Natarajan, Ensemble Machine Learning Cookbook, Packt Publishing, Birmingham, 2019.
[9] N. C. Oza, Online bagging and boosting, in: Proc. 2005 IEEE International Conference on Systems, Man and Cybernetics, Waikoloa, HI, USA, 2005, Vol. 3, pp. 2340–2345. https://doi.org/10.1109/ICSMC.2005.1571498.
[10] E. Lughofer, M. Pratama, I. Skrjanc, Online bagging of evolving fuzzy systems, Information Sciences 570 (2021) 16–33. https://doi.org/10.1016/j.ins.2021.04.041.
[11] T. Yamakawa, E. Uchino, T. Miki, H. Kusanagi, A neo fuzzy neuron and its applications to system identification and prediction of the system behavior, in: Proc. 2nd Int. Conf. on Fuzzy Logic and Neural Networks, 1992, pp. 477–483.
[12] E. Uchino, T. Yamakawa, Neo-fuzzy neuron based new approach to system modeling, with application to actual system, in: Proc. Sixth International Conference on Tools with Artificial Intelligence, New Orleans, LA, USA, 1994, pp. 564–570.
[13] T. Miki, T. Yamakawa, Analog implementation of neo-fuzzy neuron and its on-board learning, in: Computational Intelligence and Applications, WSES Press, Piraeus, 1999, pp. 144–149.
[14] I. Perfilieva, Fuzzy transforms: Theory and applications, Fuzzy Sets and Systems 157.8 (2006) 993–1023. https://doi.org/10.1016/j.fss.2005.11.012.
[15] Ye. Bodyanskiy, N. Teslenko, Adaptyvne F-peretvorennya na osnovi uzagal'nenoi regresiynoi neyronnoi merezhi [Adaptive F-transform based on the generalized regression neural network], Adaptyvni systemy avtomatychnogo upravlinnya 10 (2007) 25–31. (In Ukrainian)
[16] Ye. Bodyanskiy, S. Kostiuk, Neuron based on an adaptive fuzzy transform for modern artificial neural network models, International Scientific Technical Journal "Problems of Control and Informatics" 68.6 (2023) 94–105. https://doi.org/10.34229/1028-0979-2023-6-7.
[17] S. Kaczmarz, AngenΓ€herte AuflΓΆsung von Systemen linearer Gleichungen, Bulletin International de l'AcadΓ©mie Polonaise des Sciences et des Lettres 35 (1937) 355–357. (In German)
[18] S. Kaczmarz, Approximate solution of systems of linear equations, International Journal of Control 57.6 (1993) 1269–1271. https://doi.org/10.1080/00207179308934446.
[19] B. Widrow, M. E. Hoff, Adaptive switching circuits, in: 1960 IRE WESCON Convention Record, Part 4, IRE, New York, 1960, pp. 96–104.
[20] T. Kohonen, Self-Organizing Maps, Springer-Verlag, Berlin, 1995.
[21] Ye. Bodyanskiy, E. Vynokurova, A. Dolotov, Self-Learning Cascade Spiking Neural Network for Fuzzy Clustering Based on Group Method of Data Handling, Journal of Automation and Information Sciences 45.3 (2013) 23–33. https://doi.org/10.1615/JAutomatInfScien.v45.i3.30.
[22] P. Otto, Ye. Bodyanskiy, V. Kolodyazhniy, A new learning algorithm for a forecasting neuro-fuzzy network, Integrated Computer-Aided Engineering 10 (2003) 399–409. https://doi.org/10.3233/ICA-2003-10409.
[23] Ye. Bodyanskiy, O. Vynokurova, I. Pliss, G. Setlak, P. Mulesa, Fast learning algorithm for deep evolving GMDH-SVM neural network in data stream mining tasks, in: Proc. 2016 IEEE First International Conference on Data Stream Mining & Processing (DSMP), Lviv, Ukraine, 2016, pp. 257–262. https://doi.org/10.1109/DSMP.2016.7583555.
[24] Ye. Bodyanskiy, S. Popov, T. Rybalchenko, Feedforward neural network with a specialized architecture for estimation of the temperature influence on the electric load, in: Proc. 2008 4th International IEEE Conference "Intelligent Systems", Varna, Bulgaria, 2008, pp. 7-14–7-18. https://doi.org/10.1109/IS.2008.4670444.