<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Investigation of the effectiveness of the Hybrid System of Computational Intelligence based on bagging and Group Method of Data Handling in the task of stock index forecasting⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yevgeniy Bodyanskiy</string-name>
          <email>yevgeniy.bodyanskiy@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuriy Zaychenko</string-name>
          <email>zaychenkoyuri@ukr.net</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleksii Kuzmenko</string-name>
          <email>oleksii.kuzmenko@ukr.net</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Helen Zaichenko</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Artificial Intelligence Department, Kharkiv National University of Radio Electronics</institution>
          ,
          <addr-line>Nauky Avenue 14, Kharkiv, 61166</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute for Applied System Analysis, Igor Sikorsky Kyiv Polytechnic Institute</institution>
          ,
          <addr-line>Beresteiskyi Avenue 37-A, Kyiv, 03056</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The article considers intelligent methods of different generations for solving the problem of short- and medium-term forecasting of stock indices. The effectiveness of a fully connected network (BackPropagation), the Group Method of Data Handling (GMDH), and a Hybrid System of Computational Intelligence based on bagging and the Group Method of Data Handling (HSCI-bagging) is investigated. The influence of the experimental parameters on the MSE and MAPE forecasting quality criteria is studied, the obtained results are analyzed, and the optimal parameters for different forecasting intervals are determined. On the basis of the optimal parameters found, the expediency of using the hybrid HSCI-bagging system for forecasting at different intervals is substantiated.</p>
      </abstract>
      <kwd-group>
<kwd>Hybrid System of Computational Intelligence</kwd>
        <kwd>bagging</kwd>
        <kwd>GMDH</kwd>
        <kwd>BackPropagation</kwd>
<kwd>forecasting</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
Intelligent methods have developed in several stages, each defining a generation of
artificial neural networks. The emergence of each new generation has been driven by new
challenges, such as the growing complexity of the tasks to be solved. The first
generation of artificial neural networks includes the perceptron, developed in 1957 by Frank
Rosenblatt. It is an early development of artificial intelligence and is considered the simplest type of
artificial neural network. A significant limitation of this generation of neural networks is that they
can only solve linearly separable problems. The second generation made it possible to solve the XOR
task. The main difference between this generation of networks is their multilayer architecture, and
the most famous example is the BackPropagation neural network proposed by Rumelhart, Hinton,
and Williams in 1986 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The next generation is characterized by an architecture with a large number
of layers (Deep Neural Networks) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. It was the third generation that allowed for a significant
breakthrough in many areas of artificial intelligence. It is worth noting that its emergence was also
facilitated by the improvement of hardware and the development of data storage and processing
technologies. This is very important, as this generation of networks requires significant
computational costs and large samples of training data. The emergence of the fourth generation was
facilitated by the complexity of artificial intelligence tasks. A feature of this generation of networks is
the ability to perform several different tasks using a single architecture. For this purpose, new
architectures and methods have been created: large language models (LLMs), generative models,
hybrid approaches to model building, etc.
      </p>
<p>Thus, the latest generation emphasizes multitasking and versatility, but it also requires
optimizing model complexity, since the accuracy of the results depends on the computational costs.
Deep learning neural networks play an important role in modern approaches to prediction,
classification, image processing, and natural language text analysis. Their universal
approximation properties make them indispensable in many fields. However, the practical
application of such powerful tools requires large amounts of training data, long training
times, and significant computational costs. These are the key problems that can hinder the use of
this type of neural network for complex problems and, consequently, the attainment of highly
accurate results.</p>
      <p>It is especially difficult to use deep neural networks when it is necessary to process non-stationary
data streams transmitted in real time. In this case, radial basis function networks, Wang-Mendel,
Takagi-Sugeno, or probabilistic neural networks can be an alternative. However, compared to deep
neural networks, they have two significant drawbacks: lower accuracy of results and empirical
selection of system parameters.</p>
      <p>
        The above difficulties can be overcome by using an ensemble approach [
        <xref ref-type="bibr" rid="ref3">3-8</xref>
] to combine the work
of models of different levels of complexity, from the first to the fourth generation. Ensembles allow
different systems to be used in parallel on a single problem, with their outputs then combined. The
bagging procedure [9] minimizes the error on the training set even when the data arrive online. What
remains problematic is the complexity of the system architecture and, consequently, the need for
significant computational costs.
      </p>
<p>To simplify the ensemble model, we can apply the Group Method of Data Handling (GMDH) [10, 11]
to decompose the system and determine its optimal architecture. This method, called the "first deep
learning method" by J. Schmidhuber [12], was already used in the 1970s to create multilevel networks. It is
worth noting that GMDH not only reduces the complexity of the system
but also increases the accuracy of its results [13-17].</p>
<p>Therefore, we consider it expedient to investigate the Hybrid System of
Computational Intelligence based on bagging and the Group Method of Data Handling (HSCI-bagging)
proposed in [18]. The aim of the investigation is to determine the optimal parameters of the hybrid
system and its efficiency in the task of short- and middle-term stock index forecasting.</p>
    </sec>
    <sec id="sec-2">
      <title>3. Overview of HSCI architecture based on Bagging and GMDH</title>
      <p>In Fig. 1 the architecture of the proposed system is presented.</p>
<p>The architecture of the system contains 2S sequentially connected stacks: the odd stacks are
formed by ensembles of parallel-connected subsystems that solve the same problem (recognition,
prediction, etc.), while the even ones are essentially learning metamodels that generalize the output
signals of the ensembles and form results that are optimal in the sense of the accepted criterion. The
output signals of the first metamodel are the generalized optimal signal $y^{*[1]}(k)$ and the $(n-1)$
output signals $\hat{y}^{[1]}_{i,\mu}(k)$, $i = 1, 2, \ldots, n-1$, of the "best members of the ensemble".
At their core, the metamodels function like the selection units of traditional GMDH systems, but they
not only select the best results from the previous stack, they also form the optimal solution based on
these results.</p>
<p>
Further, the output signals of the first metamodel are fed to the inputs of the second ensemble,
which is completely similar to the first. The outputs of the second ensemble,
$\hat{y}^{[2]}_1(k), \hat{y}^{[2]}_2(k), \ldots, \hat{y}^{[2]}_q(k)$, come to the second metamodel, which
calculates the optimal signal $y^{*[2]}(k)$ and the $(n-1)$ signals $\hat{y}^{[2]}_{i,\mu}(k)$ "closest"
to it. The last, S-th ensemble is similar to the first two, and the output of the last, S-th metamodel
is $y^{*[S]}(k)$, which exactly corresponds to the a priori established requirements for the quality of
solving the problem under consideration.
</p>
<p>Each of the ensembles contains q different computational intelligence systems that solve the same
problem. These may be simple neural networks such as a single-layer perceptron, a radial basis
function network (RBFN), a counterpropagation neural network, etc., which do not use the error
backpropagation procedure for training; neuro-fuzzy systems of the ANFIS, Wang-Mendel, or
Takagi-Sugeno-Kang type; wavelet-neuro systems; neo-fuzzy neurons; and other systems whose output
signals depend linearly on the adapted parameters, which makes it possible to use learning algorithms
with optimal speed.</p>
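<p>The data flow of one ensemble-plus-metamodel stack can be sketched minimally as follows; the member predictors, the input value, and the weights here are invented placeholders for illustration, not the subsystems actually used in the paper:</p>

```python
import numpy as np

def ensemble_outputs(members, x):
    """Run the q parallel ensemble members on the same input x."""
    return np.array([m(x) for m in members])

def metamodel(y_hat, w):
    """Combine member outputs with weights summing to one, as in (2)-(3)."""
    assert np.isclose(w.sum(), 1.0)
    return float(y_hat @ w)

# illustrative members and weights (not the paper's actual systems)
members = [lambda x: 1.1 * x, lambda x: 0.9 * x, lambda x: x + 0.1]
w = np.array([0.3, 0.3, 0.4])
y_hat = ensemble_outputs(members, 2.0)   # q member forecasts for one input
y_star = metamodel(y_hat, w)             # generalized optimal signal y*(k)
```

<p>Here the metamodel is reduced to its combining role; in the full system the weights are tuned by the procedures described in the next section.</p>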
    </sec>
    <sec id="sec-3">
      <title>4. HSCI-bagging learning algorithm</title>
<p>The input information on the basis of which the system is configured is a training set of input
signals:</p>
<p>
\[
x(1), x(2), \ldots, x(k), \ldots, x(N); \quad
x(k) = (x_1(k), \ldots, x_i(k), \ldots, x_n(k))^T \in R^n
\tag{1}
\]
and the corresponding scalar reference signals $y(1), \ldots, y(k), \ldots, y(N)$. On the basis of these
observations, the elements of the first ensemble are tuned independently of each other; at their outputs,
$q$ scalar signals $\hat{y}^{[1]}_p(k)$, $p = 1, 2, \ldots, q$, are formed, which are conveniently
represented in the form of a vector
$\hat{y}^{[1]}(k) = (\hat{y}^{[1]}_1(k), \ldots, \hat{y}^{[1]}_p(k), \ldots, \hat{y}^{[1]}_q(k))^T$.
These signals are sent to the inputs of the first metamodel, at the outputs of which $n$ sequences
$y^{*[1]}(k), \hat{y}^{[1]}_{1,\mu}(k), \ldots, \hat{y}^{[1]}_{i,\mu}(k), \ldots, \hat{y}^{[1]}_{n-1,\mu}(k)$
are formed, the main one of which is $y^{*[1]}(k)$, while the others are auxiliary. The main signal of
the metamodel, $y^{*[1]}(k)$, is the union of the outputs of all members of the ensemble in the form:
</p>
<p>
\[
y^{*[1]}(k) = \sum_{p=1}^{q} w^{*[1]}_p \hat{y}^{[1]}_p(k) = \hat{y}^{[1]T}(k)\, w^{*[1]},
\tag{2}
\]
</p>
      <p>
        p=1
where w¿[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]=( w¿[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] , ... , w¿p[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] ... w¿[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] )T – is a vector of adapted parameters-synaptic weights on which
q
additionally restrictions are set on unbiasedness:
      </p>
<p>
\[
\sum_{p=1}^{q} w^{*[1]}_p = I_q^T w^{*[1]} = 1,
\tag{3}
\]
where $I_q$ is the $(q \times 1)$ vector of unities.
</p>
<p>The problem of training the first metamodel is reduced to minimizing the standard quadratic
criterion in the presence of the additional constraint (3).</p>
      <p>Thus, the problem of training the first metamodel can be solved using the standard method of
penalty functions, which in this case reduces to minimizing the expression:</p>
<p>
\[
J(w^{*[1]}, \delta) = (Y(N) - Y^{[1]}(N)\, w^{*[1]})^T (Y(N) - Y^{[1]}(N)\, w^{*[1]})
+ \delta^{-2} (I_q^T w^{*[1]} - 1)^2,
\tag{4}
\]
where $Y(N) = (y(1), \ldots, y(k), \ldots, y(N))^T$ is an $(N \times 1)$ vector,
$Y^{[1]}(N) = (\hat{y}^{[1]}(1), \ldots, \hat{y}^{[1]}(k), \ldots, \hat{y}^{[1]}(N))^T$ is an
$(N \times q)$ matrix, and $\delta$ is the penalty coefficient.
Minimization of (4) with respect to $w^{*[1]}$ leads to the result
\[
w^{*[1]} = w_{LS}^{[1]} + P^{[1]}(N)\, I_q \left( I_q^T P^{[1]}(N)\, I_q \right)^{-1}
\left( 1 - I_q^T w_{LS}^{[1]} \right),
\]
where $w_{LS}^{[1]}$ is the standard LSM estimate:
</p>
<p>
\[
w_{LS}^{[1]} = \left( Y^{[1]T}(N)\, Y^{[1]}(N) \right)^{+} Y^{[1]T}(N)\, Y(N)
= P^{[1]}(N)\, Y^{[1]T}(N)\, Y(N).
\tag{5}
\]
</p>
<p>It was proved in [4, 5] that the use of estimate (5) leads to results that are not inferior in
accuracy to the best of the members of the first ensemble.</p>
<p>If the observations from the training sample are processed sequentially online, it is advisable to
use the recurrent least squares method in the form:</p>
<p>
\[
\begin{cases}
w_{LS}^{[1]}(k+1) = w_{LS}^{[1]}(k) + \dfrac{P^{[1]}(k)\, \hat{y}^{[1]}(k+1)
\left( y(k+1) - \hat{y}^{[1]T}(k+1)\, w_{LS}^{[1]}(k) \right)}
{1 + \hat{y}^{[1]T}(k+1)\, P^{[1]}(k)\, \hat{y}^{[1]}(k+1)}, \\[2mm]
P^{[1]}(k+1) = P^{[1]}(k) - \dfrac{P^{[1]}(k)\, \hat{y}^{[1]}(k+1)\,
\hat{y}^{[1]T}(k+1)\, P^{[1]}(k)}
{1 + \hat{y}^{[1]T}(k+1)\, P^{[1]}(k)\, \hat{y}^{[1]}(k+1)},
\end{cases}
\tag{6}
\]
\[
w^{*[1]}(k+1) = w_{LS}^{[1]}(k+1) + P^{[1]}(k+1)\, I_q
\left( I_q^T P^{[1]}(k+1)\, I_q \right)^{-1}
\left( 1 - I_q^T w_{LS}^{[1]}(k+1) \right),
\quad w^{*}_p(0) = q^{-1},\ p = 1, 2, \ldots, q,
\tag{7}
\]
or, if the training sample is non-stationary, we may use the exponentially weighted recurrent LSM:
\[
\begin{cases}
w_{LS}^{[1]}(k+1) = w_{LS}^{[1]}(k) + \dfrac{P^{[1]}(k)\, \hat{y}^{[1]}(k+1)
\left( y(k+1) - \hat{y}^{[1]T}(k+1)\, w_{LS}^{[1]}(k) \right)}
{\alpha + \hat{y}^{[1]T}(k+1)\, P^{[1]}(k)\, \hat{y}^{[1]}(k+1)}, \\[2mm]
P^{[1]}(k+1) = \dfrac{1}{\alpha} \left( P^{[1]}(k) -
\dfrac{P^{[1]}(k)\, \hat{y}^{[1]}(k+1)\, \hat{y}^{[1]T}(k+1)\, P^{[1]}(k)}
{\alpha + \hat{y}^{[1]T}(k+1)\, P^{[1]}(k)\, \hat{y}^{[1]}(k+1)} \right),
\end{cases}
\tag{8}
\]
with the same correction for $w^{*[1]}(k+1)$ as in (7), where $0 < \alpha \le 1$ is the forgetting factor.
</p>
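<p>A minimal online version of the updates (6)-(8) can be sketched as follows; setting alpha below 1 gives the exponentially weighted variant, and the simulated stream is an invented stand-in for real ensemble outputs:</p>

```python
import numpy as np

def rls_step(w, P, y1, y, alpha=1.0):
    """One recurrent LSM update; y1 is the ensemble-output vector,
    y the reference signal; alpha < 1 adds exponential forgetting."""
    Py = P @ y1
    denom = alpha + y1 @ Py
    w_new = w + Py * (y - y1 @ w) / denom
    P_new = (P - np.outer(Py, Py) / denom) / alpha
    return w_new, P_new

def constrain(w, P):
    """Correction (7): project the LSM estimate onto sum(w) == 1."""
    I_q = np.ones(len(w))
    PI = P @ I_q
    return w + PI * (1.0 - I_q @ w) / (I_q @ PI)

rng = np.random.default_rng(1)
q = 3
w_true = np.array([0.2, 0.5, 0.3])            # illustrative target weights
w, P = np.full(q, 1.0 / q), 1e3 * np.eye(q)   # w*_p(0) = q^{-1}
for _ in range(400):                          # simulated online stream
    y1 = rng.normal(size=q)
    w, P = rls_step(w, P, y1, y1 @ w_true)
w_star = constrain(w, P)
```
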
<p>
The parameters of the metamodel can be given the meaning of levels of fuzzy membership in the
optimal output signal by introducing an additional restriction on the non-negativity of these
parameters; that is, in addition to the adjustable parameters $w^{*[1]}$, we can also calculate the
levels of this membership, $\mu^{[1]}_p \ge 0$, $p = 1, 2, \ldots, q$.
</p>
      <p>To do this, we introduce into consideration the extended Lagrange function:</p>
<p>
\[
L(\mu^{[1]}, \lambda, \rho) = (Y(N) - Y^{[1]}(N)\, \mu^{[1]})^T (Y(N) - Y^{[1]}(N)\, \mu^{[1]})
+ \lambda (I_q^T \mu^{[1]} - 1) - \rho^T \mu^{[1]},
\tag{9}
\]
where $\lambda$ is an indefinite Lagrange multiplier and $\rho$ is a $(q \times 1)$ vector of
non-negative indefinite Lagrange multipliers.
</p>
<p>
Using the Kuhn-Tucker system of equations
\[
\begin{cases}
\nabla_{\mu^{[1]}} L(\mu^{[1]}, \lambda, \rho) = \vec{0}, \\
\partial L(\mu^{[1]}, \lambda, \rho) / \partial \lambda = 0,
\end{cases}
\tag{10}
\]
</p>
      <p>
it is not difficult to get the solution in the form:
\[
\begin{cases}
\mu^{[1]} = P^{[1]}(N) \left( Y^{[1]T}(N)\, Y(N) - 0.5\, \lambda I_q + 0.5\, \rho \right), \\[1mm]
\lambda = \dfrac{I_q^T P^{[1]}(N)\, Y^{[1]T}(N)\, Y(N) - 1 + 0.5\, I_q^T P^{[1]}(N)\, \rho}
{0.5\, I_q^T P^{[1]}(N)\, I_q}.
\end{cases}
\tag{11}
\]
      </p>
      <p>
For finding the vector of non-negative Lagrange multipliers, it is reasonable to apply the
Arrow-Hurwicz-Uzawa procedure:
\[
\rho(k+1) = Pr_{+}\left( \rho(k) - \eta_{\rho}(k)\, \mu^{[1]}(k+1) \right),
\tag{12}
\]
where $Pr_{+}(\cdot)$ is the projector onto the positive orthant and $\eta_{\rho}(k)$ is a learning
rate parameter.
      </p>
      <p>
After non-complex transformations, the expression for $\mu^{[1]}(k+1)$ may be presented in a more
compact form:
\[
\mu^{[1]}(k+1) = w^{*[1]}(k+1) + 0.5 \left( I -
\dfrac{P^{[1]}(k+1)\, I_q I_q^T}{I_q^T P^{[1]}(k+1)\, I_q} \right) P^{[1]}(k+1)\, \rho(k),
\tag{13}
\]
where instead of the least squares estimates $w_{LS}^{[1]}(k+1)$, the parameters of the metamodel
$w^{*[1]}(k+1)$ are used, which simplifies the process of configuring it.
      </p>
      <p>
As a result of learning the first metamodel, the optimal signal $y^{*[1]}(k)$ is formed at its output,
as well as $q$ signals $\hat{y}^{[1]}_{p,\mu}(k)$, from which we choose $n-1$ (if $q \ge n$) with the
highest levels of fuzzy membership $\mu^{[1]}_p$; these are subsequently fed, in the form of an
$(n \times 1)$ vector, to the input of the second ensemble, whose outputs go to the inputs of the
second metamodel, and so on. The process of increasing the number of ensembles and metamodels
continues until the required accuracy of the last metamodel with the output $y^{*[S]}(k)$ is achieved,
or until the value of the criterion minimized for the bagging model begins to increase, i.e.
$\varepsilon^2 (y^{*[s+1]}(k)) \ge \varepsilon^2 (y^{*[s]}(k))$.
      </p>
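<p>The projection step of the Arrow-Hurwicz-Uzawa procedure can be illustrated with a toy update; the step size eta and the numeric values below are assumptions for the sketch, and the sign of the gradient step follows the Lagrangian (9), whose derivative with respect to the multipliers is minus the membership vector:</p>

```python
import numpy as np

def project_positive(v):
    """Pr+(.): project onto the positive orthant by clipping negatives."""
    return np.maximum(v, 0.0)

def uzawa_step(rho, mu, eta=0.1):
    """Dual update of the non-negativity multipliers: a gradient step on
    the Lagrangian in rho, followed by projection onto the orthant."""
    return project_positive(rho - eta * mu)

# illustrative multipliers and current membership estimates
rho = np.array([0.2, 0.05, 0.3])
mu = np.array([0.5, -0.1, 4.0])
rho_next = uzawa_step(rho, mu)   # large positive mu drives its rho to zero
```

<p>Components with clearly positive memberships see their multipliers decay toward zero, while a negative (constraint-violating) component pushes its multiplier up, which is exactly the corrective behavior the procedure needs.</p>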
    </sec>
    <sec id="sec-4">
      <title>5. Experimental investigations</title>
<p>The dependence of the forecasting accuracy of the hybrid HSCI-bagging system on its parameters
was investigated. The obtained forecasting results were compared with the results of BackPropagation
and GMDH by the MSE and MAPE criteria.</p>
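<p>For reference, the two quality criteria can be computed as follows; the standard definitions are assumed, with MAPE expressed in percent:</p>

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# toy check: 10% and 5% errors on two of three points
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
```
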
<p>For the experiments, the dataset was divided into training, validation, and test subsamples. The
test subsample, used to evaluate the accuracy of the forecast, was always the same: the last 50
points of the dataset. The ratio of the training and validation data sets varied across the experiments
and took the following values (in percent): 60/40, 70/30, 80/20. The forecasting intervals also varied:
short-term (3, 5, 7 days) and middle-term (10, 20, 30 days).</p>
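<p>A sketch of this partitioning scheme, assuming the varying percentage applies to the part of the series that remains after the fixed 50-point test tail is held out:</p>

```python
def split_series(series, train_ratio=0.7, test_len=50):
    """Chronological split: the last test_len points are always the test
    set; the remainder is divided train/validation by train_ratio."""
    head, test = series[:-test_len], series[-test_len:]
    n_train = int(len(head) * train_ratio)
    return head[:n_train], head[n_train:], test

# illustrative series of 250 points, 70/30 train/validation split
data = list(range(250))
train, val, test = split_series(data, train_ratio=0.7)
```
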
<p>The historical data of the Close indicator of the DAX P index (^GDAXI) [19] for the period
24.03.2024-20.03.2025 were used as the dataset. The dynamics of the Close indicator is shown
in Fig. 2.</p>
<p>The correlation coefficients for the values of the Close indicator were also calculated, and a
correlogram was built (Fig. 2).</p>
<p>The correlogram shows that the data are highly correlated, with a correlation coefficient of at
least 0.8 up to the 50th lag.</p>
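<p>A correlogram value at a given lag is simply the sample correlation between the series and its lagged copy; the sketch below, using an invented trend-like series rather than the actual index data, shows why a smooth price series stays highly correlated even at lag 50:</p>

```python
import numpy as np

def autocorr(x, lag):
    """Sample correlation between x[t] and x[t + lag] (one correlogram point)."""
    x = np.asarray(x, float)
    a, b = x[:-lag], x[lag:]
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

# illustrative slowly varying, trend-like series (not the ^GDAXI data)
t = np.linspace(0.0, 1.0, 300)
trend = 100.0 + 10.0 * t
r50 = autocorr(trend, 50)   # stays close to 1 for a smooth trend
```
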
<p>The first series of experiments was conducted on short-term intervals for the BackPropagation,
GMDH, and HSCI-bagging models. For each of the models, the train/test ratio was varied at each
interval, and the optimal parameters (number of inputs, number of layers, and number of neurons in
each layer) for the BackPropagation network were determined. The metrics of the best results are
shown in Tables 1-3.</p>
      <sec id="sec-4-1">
        <title>Short-term results</title>
        <p>Best short-term result for BackPropagation: MSE 979294.79, MAPE 3.82 (Tables 1-3).</p>
      </sec>
      <sec id="sec-4-2">
        <title>Forecast comparison</title>
        <p>Charts of short-term forecasts for HSCI-bagging models are shown in Fig. 4-6.</p>
        <p>For the analysis of the forecasting and model evaluation results, comparative bar charts of MSE
(Fig. 7) and MAPE (Fig. 8) were built.</p>
<p>The next series of experiments was conducted on middle-term intervals. The quality criteria for
the obtained forecasts are shown in Tables 4-6, and charts of the middle-term forecasts of the
HSCI-bagging models are shown in Fig. 9-11.</p>
<p>For the analysis of the forecasting results and model evaluation, comparative bar charts were
constructed for two key error metrics: MSE and MAPE. The bar chart for MSE is shown in Fig. 12,
and the bar chart for MAPE is shown in Fig. 13. Presenting the comparison of metrics in this way
clearly demonstrates how much the predictions of the different models differ.</p>
        <p>To visually represent the dependence of the MSE and MAPE criteria values on the forecasting
interval, the respective charts were constructed (Fig. 14 and Fig. 15).</p>
<p>Based on the analysis of the experimental results, the hybrid HSCI-bagging system is quite
effective compared to BackPropagation and GMDH. Table 7 shows that the network requires a larger
amount of training data to improve the forecast accuracy at middle-term intervals. Figures 14 and 15
show that the forecast accuracy decreases rapidly when forecasting over an interval of 20 days or
more. It is also worth noting that, as the forecasting interval increases, the accuracy by the MAPE
criterion decreases more slowly than by the MSE criterion.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion</title>
<p>This article investigates the problem of short- and medium-term forecasting of stock indices, in
particular the Close indicator of the DAX P index, using intelligent methods of different generations.
A comparative experimental study of three models (BackPropagation, GMDH, and HSCI-bagging)
has led to a number of important conclusions:</p>
      <p>The hybrid computational intelligence system HSCI-bagging demonstrated the best
forecasting results for all criteria (MSE, MAPE) at different intervals, which indicates its high
generalization ability and resistance to changes in data.</p>
<p>The BackPropagation neural network showed the worst results, which may be due to its high
sensitivity to training parameters, a lack of data, or local minima in the optimization process.
A dependence of the forecasting quality on the size of the training sample was established:
as the forecast horizon increases, more historical data are needed to ensure stable
accuracy.</p>
      <p>Hybrid approaches based on bagging demonstrate high adaptability and potential for
integration with other intelligent methods, including ensemble structures.</p>
      <p>Based on the results, the following recommendations for further development can be made:
Integration of time series processing methods (e.g., wavelet transform, EDA, STL
decomposition) with hybrid intelligent systems to improve data preprocessing and highlight
hidden patterns.</p>
      <p>Use of deep ensemble models based on LSTM, GRU, or Transformer architectures, which can
be combined with GMDH or other evolutionary approaches within meta-models.
Adaptive real-time optimization of HSCI system parameters using reinforcement learning or
evolutionary strategies.</p>
      <p>Development of multimodal models that combine numerical, textual, and graphical data (e.g.,
news, tweets, corporate reports) to build more contextually aware forecasts.</p>
      <p>Incorporating risk-oriented metrics such as Value-at-Risk (VaR) or Conditional Value-at-Risk
(CVaR) into the process of evaluating forecasting performance to increase the practical value
of decisions in financial applications.</p>
      <p>Thus, the further development of the forecasting financial indices lies in the development of
adaptive, hybrid and multilevel models that combine high forecasting accuracy with scalability and
practical applicability to real market conditions.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[4] Ye. Bodyanskiy, I. Pliss, Adaptive generalized forecasting of multivariate stochastic signals, in: Proc. Latvian Sign. Proc. Int. Conf., vol. 2, Riga, 1990, pp. 80–83.</p>
      <p>[5] Ye. V. Bodyanskiy, I. A. Rudneva, On one adaptive algorithm for detecting discords in random sequences, Autom. Remote Control 56 (1995) 1439–1443.</p>
      <p>[6] A. J. C. Sharkey, On combining artificial neural nets, Connect. Sci. 8 (1996) 299–313.</p>
      <p>[7] S. Hashem, Optimal linear combination of neural networks, Neural Networks 10 (1997) 599–614.</p>
      <p>[8] U. Naftaly, N. Intrator, D. Horn, Optimal ensemble averaging of neural networks, Network: Comput. Neural Syst. 8 (1997) 283–296.</p>
      <p>[9] L. Breiman, Bagging Predictors, Techn. Report №421, Dept. of Statistics, Univ. of California, Berkeley, CA, 1994, 19 p.</p>
      <p>[10] A. G. Ivakhnenko, V. G. Lapa, Cybernetic Forecasting Devices, Naukova Dumka, Kyiv, 1965, 216 p. (in Ukrainian).</p>
      <p>[11] A. G. Ivakhnenko, G. A. Ivakhnenko, J. A. Mueller, Self-organization of the neural networks with active neurons, Pattern Recogn. Image Anal. 4 (1994) 177–188.</p>
      <p>[12] J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks 61 (2015) 85–117.</p>
      <p>[13] Y. Zaychenko, Ye. Bodyanskiy, O. Tyshchenko, O. Boiko, G. Hamidov, Hybrid GMDH-neuro-fuzzy system and its training scheme, Int. J. Inf. Theories Appl. 24 (2018) 156–172.</p>
      <p>[14] Y. Zaychenko, Ye. Bodyanskiy, O. Boiko, G. Hamidov, Evolving Hybrid GMDH Neuro-Fuzzy Network and Its Application, in: Proc. Int. Conf. IEEE-SAIC 2018, Kyiv, Ukraine, IASA, 8–11 Oct. 2018.</p>
      <p>[15] Ye. Bodyanskiy, N. Kulishova, Y. Zaychenko, G. Hamidov, Spline-Orthogonal Extended Neo-Fuzzy Neuron, in: Proc. Int. Conf. CISP–BMEI 2019.</p>
      <p>[16] Ye. Bodyanskiy, Y. Zaychenko, O. Boiko, G. Hamidov, A. Zelikman, The Hybrid GMDH-Neo-fuzzy Neural Network in Forecasting Problems in the Financial Sphere, in: Advances in Intelligent Computing, Springer, 2020, vol. 1075, pp. 221–225.</p>
      <p>[17] Ye. Bodyanskiy, O. Vynokurova, I. Pliss, Hybrid GMDH-neural network of computational intelligence, in: Proc. 3rd Int. Workshop on Inductive Modeling, Krynica, Poland, 2009, pp. 100–107.</p>
      <p>[18] Y. Bodyanskiy, O. Kuzmenko, H. Zaichenko, Y. Zaychenko, Application of Hybrid Neural Networks based on bagging and Group Method of Data Handling for forecasting, in: Proc. 2023 IEEE 18th Int. Conf. on Computer Science and Information Technologies (CSIT), Lviv, Ukraine, 2023, pp. 1–6. doi:10.1109/CSIT61576.2023.10324161.</p>
      <p>[19] Yahoo Finance, DAX P (^GDAXI) Stock Price, News, Quote and History. URL: https://finance.yahoo.com/quote/%5EGDAXI/. Accessed: 21 Mar. 2025.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Rumelhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <article-title>Learning representations by backpropagating errors</article-title>
          ,
          <source>Nature</source>
          <volume>323</volume>
          (
          <year>1986</year>
          )
          <fpage>533</fpage>
          -
          <lpage>536</lpage>
          . doi:10.1038/323533a0.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>I.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Courville</surname>
          </string-name>
          , Deep Learning, MIT Press,
          <year>2016</year>
          . URL: http://www.deeplearningbook.org.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L. K.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Salamon</surname>
          </string-name>
          ,
          <article-title>Neural network ensembles</article-title>
          ,
          <source>IEEE Trans. Pattern Anal. Mach. Intell</source>
          .
          <volume>12</volume>
          (
          <year>1990</year>
          )
          <fpage>993</fpage>
          -
          <lpage>1000</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>