<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Investigation of the effectiveness of the Hybrid System of Computational Intelligence based on bagging and Group Method of Data Handling in the task of stock index forecasting⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yevgeniy Bodyanskiy</string-name>
          <email>yevgeniy.bodyanskiy@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuriy Zaychenko</string-name>
          <email>zaychenkoyuri@ukr.net</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleksii Kuzmenko</string-name>
          <email>oleksii.kuzmenko@ukr.net</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Helen Zaichenko</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Artificial Intelligence Department, Kharkiv National University of Radio Electronics</institution>
          ,
          <addr-line>Nauky Avenue 14, Kharkiv, 61166</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute for Applied System Analysis, Igor Sikorsky Kyiv Polytechnic Institute</institution>
          ,
          <addr-line>Beresteiskyi Avenue 37-A, Kyiv, 03056</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The article considers intelligent methods of different generations for solving the problem of short- and medium-term forecasting of stock indices. The effectiveness of a fully connected network (BackPropagation), the Group Method of Data Handling (GMDH), and a Hybrid System of Computational Intelligence based on bagging and the Group Method of Data Handling (HSCI-bagging) is investigated. The influence of the experimental parameters on the MSE and MAPE forecasting quality criteria is studied, the obtained results are analyzed, and the optimal parameters for different forecasting intervals are determined. On the basis of the optimal parameters found, the expediency of using the hybrid HSCI-bagging system for forecasting at different intervals is substantiated.</p>
      </abstract>
      <kwd-group>
<kwd>Hybrid System of Computational Intelligence</kwd>
        <kwd>bagging</kwd>
        <kwd>GMDH</kwd>
        <kwd>BackPropagation</kwd>
<kwd>forecasting</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
Intelligent methods have developed in several stages, each defining a generation of
artificial neural networks. The emergence of each new generation has been driven by new
challenges, such as the growing complexity of the tasks to be solved. The first
generation of artificial neural networks includes the perceptron, developed in 1957 by Frank
Rosenblatt. It is an early development of artificial intelligence and is considered the simplest type of
artificial neural network. A significant limitation of this generation of neural networks is that they
can only solve linearly separable problems. The second generation made it possible to solve the XOR
task. The main difference between this generation of networks is their multilayer architecture, and
the most famous example is the BackPropagation neural network proposed by Rumelhart, Hinton,
and Williams in 1986 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The next generation is characterized by an architecture with a large number
of layers (Deep Neural Networks) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. It was the third generation that allowed for a significant
breakthrough in many areas of artificial intelligence. It is worth noting that its emergence was also
facilitated by the improvement of hardware and the development of data storage and processing
technologies. This is very important, as this generation of networks requires significant
computational costs and large samples of training data. The emergence of the fourth generation was
facilitated by the complexity of artificial intelligence tasks. A feature of this generation of networks is
the ability to perform several different tasks using a single architecture. For this purpose, new
architectures and methods have been created: large language models (LLMs), generative models,
hybrid approaches to model building, etc.
      </p>
<p>Thus, the latest generation emphasizes multitasking and versatility, but it also requires
optimizing model complexity, since the accuracy of the results depends on the computational costs.
Deep learning neural networks play an important role in modern approaches to prediction,
classification, image processing, and natural language text analysis. Their universal
approximation properties make them indispensable in many fields. However, the practical
application of such powerful tools requires large amounts of training data, long training
times, and significant computational costs. These are the key problems that can hinder the use of
this type of neural network for complex problems and, consequently, the attainment of highly
accurate results.</p>
      <p>It is especially difficult to use deep neural networks when it is necessary to process non-stationary
data streams transmitted in real time. In this case, radial basis function networks, Wang-Mendel,
Takagi-Sugeno, or probabilistic neural networks can be an alternative. However, compared to deep
neural networks, they have two significant drawbacks: lower accuracy of results and empirical
selection of system parameters.</p>
      <p>
        The above difficulties can be overcome by using an ensemble approach [
        <xref ref-type="bibr" rid="ref3">3-8</xref>
] to combine the work
of models of different levels of complexity, from the first to the fourth generation. Ensembles allow
different systems to be used in parallel on a single problem, with their outputs then combined. The
bagging procedure [9] minimizes the error on the training set even when the data arrive online. What
remains problematic is the complexity of the system architecture and, consequently, the need for
significant computational costs.
      </p>
<p>To simplify the ensemble model, we can apply the Group Method of Data Handling (GMDH) [10, 11]
to decompose the system and determine its optimal architecture. This method, called the "first deep
learning method" by J. Schmidhuber [12], was already used in the 1970s to create multilevel networks. It is
worth noting that GMDH not only reduces the complexity of the system
but also increases the accuracy of its results [13-17].</p>
<p>Therefore, we consider it expedient to investigate the Hybrid System of
Computational Intelligence based on bagging and the Group Method of Data Handling (HSCI-bagging)
proposed in [18]. The aim of the investigation is to determine the optimal parameters of the hybrid
system and its efficiency in the task of short- and middle-term stock index forecasting.</p>
    </sec>
    <sec id="sec-2">
      <title>3. Overview of HSCI architecture based on Bagging and GMDH</title>
      <p>In Fig. 1 the architecture of the proposed system is presented.</p>
<p>The architecture of the system contains 2S sequentially connected stacks: the odd stacks are
formed by ensembles of parallel-connected subsystems that solve the same problem (recognition,
prediction, etc.), while the even ones are essentially learning metamodels that generalize the output
signals of the ensembles and form results that are optimal in the sense of the accepted criterion. The
output signals of the first metamodel are the generalized optimal signal $y^{*[1]}(k)$ and the $(n-1)$
output signals $\hat{y}^{[1]}_{i,\mu}(k)$, $i = 1, 2, \ldots, n-1$, of the "best members of the ensemble".
At their core, the metamodels function like the selection units of traditional GMDH systems, but they
not only select the best results from the previous stack, they also form the optimal solution based on
these results.</p>
<p>
Further, the output signals of the first metamodel are fed to the inputs of the second ensemble,
which is completely similar to the first. The outputs of the second ensemble,
$\hat{y}^{[2]}_1(k), \hat{y}^{[2]}_2(k), \ldots, \hat{y}^{[2]}_q(k)$, come to the second metamodel, which
calculates the optimal signal $y^{*[2]}(k)$ and the $(n-1)$ signals $\hat{y}^{[2]}_{i,\mu}(k)$ "closest"
to it. The last, S-th ensemble is similar to the first two, and the output of the last, S-th metamodel
is $y^{*[S]}(k)$, which exactly corresponds to the a priori established requirements for the quality of
solving the problem under consideration.
</p>
<p>Each of the ensembles contains q different computational intelligence systems that solve the same
problem. These may be simple neural networks such as a single-layer perceptron, a radial basis
function network (RBFN), a counterpropagation neural network, etc., which do not use the error
backpropagation procedure for training; neuro-fuzzy systems of the ANFIS, Wang-Mendel, or
Takagi-Sugeno-Kang type; wavelet-neuro systems; neo-fuzzy neurons; and other systems whose output
signals depend linearly on the adapted parameters, which makes it possible to use learning algorithms
with optimal speed.</p>
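<p>The data flow of one ensemble-plus-metamodel stack can be sketched minimally as follows; the member predictors, the input value, and the weights here are invented placeholders for illustration, not the subsystems actually used in the paper:</p>

```python
import numpy as np

def ensemble_outputs(members, x):
    """Run the q parallel ensemble members on the same input x."""
    return np.array([m(x) for m in members])

def metamodel(y_hat, w):
    """Combine member outputs with weights summing to one, as in (2)-(3)."""
    assert np.isclose(w.sum(), 1.0)
    return float(y_hat @ w)

# illustrative members and weights (not the paper's actual systems)
members = [lambda x: 1.1 * x, lambda x: 0.9 * x, lambda x: x + 0.1]
w = np.array([0.3, 0.3, 0.4])
y_hat = ensemble_outputs(members, 2.0)   # q member forecasts for one input
y_star = metamodel(y_hat, w)             # generalized optimal signal y*(k)
```

<p>Here the metamodel is reduced to its combining role; in the full system the weights are tuned by the procedures described in the next section.</p>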
    </sec>
    <sec id="sec-3">
      <title>4. HSCI-bagging learning algorithm</title>
<p>The input information on the basis of which the system is configured is a training set of input
signals:</p>
<p>
\[
x(1), x(2), \ldots, x(k), \ldots, x(N); \quad
x(k) = (x_1(k), \ldots, x_i(k), \ldots, x_n(k))^T \in R^n
\tag{1}
\]
and the corresponding scalar reference signals $y(1), \ldots, y(k), \ldots, y(N)$. On the basis of these
observations, the elements of the first ensemble are tuned independently of each other; at their outputs,
$q$ scalar signals $\hat{y}^{[1]}_p(k)$, $p = 1, 2, \ldots, q$, are formed, which are conveniently
represented in the form of a vector
$\hat{y}^{[1]}(k) = (\hat{y}^{[1]}_1(k), \ldots, \hat{y}^{[1]}_p(k), \ldots, \hat{y}^{[1]}_q(k))^T$.
These signals are sent to the inputs of the first metamodel, at the outputs of which $n$ sequences
$y^{*[1]}(k), \hat{y}^{[1]}_{1,\mu}(k), \ldots, \hat{y}^{[1]}_{i,\mu}(k), \ldots, \hat{y}^{[1]}_{n-1,\mu}(k)$
are formed, the main one of which is $y^{*[1]}(k)$, while the others are auxiliary. The main signal of
the metamodel, $y^{*[1]}(k)$, is the union of the outputs of all members of the ensemble in the form:
</p>
<p>
\[
y^{*[1]}(k) = \sum_{p=1}^{q} w^{*[1]}_p \hat{y}^{[1]}_p(k) = \hat{y}^{[1]T}(k)\, w^{*[1]},
\tag{2}
\]
</p>
      <p>
        p=1
where w¿[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]=( w¿[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] , ... , w¿p[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] ... w¿[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] )T – is a vector of adapted parameters-synaptic weights on which
q
additionally restrictions are set on unbiasedness:
      </p>
<p>
\[
\sum_{p=1}^{q} w^{*[1]}_p = I_q^T w^{*[1]} = 1,
\tag{3}
\]
where $I_q$ is the $(q \times 1)$ vector of unities.
</p>
<p>The problem of training the first metamodel is reduced to minimizing the standard quadratic
criterion in the presence of the additional constraint (3).</p>
      <p>Thus, the problem of training the first metamodel can be solved using the standard method of
penalty functions, which in this case reduces to minimizing the expression:</p>
<p>
\[
J(w^{*[1]}, \delta) = (Y(N) - Y^{[1]}(N)\, w^{*[1]})^T (Y(N) - Y^{[1]}(N)\, w^{*[1]})
+ \delta^{-2} (I_q^T w^{*[1]} - 1)^2,
\tag{4}
\]
where $Y(N) = (y(1), \ldots, y(k), \ldots, y(N))^T$ is an $(N \times 1)$ vector,
$Y^{[1]}(N) = (\hat{y}^{[1]}(1), \ldots, \hat{y}^{[1]}(k), \ldots, \hat{y}^{[1]}(N))^T$ is an
$(N \times q)$ matrix, and $\delta$ is the penalty coefficient.
Minimization of (4) with respect to $w^{*[1]}$ leads to the result
\[
w^{*[1]} = w_{LS}^{[1]} + P^{[1]}(N)\, I_q \left( I_q^T P^{[1]}(N)\, I_q \right)^{-1}
\left( 1 - I_q^T w_{LS}^{[1]} \right),
\]
where $w_{LS}^{[1]}$ is the standard LSM estimate:
</p>
<p>
\[
w_{LS}^{[1]} = \left( Y^{[1]T}(N)\, Y^{[1]}(N) \right)^{+} Y^{[1]T}(N)\, Y(N)
= P^{[1]}(N)\, Y^{[1]T}(N)\, Y(N).
\tag{5}
\]
</p>
<p>It was proved in [4, 5] that the use of estimate (5) leads to results that are not inferior in
accuracy to the best of the members of the first ensemble.</p>
<p>If the observations from the training sample are processed sequentially online, it is advisable to
use the recurrent least squares method in the form:</p>
<p>
\[
\begin{cases}
w_{LS}^{[1]}(k+1) = w_{LS}^{[1]}(k) + \dfrac{P^{[1]}(k)\, \hat{y}^{[1]}(k+1)
\left( y(k+1) - \hat{y}^{[1]T}(k+1)\, w_{LS}^{[1]}(k) \right)}
{1 + \hat{y}^{[1]T}(k+1)\, P^{[1]}(k)\, \hat{y}^{[1]}(k+1)}, \\[2mm]
P^{[1]}(k+1) = P^{[1]}(k) - \dfrac{P^{[1]}(k)\, \hat{y}^{[1]}(k+1)\,
\hat{y}^{[1]T}(k+1)\, P^{[1]}(k)}
{1 + \hat{y}^{[1]T}(k+1)\, P^{[1]}(k)\, \hat{y}^{[1]}(k+1)},
\end{cases}
\tag{6}
\]
\[
w^{*[1]}(k+1) = w_{LS}^{[1]}(k+1) + P^{[1]}(k+1)\, I_q
\left( I_q^T P^{[1]}(k+1)\, I_q \right)^{-1}
\left( 1 - I_q^T w_{LS}^{[1]}(k+1) \right),
\quad w^{*}_p(0) = q^{-1},\ p = 1, 2, \ldots, q,
\tag{7}
\]
or, if the training sample is non-stationary, we may use the exponentially weighted recurrent LSM:
\[
\begin{cases}
w_{LS}^{[1]}(k+1) = w_{LS}^{[1]}(k) + \dfrac{P^{[1]}(k)\, \hat{y}^{[1]}(k+1)
\left( y(k+1) - \hat{y}^{[1]T}(k+1)\, w_{LS}^{[1]}(k) \right)}
{\alpha + \hat{y}^{[1]T}(k+1)\, P^{[1]}(k)\, \hat{y}^{[1]}(k+1)}, \\[2mm]
P^{[1]}(k+1) = \dfrac{1}{\alpha} \left( P^{[1]}(k) -
\dfrac{P^{[1]}(k)\, \hat{y}^{[1]}(k+1)\, \hat{y}^{[1]T}(k+1)\, P^{[1]}(k)}
{\alpha + \hat{y}^{[1]T}(k+1)\, P^{[1]}(k)\, \hat{y}^{[1]}(k+1)} \right),
\end{cases}
\tag{8}
\]
with the same correction for $w^{*[1]}(k+1)$ as in (7), where $0 < \alpha \le 1$ is the forgetting factor.
</p>
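<p>A minimal online version of the updates (6)-(8) can be sketched as follows; setting alpha below 1 gives the exponentially weighted variant, and the simulated stream is an invented stand-in for real ensemble outputs:</p>

```python
import numpy as np

def rls_step(w, P, y1, y, alpha=1.0):
    """One recurrent LSM update; y1 is the ensemble-output vector,
    y the reference signal; alpha < 1 adds exponential forgetting."""
    Py = P @ y1
    denom = alpha + y1 @ Py
    w_new = w + Py * (y - y1 @ w) / denom
    P_new = (P - np.outer(Py, Py) / denom) / alpha
    return w_new, P_new

def constrain(w, P):
    """Correction (7): project the LSM estimate onto sum(w) == 1."""
    I_q = np.ones(len(w))
    PI = P @ I_q
    return w + PI * (1.0 - I_q @ w) / (I_q @ PI)

rng = np.random.default_rng(1)
q = 3
w_true = np.array([0.2, 0.5, 0.3])            # illustrative target weights
w, P = np.full(q, 1.0 / q), 1e3 * np.eye(q)   # w*_p(0) = q^{-1}
for _ in range(400):                          # simulated online stream
    y1 = rng.normal(size=q)
    w, P = rls_step(w, P, y1, y1 @ w_true)
w_star = constrain(w, P)
```
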
<p>
The parameters of the metamodel can be given the meaning of levels of fuzzy membership in the
optimal output signal by introducing an additional restriction on the non-negativity of these
parameters; that is, in addition to the adjustable parameters $w^{*[1]}$, we can also calculate the
levels of this membership, $\mu^{[1]}_p \ge 0$, $p = 1, 2, \ldots, q$.
</p>
      <p>To do this, we introduce into consideration the extended Lagrange function:</p>
<p>
\[
L(\mu^{[1]}, \lambda, \rho) = (Y(N) - Y^{[1]}(N)\, \mu^{[1]})^T (Y(N) - Y^{[1]}(N)\, \mu^{[1]})
+ \lambda (I_q^T \mu^{[1]} - 1) - \rho^T \mu^{[1]},
\tag{9}
\]
where $\lambda$ is an indefinite Lagrange multiplier and $\rho$ is a $(q \times 1)$ vector of
non-negative indefinite Lagrange multipliers.
</p>
<p>
Using the Kuhn-Tucker system of equations
\[
\begin{cases}
\nabla_{\mu^{[1]}} L(\mu^{[1]}, \lambda, \rho) = \vec{0}, \\
\partial L(\mu^{[1]}, \lambda, \rho) / \partial \lambda = 0,
\end{cases}
\tag{10}
\]
</p>
      <p>
it is not difficult to get the solution in the form:
\[
\begin{cases}
\mu^{[1]} = P^{[1]}(N) \left( Y^{[1]T}(N)\, Y(N) - 0.5\, \lambda I_q + 0.5\, \rho \right), \\[1mm]
\lambda = \dfrac{I_q^T P^{[1]}(N)\, Y^{[1]T}(N)\, Y(N) - 1 + 0.5\, I_q^T P^{[1]}(N)\, \rho}
{0.5\, I_q^T P^{[1]}(N)\, I_q}.
\end{cases}
\tag{11}
\]
      </p>
      <p>
For finding the vector of non-negative Lagrange multipliers, it is reasonable to apply the
Arrow-Hurwicz-Uzawa procedure:
\[
\rho(k+1) = Pr_{+}\left( \rho(k) - \eta_{\rho}(k)\, \mu^{[1]}(k+1) \right),
\tag{12}
\]
where $Pr_{+}(\cdot)$ is the projector onto the positive orthant and $\eta_{\rho}(k)$ is a learning
rate parameter.
      </p>
      <p>
After non-complex transformations, the expression for $\mu^{[1]}(k+1)$ may be presented in a more
compact form:
\[
\mu^{[1]}(k+1) = w^{*[1]}(k+1) + 0.5 \left( I -
\dfrac{P^{[1]}(k+1)\, I_q I_q^T}{I_q^T P^{[1]}(k+1)\, I_q} \right) P^{[1]}(k+1)\, \rho(k),
\tag{13}
\]
where instead of the least squares estimates $w_{LS}^{[1]}(k+1)$, the parameters of the metamodel
$w^{*[1]}(k+1)$ are used, which simplifies the process of configuring it.
      </p>
      <p>
As a result of learning the first metamodel, the optimal signal $y^{*[1]}(k)$ is formed at its output,
as well as $q$ signals $\hat{y}^{[1]}_{p,\mu}(k)$, from which we choose $n-1$ (if $q \ge n$) with the
highest levels of fuzzy membership $\mu^{[1]}_p$; these are subsequently fed, in the form of an
$(n \times 1)$ vector, to the input of the second ensemble, whose outputs go to the inputs of the
second metamodel, and so on. The process of increasing the number of ensembles and metamodels
continues until the required accuracy of the last metamodel with the output $y^{*[S]}(k)$ is achieved,
or until the value of the criterion minimized for the bagging model begins to increase, i.e.
$\varepsilon^2 (y^{*[s+1]}(k)) \ge \varepsilon^2 (y^{*[s]}(k))$.
      </p>
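<p>The projection step of the Arrow-Hurwicz-Uzawa procedure can be illustrated with a toy update; the step size eta and the numeric values below are assumptions for the sketch, and the sign of the gradient step follows the Lagrangian (9), whose derivative with respect to the multipliers is minus the membership vector:</p>

```python
import numpy as np

def project_positive(v):
    """Pr+(.): project onto the positive orthant by clipping negatives."""
    return np.maximum(v, 0.0)

def uzawa_step(rho, mu, eta=0.1):
    """Dual update of the non-negativity multipliers: a gradient step on
    the Lagrangian in rho, followed by projection onto the orthant."""
    return project_positive(rho - eta * mu)

# illustrative multipliers and current membership estimates
rho = np.array([0.2, 0.05, 0.3])
mu = np.array([0.5, -0.1, 4.0])
rho_next = uzawa_step(rho, mu)   # large positive mu drives its rho to zero
```

<p>Components with clearly positive memberships see their multipliers decay toward zero, while a negative (constraint-violating) component pushes its multiplier up, which is exactly the corrective behavior the procedure needs.</p>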
    </sec>
    <sec id="sec-4">
      <title>5. Experimental investigations</title>
<p>The dependence of the forecasting accuracy of the hybrid HSCI-bagging system on its parameters
was investigated. The obtained forecasting results were compared with the results of BackPropagation
and GMDH by the MSE and MAPE criteria.</p>
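<p>For reference, the two quality criteria can be computed as follows; the standard definitions are assumed, with MAPE expressed in percent:</p>

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# toy check: 10% and 5% errors on two of three points
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
```
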
<p>For the experiments, the dataset was divided into training, validation, and test subsamples. The
test subsample, used to evaluate the accuracy of the forecast, was always the same: the last 50
points of the dataset. The ratio of the training and validation data sets varied across the experiments
and took the following values (in percent): 60/40, 70/30, 80/20. The forecasting intervals also varied:
short-term (3, 5, 7 days) and middle-term (10, 20, 30 days).</p>
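<p>A sketch of this partitioning scheme, assuming the varying percentage applies to the part of the series that remains after the fixed 50-point test tail is held out:</p>

```python
def split_series(series, train_ratio=0.7, test_len=50):
    """Chronological split: the last test_len points are always the test
    set; the remainder is divided train/validation by train_ratio."""
    head, test = series[:-test_len], series[-test_len:]
    n_train = int(len(head) * train_ratio)
    return head[:n_train], head[n_train:], test

# illustrative series of 250 points, 70/30 train/validation split
data = list(range(250))
train, val, test = split_series(data, train_ratio=0.7)
```
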
<p>The historical data of the Close indicator of the DAX P index (^GDAXI) [19] for the period
24.03.2024-20.03.2025 were used as the dataset. The dynamics of the Close indicator is shown
in Fig. 2.</p>
<p>The correlation coefficients for the values of the Close indicator were also calculated, and a
correlogram was built (Fig. 2).</p>
<p>The correlogram shows that the data are highly correlated, with a correlation coefficient of at
least 0.8 up to the 50th lag.</p>
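<p>A correlogram value at a given lag is simply the sample correlation between the series and its lagged copy; the sketch below, using an invented trend-like series rather than the actual index data, shows why a smooth price series stays highly correlated even at lag 50:</p>

```python
import numpy as np

def autocorr(x, lag):
    """Sample correlation between x[t] and x[t + lag] (one correlogram point)."""
    x = np.asarray(x, float)
    a, b = x[:-lag], x[lag:]
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

# illustrative slowly varying, trend-like series (not the ^GDAXI data)
t = np.linspace(0.0, 1.0, 300)
trend = 100.0 + 10.0 * t
r50 = autocorr(trend, 50)   # stays close to 1 for a smooth trend
```
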
<p>The first series of experiments was conducted on short-term intervals for the BackPropagation,
GMDH, and HSCI-bagging models. For each of the models, the train/test ratio was varied at each
interval, and the optimal parameters (number of inputs, number of layers, and number of neurons in
each layer) for the BackPropagation network were determined. The metrics of the best results are
shown in Tables 1-3.</p>
      <sec id="sec-4-1">
        <title>Short-term results</title>
        <p>Best short-term result for BackPropagation: MSE 979294.79, MAPE 3.82 (Tables 1-3).</p>
      </sec>
      <sec id="sec-4-2">
        <title>Forecast comparison</title>
        <p>Charts of short-term forecasts for HSCI-bagging models are shown in Fig. 4-6.</p>
        <p>For the analysis of the forecasting and model evaluation results, comparative bar charts of MSE
(Fig. 7) and MAPE (Fig. 8) were built.</p>
<p>The next series of experiments was conducted on middle-term intervals. The quality criteria for
the obtained forecasts are shown in Tables 4-6, and charts of the middle-term forecasts of the
HSCI-bagging models are shown in Fig. 9-11.</p>
<p>For the analysis of the forecasting results and model evaluation, comparative bar charts were
constructed for two key error metrics: MSE and MAPE. The bar chart for MSE is shown in Fig. 12,
and the bar chart for MAPE is shown in Fig. 13. Presenting the comparison of metrics in this way
clearly demonstrates how much the predictions of the different models differ.</p>
        <p>To visually represent the dependence of the MSE and MAPE criteria values on the forecasting
interval, the respective charts were constructed (Fig. 14 and Fig. 15).</p>
<p>Based on the analysis of the experimental results, the hybrid HSCI-bagging system is quite
effective compared to BackPropagation and GMDH. Table 7 shows that the network requires a larger
amount of training data to improve the forecast accuracy at middle-term intervals. Figures 14 and 15
show that the forecast accuracy decreases rapidly when forecasting over an interval of 20 days or
more. It is also worth noting that, as the forecasting interval increases, the accuracy by the MAPE
criterion decreases more slowly than by the MSE criterion.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion</title>
<p>This article investigates the problem of short- and medium-term forecasting of stock indices, in
particular the Close indicator of the DAX P index, using intelligent methods of different generations.
A comparative experimental study of three models (BackPropagation, GMDH, and HSCI-bagging)
has led to a number of important conclusions:</p>
      <p>The hybrid computational intelligence system HSCI-bagging demonstrated the best
forecasting results for all criteria (MSE, MAPE) at different intervals, which indicates its high
generalization ability and resistance to changes in data.</p>
<p>The BackPropagation neural network showed the worst results, which may be due to its high
sensitivity to training parameters, a lack of data, or local minima in the optimization process.
A dependence of the forecasting quality on the size of the training sample was established:
as the forecast horizon increases, more historical data are needed to ensure stable
accuracy.</p>
      <p>Hybrid approaches based on bagging demonstrate high adaptability and potential for
integration with other intelligent methods, including ensemble structures.</p>
      <p>Based on the results, the following recommendations for further development can be made:
Integration of time series processing methods (e.g., wavelet transform, EDA, STL
decomposition) with hybrid intelligent systems to improve data preprocessing and highlight
hidden patterns.</p>
      <p>Use of deep ensemble models based on LSTM, GRU, or Transformer architectures, which can
be combined with GMDH or other evolutionary approaches within meta-models.
Adaptive real-time optimization of HSCI system parameters using reinforcement learning or
evolutionary strategies.</p>
      <p>Development of multimodal models that combine numerical, textual, and graphical data (e.g.,
news, tweets, corporate reports) to build more contextually aware forecasts.</p>
      <p>Incorporating risk-oriented metrics such as Value-at-Risk (VaR) or Conditional Value-at-Risk
(CVaR) into the process of evaluating forecasting performance to increase the practical value
of decisions in financial applications.</p>
      <p>Thus, the further development of the forecasting financial indices lies in the development of
adaptive, hybrid and multilevel models that combine high forecasting accuracy with scalability and
practical applicability to real market conditions.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[4] Ye. Bodyanskiy, I. Pliss, Adaptive generalized forecasting of multivariate stochastic signals, in: Proc. Latvian Sign. Proc. Int. Conf., vol. 2, Riga, 1990, pp. 80–83.</p>
      <p>[5] Ye. V. Bodyanskiy, I. A. Rudneva, On one adaptive algorithm for detecting discords in random sequences, Autom. Remote Control 56 (1995) 1439–1443.</p>
      <p>[6] A. J. C. Sharkey, On combining artificial neural nets, Connect. Sci. 8 (1996) 299–313.</p>
      <p>[7] S. Hashem, Optimal linear combination of neural networks, Neural Networks 10 (1997) 599–614.</p>
      <p>[8] U. Naftaly, N. Intrator, D. Horn, Optimal ensemble averaging of neural networks, Network: Comput. Neural Syst. 8 (1997) 283–296.</p>
      <p>[9] L. Breiman, Bagging Predictors, Techn. Report №421, Dept. of Statistics, Univ. of California, Berkeley, CA, 1994, 19 p.</p>
      <p>[10] A. G. Ivakhnenko, V. G. Lapa, Cybernetic Forecasting Devices, Naukova Dumka, Kyiv, 1965, 216 p. (in Ukrainian).</p>
      <p>[11] A. G. Ivakhnenko, G. A. Ivakhnenko, J. A. Mueller, Self-organization of the neural networks with active neurons, Pattern Recogn. Image Anal. 4 (1994) 177–188.</p>
      <p>[12] J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks 61 (2015) 85–117.</p>
      <p>[13] Y. Zaychenko, Ye. Bodyanskiy, O. Tyshchenko, O. Boiko, G. Hamidov, Hybrid GMDH-neuro-fuzzy system and its training scheme, Int. J. Inf. Theories Appl. 24 (2018) 156–172.</p>
      <p>[14] Y. Zaychenko, Ye. Bodyanskiy, O. Boiko, G. Hamidov, Evolving Hybrid GMDH Neuro-Fuzzy Network and Its Application, in: Proc. Int. Conf. IEEE-SAIC 2018, Kyiv, Ukraine, IASA, 8–11 Oct. 2018.</p>
      <p>[15] Ye. Bodyanskiy, N. Kulishova, Y. Zaychenko, G. Hamidov, Spline-Orthogonal Extended Neo-Fuzzy Neuron, in: Proc. Int. Conf. CISP–BMEI 2019.</p>
      <p>[16] Ye. Bodyanskiy, Y. Zaychenko, O. Boiko, G. Hamidov, A. Zelikman, The Hybrid GMDH-Neo-fuzzy Neural Network in Forecasting Problems in the Financial Sphere, in: Advances in Intelligent Computing, Springer, 2020, vol. 1075, pp. 221–225.</p>
      <p>[17] Ye. Bodyanskiy, O. Vynokurova, I. Pliss, Hybrid GMDH-neural network of computational intelligence, in: Proc. 3rd Int. Workshop on Inductive Modeling, Krynica, Poland, 2009, pp. 100–107.</p>
      <p>[18] Y. Bodyanskiy, O. Kuzmenko, H. Zaichenko, Y. Zaychenko, Application of Hybrid Neural Networks based on bagging and Group Method of Data Handling for forecasting, in: Proc. 2023 IEEE 18th Int. Conf. on Computer Science and Information Technologies (CSIT), Lviv, Ukraine, 2023, pp. 1–6. doi:10.1109/CSIT61576.2023.10324161.</p>
      <p>[19] Yahoo Finance, DAX P (^GDAXI) Stock Price, News, Quote and History. URL: https://finance.yahoo.com/quote/%5EGDAXI/. Accessed: 21 Mar. 2025.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Rumelhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <article-title>Learning representations by backpropagating errors</article-title>
          ,
          <source>Nature</source>
          <volume>323</volume>
          (
          <year>1986</year>
          )
          <fpage>533</fpage>
          -
          <lpage>536</lpage>
          . doi:10.1038/323533a0.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>I.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Courville</surname>
          </string-name>
          , Deep Learning, MIT Press,
          <year>2016</year>
          . URL: http://www.deeplearningbook.org.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L. K.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Salamon</surname>
          </string-name>
          ,
          <article-title>Neural network ensembles</article-title>
          ,
          <source>IEEE Trans. Pattern Anal. Mach. Intell</source>
          .
          <volume>12</volume>
          (
          <year>1990</year>
          )
          <fpage>993</fpage>
          -
          <lpage>1000</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>