<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Irina Kalinina</string-name>
          <email>irina.kalinina1612@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aleksandr Gozhyj</string-name>
          <email>alex.gozhyj@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Victor Gozhyi</string-name>
          <email>gozhyi.v@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergii Shiyan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Petro Mohyla Black Sea National University</institution>
          ,
          <addr-line>St. 68 Desantnykiv 10, Mykolaiv, 54000</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The article investigates an approach to improving the architectures of two-level heterogeneous ensembles of models for solving machine learning problems. An improved ensemble architecture is proposed in which the boosting method is used at the first level of ensemble learning to gradually improve the solutions of the base models. At the second level, the stacking method is used to aggregate the solutions of the base models using a metamodel. The base models were a multiple linear regression model, a decision tree model, a random forest model, a support vector regression model, a KNN model, a model based on an artificial neural network, and a multivariate adaptive regression spline model. These models are divided into two groups: undertrained and overtrained. The experimental part of the study addressed the problem of predicting the electricity generation indicators of hybrid power plants based on environmental indicators. The improved architecture of a two-level heterogeneous ensemble increased forecast accuracy compared to other ensemble architectures and to solutions based on any single base model. The proposed approach is effective in solving machine learning problems.</p>
      </abstract>
      <kwd-group>
        <kwd>two-level heterogeneous ensemble of models</kwd>
        <kwd>boosting</kwd>
        <kwd>bagging</kwd>
        <kwd>stacking</kwd>
        <kwd>machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The use of ensemble models to solve machine learning problems has become a leading trend in
recent years. The ensemble approach allows combining several weak models to form a strong
model in order to improve the accuracy of solving machine learning problems. Combining
uncorrelated predictions obtained using different alternative base models can improve the
performance of the models and demonstrate a reduction in the overall error. Ensemble methods are
designed to reduce bias or variance by aggregating the forecast values of the base (weak) models.</p>
      <p>By combining the strengths of different base models, ensemble learning can identify more
complex patterns and compensate for the weaknesses of individual models. This factor has led to
increased accuracy of predictions with real-time data. Such ensemble methods can increase
accuracy, reduce errors, and provide more reliable predictions by combining the power of different
algorithms. This works especially well in complex cases where a single model would struggle to
detect all the significant patterns present in the data.</p>
      <p>Ensemble approaches are now being used in industries such as healthcare, finance, ecology, and
cybersecurity to solve complex problems. As these industries generate vast amounts of data,
requiring more sophisticated analysis than ever before, ensemble modeling has become essential to
driving innovation. For example, in healthcare, ensemble models can help combine analytical data
from multiple sources and models to identify disease risks, optimize treatment programs, and
accelerate drug discovery. Therefore, the more sources of information that are combined, the better
the outcome in many industries.</p>
      <p>Ensemble methods can be divided into two groups: homogeneous and heterogeneous
ensembles. Homogeneous ensembles are formed by aggregating models of the same type to solve
machine learning problems. Heterogeneous ensembles are of particular interest. They are formed
by aggregating predictions obtained using different types of models. Their application allows for a
significant and comprehensive reduction in the overall error in solving a machine learning
problem.</p>
      <p>Problem statement. To investigate the features of the process of building two-level
heterogeneous ensembles of models. To develop an improved architecture of heterogeneous
ensembles of models to reduce the overall error of solving machine learning problems. To
experimentally confirm the effectiveness of the improved architecture of a two-level heterogeneous
ensemble of models on the example of solving a forecasting problem.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        Ensemble learning is the process of aggregating several different models that solve machine
learning problems to obtain results that are better than those obtained by the algorithms when
used independently. Ensemble methods are often used to obtain solutions to various machine
learning problems. First of all, these are classification and prediction problems [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1-3</xref>
        ].
      </p>
      <p>
        The main principles of ensemble learning are presented in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. To increase the efficiency of
solving machine learning problems, ensemble learning takes advantage of several base models. In
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the authors present results showing that using ensembles of models yields more accurate
results than individual machine learning models. The work also shows that ensemble classifiers
outperform individual classifiers and are more reliable.
      </p>
      <p>
        Individual base models used in solving machine learning problems can have high variance or high
bias, which affects the overall accuracy of predictions [
        <xref ref-type="bibr" rid="ref6 ref7">6,7</xref>
        ]. Finding a compromise between
variance and bias increases the accuracy of predictive solutions. The variance and bias
errors of individual machine learning models can be reduced using ensemble methods. For
example, bagging reduces variance without increasing bias, and boosting reduces bias [
        <xref ref-type="bibr" rid="ref8 ref9">8,9</xref>
        ]. The
boosting algorithm allows for successively improving the solutions of several alternative base
models. The specifics of its use are presented in detail in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        There are two types of approaches to ensemble learning: parallel fitting of several alternative
models and sequential improvement of the solutions of these models. The parallel method trains
different base models separately and then combines their solutions, sometimes using a metamodel.
This approach is implemented by bagging and stacking methods. Bagging, in the form of a random
forest, is popular and is used in various projects [
        <xref ref-type="bibr" rid="ref8">8,11</xref>
        ]. Sequential ensemble models are trained
sequentially so that each model learns to correct the error made by the previous model. In [12], it
was determined that the accuracy of each base model is an important factor in the effectiveness of
ensemble learning. A machine learning algorithm is considered effective only when it generalizes
well to previously unseen examples. Therefore, by combining the
capabilities of many base models and approaches, ensemble learning is used to improve both the
accuracy and the efficiency of the overall machine learning model.
      </p>
      <p>A comprehensive approach to reducing the overall error in machine learning problems is
implemented by using two-level heterogeneous ensembles. At one of the levels in such ensembles,
the stacking method is used [13]. Stacking trains several alternative models in parallel and then
aggregates the predictions of these ensemble members. The procedure uses several level 0 models
as baselines and a meta-learning step in which another model, the level 1 model [13], learns how
to combine the predictions of the baseline models. The main idea is that the baseline (level 0)
models are trained on a training dataset. Their predictions on held-out data, paired with the true
labels, then form a new dataset on which the level 1 model is trained [14].</p>
      <p>Meta-learning algorithms are trained on the outputs of other machine learning algorithms to
produce predictions that are more accurate than those of the individual base classifiers
[15]. The stacking method is effective because it combines the advantages of many weak models to
provide a result that is superior to that of any single one of them. Many base
algorithms are applied to the same initial data set, which allows stacking to create models that
solve the prediction problem in a new way.</p>
      <p>The main difficulty in creating heterogeneous ensembles is determining the optimal way to
combine the predictions of different models in the ensemble. There are two ways to build a
heterogeneous ensemble. In the first method, a fixed number of different models are combined. The
second method is to create a group of models with different parameters, and then select the best
subset to include in the final ensemble.</p>
      <p>In [16], the issues of creating an adaptive heterogeneous ensemble for solving machine learning
problems are considered. In [17], the authors proposed a static heterogeneous ensemble
architecture. When creating an ensemble, five different base classifiers are combined: the support
vector machine (SVM), the multilayer perceptron (MLP), logistic regression, the k-nearest
neighbors method, and a decision tree. The parameters and architecture of individual classifiers
are determined using 10-fold cross-validation. The proposed approach is effective in solving
classification problems.</p>
      <p>The authors of [18] proposed a combination of several optimized methods: deep neural
networks, SVM, AdaBoost, and Gaussian processes. The ensemble is generated using a simple
sum-of-classifiers rule. However, the problem of determining the number of classifiers of each type was
not considered. In addition, the optimal composition of the ensemble depends on the problem being
solved. A possible way to overcome this problem was also proposed in [16–18]: creating a library
of classifiers and then selecting a subset for the final ensemble.</p>
      <p>In [19], a library of 2000 different models trained with a wide range of parameters is
built. An iterative greedy selection algorithm is then applied to this library to build the final
ensemble. The procedure starts with an empty ensemble. At each iteration, the model that
maximizes the performance metric is added to the ensemble, until all models in the library have been included.
The ensemble with the best performance on the validation set is selected as the final combination.</p>
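      <p>The greedy selection procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from [19]: the function name greedy_selection, the model names, simple averaging as the combination rule, selection without replacement, and validation MSE as the performance metric are all choices made for the example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from statistics import fmean


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def average(predictions):
    """Element-wise mean of several prediction vectors."""
    return [fmean(col) for col in zip(*predictions)]


def greedy_selection(library, y_val):
    """Greedy forward selection over a library of validation predictions.

    library: dict mapping a (hypothetical) model name to its predictions
             on the validation set.
    Returns the names in the best-scoring ensemble and its validation MSE.
    """
    remaining = dict(library)
    chosen = []                       # start with an empty ensemble
    best_names, best_score = [], float("inf")
    while remaining:                  # continue until the library is exhausted
        # Pick the model whose addition gives the lowest validation error.
        name = min(
            remaining,
            key=lambda n: mse(y_val, average([library[m] for m in chosen + [n]])),
        )
        chosen.append(name)
        del remaining[name]
        score = mse(y_val, average([library[m] for m in chosen]))
        if score < best_score:        # remember the best ensemble seen so far
            best_names, best_score = list(chosen), score
    return best_names, best_score


# Toy validation data: two models with opposite errors and one poor model.
y_val = [1.0, 2.0, 3.0, 4.0]
library = {
    "model_a": [1.5, 2.5, 3.5, 4.5],   # over-predicts by 0.5
    "model_b": [0.5, 1.5, 2.5, 3.5],   # under-predicts by 0.5
    "model_c": [4.0, 4.0, 4.0, 4.0],   # ignores the input
}
selected, score = greedy_selection(library, y_val)
print(selected, score)
```
]]></preformat>
      <p>In this toy library the two models with opposite errors cancel each other, so the selected ensemble contains them and leaves out the third model, which illustrates why complementarity matters as much as individual accuracy.</p>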
      <p>In [20], the authors propose a greedy selection method from a library consisting of 200
classifiers: 40 neural networks, 60 nearest-neighbor classifiers, 80 SVMs, and 20 decision trees. For
each type of classifier, a parameter grid was defined, and one model was trained for each node in
the grid. In this approach, the ensemble grows gradually, selecting one classifier from the library at
a time. At each step, the selection is made from the perspective of both individual accuracy and
complementarity with the rest of the classifiers in the ensemble. In the classification problems
studied, such heterogeneous ensembles turned out to be more accurate.</p>
      <p>In [21,22], a genetic algorithm was considered for selecting the optimal structure of a
heterogeneous ensemble from 20 different base models. These selection methods have been widely
applied to homogeneous ensembles. Examples are presented in [23, 24]. In [25], to build an efficient
heterogeneous combination, the authors remove the low-performing base elements so that only the
optimal classifiers remain in the ensemble. The efficiency of the classifier is determined by
measuring the area under the ROC curve. In another study [26], the authors used a differential
evolution algorithm to optimize the weights of various base models in a heterogeneous ensemble.</p>
      <p>Thus, the task of improving the architecture of a heterogeneous ensemble in order to improve
the quality of predictive solutions is complex and relevant. It requires new approaches and
additional research.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Materials and methods</title>
      <p>
        One of the approaches to improving the quality of results when solving machine learning problems
is the use of ensemble learning. This approach allows you to reduce the errors of solving problems
by gradually reducing the bias and variance. Studies have shown that it is most effective to use a
multi-level heterogeneous combination of different ensemble methods based on bagging, boosting
and stacking [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        It is known that the error of machine learning algorithms consists of three components: noise,
bias, and variance [
        <xref ref-type="bibr" rid="ref1 ref3">1,3</xref>
        ]:
        <disp-formula><label>(1)</label><tex-math><![CDATA[\mathrm{Error}(x) = \mathrm{Bias}^2(x) + \mathrm{Variance}(x) + \mathrm{Noise}(x),]]></tex-math></disp-formula>
where Bias is the systematic error that any machine learning algorithm is expected to make due
to, for example, the choice of model structure, an insufficient amount of training data, or
unrepresentative training data; Variance measures the sensitivity of the algorithm
to a specific training set and/or the selected hyperparameters; Noise is the random error in the
data that cannot be avoided, for example, due to data entry errors or previously corrupted input data.
      </p>
      <p>When choosing the best base model for a machine learning problem, it is necessary to find
a compromise between bias and variance, because undertrained models have
a large bias and overtrained models have a large variance. Both shortcomings
increase the overall error. Building a two-level ensemble architecture,
that is, aggregating solutions from different base models into a single generalized
model with a smaller error, is a technique that can find such a compromise between the
bias and variance of the individual base models. Thus, a two-level ensemble architecture can help
reduce both bias and variance, although not the noise component of the error.</p>
      <p>When selecting a combination of ensemble methods and forming the architecture of a
heterogeneous ensemble, the following features should be considered:
        <list list-type="bullet">
          <list-item><p>combining the results of several base models reduces the risk of choosing an ineffective (weak) model;</p></list-item>
          <list-item><p>several uncorrelated models, grouped into an ensemble, provide a more accurate solution than any of the individual base machine learning models;</p></list-item>
          <list-item><p>ensemble methods tend to improve the generalization accuracy only for a particular set of individual models, and only within a particular domain;</p></list-item>
          <list-item><p>a set of base machine learning models with similar training results may have different generalization results;</p></list-item>
          <list-item><p>if the initial data set is too large, then one model may not cope with the solution of the problem. In this case, it is necessary to train different base models on different data samples;</p></list-item>
          <list-item><p>if the initial data set is too small, then resampling methods should be used.</p></list-item>
        </list>
      </p>
      <p>Let us define the relationship between error, bias, and variance. Consider random variables
that describe the distribution of the instances x, their true values f(x), and their predicted values h(x).
The value h(x) is an estimate, produced by some model, of the true function f(x), which is
unknown.</p>
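      <p>When constructing any process model, we assume that the observed values y ∈ Y are
generated by the function f(x) plus a random normally distributed error ε:
        <disp-formula><tex-math><![CDATA[y = f(x) + \varepsilon.]]></tex-math></disp-formula>
      </p>
      <p>The mean squared error that we expect for the entire data distribution is defined as follows:
        <disp-formula><label>(2)</label><tex-math><![CDATA[\mathrm{Err} = E\left[(h(x) - y)^2\right].]]></tex-math></disp-formula>
      </p>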
      <p>Let us define the components of error: bias and variance. Bias is the difference between the
mean values determined by the model and the actual values. Bias shows how much the “average”
model differs from the actual relationship between the variables (Fig. 1). Its square can be represented as:
        <disp-formula><tex-math><![CDATA[\mathrm{Bias}^2(h, f) = \left(E[h(x)] - f(x)\right)^2 = E^2[h(x)] + f^2(x) - 2E[h(x)]\,f(x).]]></tex-math></disp-formula>
      </p>
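      <p>Variance is the expected variability of a model around its mean. Variance shows how much the
model changes, for example with different hyperparameters or data samples (Fig. 1). It can be
represented as:
        <disp-formula><tex-math><![CDATA[\mathrm{Variance}(h) = E\left[\left(h(x) - E[h(x)]\right)^2\right].]]></tex-math></disp-formula>
      </p>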
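      <p>Substituting the expressions for the bias and variance into expression (2) gives the
following dependence:
        <disp-formula><tex-math><![CDATA[\begin{aligned}
\mathrm{Err} &= E\left[(h(x) - y)^2\right] = E\left[\left(h(x) - E[h(x)] + E[h(x)] - y\right)^2\right] = \dots \\
&= \left(E[h(x)] - f(x)\right)^2 + E\left[\left(h(x) - E[h(x)]\right)^2\right] + \sigma^2 \\
&= \mathrm{Bias}^2 + \mathrm{Variance} + \sigma^2,
\end{aligned}]]></tex-math></disp-formula>
where y = f(x) + ε is the observed value and σ² is the variance of the zero-mean noise ε.
      </p>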
      <p>By combining several alternative and uncorrelated solutions to a machine learning problem into
an ensemble structure, the variance is usually reduced, and therefore the error rate is reduced.
For an ensemble that averages N uncorrelated models:
        <disp-formula><tex-math><![CDATA[\mathrm{Variance}\left(\frac{1}{N}\sum_{i=1}^{N} h_i(x)\right) = \frac{1}{N^2}\sum_{i=1}^{N} \mathrm{Variance}\left(h_i(x)\right),]]></tex-math></disp-formula>
where h<sub>i</sub>(x) is the predicted value obtained using the i-th trained model of the ensemble
that is being formed.</p>
      <p>As a result, in most cases the variance of an ensemble of models is lower than that of a
single base model, even if that model is complex.</p>
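      <p>The variance reduction discussed above can be checked with a small simulation. The sketch below is an idealized illustration: it assumes unbiased base models whose errors are independent and have unit variance, and the constants TRUE_VALUE, N_MODELS, and N_TRIALS are hypothetical values chosen for the example. Real base models are usually correlated, so the reduction in practice is smaller.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random
from statistics import fmean, pvariance

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_VALUE = 10.0   # hypothetical target f(x) at a single point x
N_MODELS = 10       # number of base models in the ensemble
N_TRIALS = 5000     # number of simulated training sets


def base_prediction():
    """One unbiased base model: the true value plus independent unit-variance error."""
    return TRUE_VALUE + random.gauss(0.0, 1.0)


# Predictions of a single model across many simulated training sets.
single = [base_prediction() for _ in range(N_TRIALS)]

# Predictions of an ensemble that averages N_MODELS independent models.
ensemble = [
    fmean(base_prediction() for _ in range(N_MODELS))
    for _ in range(N_TRIALS)
]

var_single = pvariance(single)
var_ensemble = pvariance(ensemble)
print(f"single model variance: {var_single:.3f}")
print(f"ensemble of {N_MODELS} variance: {var_ensemble:.3f}")
```
]]></preformat>
      <p>Under these assumptions the variance of the averaged prediction is close to 1/N of the variance of a single model, while the bias is unchanged.</p>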
    </sec>
    <sec id="sec-4">
      <title>4. Experimental part</title>
      <sec id="sec-4-1">
        <title>4.1. Data analysis and pre-processing</title>
        <p>As an example of the application of the ensemble approach, the task of predicting the electricity
generation indicators of hybrid power plants based on environmental indicators [27] is considered.
The task is to predict the electricity generation of a combined cycle plant based on environmental
indicators collected using a system of sensors located near the power plant. Thus, the data set
was formed from actual sensor observations. It
contains 9568 observations collected over 6 years (2006–2011), during which the plant operated
at full load in combined cycle mode for 674 days.</p>
        <p>The target variable is the amount of electricity generated (PE, in the range from 420.26 to 495.76 MW).
The input data are ambient temperature (AT, in the range from
1.81 °C to 37.11 °C), exhaust vacuum, ambient pressure (AP, in the range from 992.89 to 1033.30 millibars), and relative
humidity (RH, in the range from 25.56% to 100.16%). The dataset consists of numerical features. The
data does not require additional processing. The structure of the dataset is presented in Figure 2.</p>
        <p>The data analysis and pre-processing stage includes the following procedures: identification and
processing of missing values, identification and processing of outliers and anomalous values,
identification of duplicates, checking for correlation in the data, normalization and feature
selection [28,29].</p>
        <p>After checking for missing values and identifying anomalies, a correlation check was performed
on the data. For this purpose, a correlation matrix and a scatter matrix were created. A graphical
version of the correlation matrix is shown in Fig. 3.</p>
        <p>Analysis of the correlation matrix shows:
          <list list-type="order">
            <list-item><p>Temperature and exhaust vacuum have a strong positive correlation of 0.84, which means that an increase in temperature is usually accompanied by an increase in exhaust vacuum.</p></list-item>
            <list-item><p>Temperature and output power have a strong negative correlation of −0.95, which means that as the ambient temperature increases, the output power decreases.</p></list-item>
            <list-item><p>Exhaust vacuum and power output have a strong negative correlation of −0.87, which indicates that when the exhaust vacuum is high, the power output is low, and vice versa.</p></list-item>
          </list>
        </p>
        <p>
          The 82 detected duplicates were removed. To increase the speed of training, the data were
normalized using the min-max method. Normalization transformed the features
into the range [0, 1], which eliminated the differences between the scales [30].
        </p>
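        <p>The two preprocessing steps just described, duplicate removal and min-max normalization, can be sketched as follows. The helper names drop_duplicates and min_max_scale and the toy rows are assumptions made for illustration; the authors' own tooling is not specified in the text.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def drop_duplicates(rows):
    """Remove exact duplicate rows while keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def min_max_scale(rows):
    """Rescale every column to [0, 1] using x' = (x - min) / (max - min)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            (x - lo) / (hi - lo) if hi > lo else 0.0   # constant column maps to 0
            for x, lo, hi in zip(row, lows, highs)
        ])
    return scaled


# Toy rows with four numeric features; the numbers are made up.
rows = [
    [10.0, 40.0, 1000.0, 60.0],
    [20.0, 50.0, 1010.0, 80.0],
    [10.0, 40.0, 1000.0, 60.0],   # duplicate of the first row
    [30.0, 60.0, 1020.0, 100.0],
]
clean = drop_duplicates(rows)
scaled = min_max_scale(clean)
print(len(clean), scaled)
```
]]></preformat>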
        <p>After the dataset was prepared and scaled to a single range, cross-validation was used to
select features and determine the best model together with its important features.
Model accuracy in cross-validation was evaluated using the MSE.
First, candidate feature subsets were selected, then the errors were calculated for
each cross-validation fold. Feature selection showed that the best model
includes all 4 predictors. The errors of all models are very low, which is a good indicator.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Building and training basic regression models</title>
        <p>The first step in the modelling phase is to divide the dataset into training and test sets. For the
training sample, 90% of the rows of the cleaned dataset were selected for
training the base predictive models. The remaining 10% of the rows were used for testing
the models. The base predictive models considered were a multiple
linear regression model, a decision tree model, a random forest regression model, a support vector
regression model, a KNN regression model, a regression model based on an artificial neural
network, and a multivariate adaptive regression splines (MARS) model. For each model, a structure and
parameters were selected that gave the best prediction quality
on the test sample. Table 1 presents the values of the quality metrics after training
and testing each of the base regression models.</p>
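        <p>A 90/10 split of this kind can be sketched as below. The function name train_test_split, the fixed seed, and the synthetic rows are assumptions for the example; the text does not state whether the authors shuffled the rows before splitting.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import random


def train_test_split(rows, test_fraction=0.1, seed=42):
    """Shuffle row indices and hold out `test_fraction` of the rows for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


rows = [[float(i), 2.0 * i] for i in range(100)]   # 100 made-up observations
train, test = train_test_split(rows)
print(len(train), len(test))
```
]]></preformat>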
        <p>So, the basic models showed good results on this data set. The random forest model performed
best. The regression model based on the decision tree and the regression model based on the
artificial neural network showed the worst results.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Formation of a two-level ensemble architecture based on basic models</title>
        <p>Initial version. One effective approach to building heterogeneous ensemble architectures uses
boosting or stacking for undertrained models and bagging for overtrained base models [31,32].
This approach aims to minimize bias in models with low variance and high bias, and to
reduce variance in models with high variance and low bias. This combination was
implemented programmatically to evaluate its effectiveness in the
regression problem of forecasting the amount of electrical energy generated in one hour by a combined cycle
power plant.</p>
        <p>To form the initial variant of combining the base regression models into a two-level
heterogeneous ensemble architecture, it was decided to combine the undertrained models at the
first level and the overtrained ones at the second level. To divide the base models into two
groups, the bias and variance values of each base model were analysed. Thus, the regression model
based on an artificial neural network and the decision tree were selected for the first level, and the
models based on multiple linear regression, KNN, the support vector method, and random
forest, together with the multivariate adaptive regression splines model, were selected for the second level.
The initial variant uses the
stacking method at the first level and the bagging method at the second level (Fig. 4). A control
scheme with cross-validation was used to configure the stacking. A metamodel, a generalized linear model,
is built on the predictions of the undertrained base models. The trained
first-level stacking ensemble was then used to predict on the test dataset. The
obtained predictions were added to the predictions of the second-level base models, which
are the overtrained models. The bagging-based ensemble learning scheme, also with
cross-validation, was configured programmatically.</p>
        <p>To assess the effectiveness of ensemble learning, the quality indicators of models and forecasts
were calculated: RMSE, MAE, MAPE and R2. The results of the initial version of combining the
basic models into a two-level heterogeneous ensemble architecture are presented in Table 2.</p>
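        <p>For reference, the four quality indicators named above can be computed as in the following sketch. It uses the standard textbook definitions; reporting MAPE in percent is an assumption chosen to match the way the values are quoted in the text, and the sample numbers are made up.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from math import sqrt
from statistics import fmean


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(fmean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; assumes no true value is zero."""
    return 100.0 * fmean(abs((t - p) / t) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [450.0, 460.0, 470.0, 480.0]   # made-up power outputs, MW
y_pred = [452.0, 458.0, 473.0, 479.0]   # made-up forecasts, MW
print(rmse(y_true, y_pred), mae(y_true, y_pred),
      mape(y_true, y_pred), r2(y_true, y_pred))
```
]]></preformat>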
        <sec id="sec-4-3-1">
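          <table-wrap>
            <label>Table 2</label>
            <table>
              <thead>
                <tr><th>Level</th><th>Model</th></tr>
              </thead>
              <tbody>
                <tr><td>1</td><td>Regression model based on ANN</td></tr>
                <tr><td>1</td><td>Regression model based on DR</td></tr>
                <tr><td>1</td><td>Resulting stacking layer</td></tr>
                <tr><td>2</td><td>Regression model based on MLR</td></tr>
                <tr><td>2</td><td>Regression model based on KNN</td></tr>
                <tr><td>2</td><td>Regression model based on SVM</td></tr>
                <tr><td>2</td><td>Regression model based on RF</td></tr>
                <tr><td>2</td><td>Multivariate adaptive regression splines (MARS)</td></tr>
                <tr><td>2</td><td>Resulting bagging layer</td></tr>
              </tbody>
            </table>
          </table-wrap>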
          <p>At the first level, stacking raised the coefficient of determination R² to 0.93,
which is up to 0.04 higher than for the individual base models at this level, the ANN and the decision tree (R²
= 0.90 and 0.89).</p>
          <p>At the second level, bagging combined the forecasts of the first-level stacking ensemble and of the overtrained
models to produce the overall forecast. However, the individual
base regression model based on random forest outperformed the bagging result. The reason
lies in the specificity of the regression task, forecasting the amount of electricity
generated in one hour by a combined cycle power plant, and in the features of the data set as a whole.</p>
          <p>Improved version. The improved version of the ensemble architecture takes an
unconventional approach: the undertrained models are aggregated at the first
level by the boosting method, and the overtrained models are aggregated at the second level by
the stacking method (Fig. 5). This decision rests on the hypothesis that undertrained models, because
they fit the data less specifically, can capture general trends more effectively and reduce the risk of
overtraining.</p>
          <p>The boosting method used at the first level increases the stability of the heterogeneous
ensemble by combining the results of several weakly correlated models. At the second level, the
overtrained base models and the boosting results are aggregated using stacking. This provides more
accurate forecasting, because stacking uses the results of the initial forecasting to train
the metamodel, which lets it capture more of the dependencies in the data.</p>
          <p>The obtained training results of the second layer, stacking, are used to predict the value of the
output variable on the test data set. To assess the accuracy of the obtained forecasts, the main
indicators such as RMSE, MAE, MAPE and the coefficient of determination R2 are calculated. The
results on the quality indicators of models and forecasts are summarized in Table 3.</p>
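          <p>The two-level idea, boosting at the first level and stacking at the second, can be illustrated with a deliberately small sketch. Everything in it is an assumption made for the example and not the authors' implementation: the weak learners (a mean predictor and a one-split stump) stand in for the undertrained models, a 1-nearest-neighbour regressor stands in for an overtrained model, the metamodel is reduced to a single blending weight fitted on a hold-out set instead of a cross-validated linear model, and the data are synthetic.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import random
from statistics import fmean


def fit_mean(xs, ys):
    """Weak learner 1: always predicts the mean of the targets."""
    m = fmean(ys)
    return lambda x: m


def fit_stump(xs, ys):
    """Weak learner 2: one-split regression stump chosen by squared error."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        lm, rm = fmean(left), fmean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    _, threshold, lm, rm = best
    return lambda x: lm if x < threshold else rm


def fit_boosted(xs, ys, learners):
    """Level 1: each learner is fitted to the residuals left by the previous ones."""
    models, residuals = [], list(ys)
    for fit in learners:
        model = fit(xs, residuals)
        models.append(model)
        residuals = [r - model(x) for x, r in zip(xs, residuals)]
    return lambda x: sum(m(x) for m in models)


def fit_1nn(xs, ys):
    """A flexible model prone to overfitting: 1-nearest-neighbour regression."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def fit_meta_weight(a, b, ys):
    """Level 2 metamodel: weight w minimising sum((w*a + (1-w)*b - y)^2)."""
    num = sum((ai - bi) * (y - bi) for ai, bi, y in zip(a, b, ys))
    den = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return num / den if den else 0.5


def mse(model, xs, ys):
    return fmean((model(x) - y) ** 2 for x, y in zip(xs, ys))


random.seed(1)
xs = [random.uniform(0.0, 10.0) for _ in range(300)]
ys = [3.0 * x + random.gauss(0.0, 2.0) for x in xs]   # synthetic, not the plant data
fit_x, fit_y = xs[:150], ys[:150]            # used to fit the base models
meta_x, meta_y = xs[150:225], ys[150:225]    # held out to fit the metamodel
test_x, test_y = xs[225:], ys[225:]          # used only for the final comparison

boosted = fit_boosted(fit_x, fit_y, [fit_mean, fit_stump])
knn = fit_1nn(fit_x, fit_y)
w = fit_meta_weight([boosted(x) for x in meta_x], [knn(x) for x in meta_x], meta_y)


def stacked(x):
    """Final prediction: blend of the boosted level and the flexible model."""
    return w * boosted(x) + (1.0 - w) * knn(x)


for name, model in [("boosted", boosted), ("1-NN", knn), ("stacked", stacked)]:
    print(f"{name:8s} test MSE = {mse(model, test_x, test_y):.2f}")
```
]]></preformat>
          <p>Fitting the blending weight on data that the base models did not see is what keeps the metamodel from simply trusting the model that memorized the training set; a cross-validated scheme serves the same purpose when data are scarce.</p>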
          <p>At the first level, where boosting was used, R² increased to 0.91, compared
to 0.89 and 0.90 for the individual models (ANN and decision tree). At the second level, stacking
improved on the results of the individual base
models: R² = 0.97, RMSE = 3.37, MAE = 2.44, MAPE = 0.52%, which confirms the high efficiency of
model combination. Thus, the improved two-level ensemble architecture exceeds the
individual models in prediction accuracy.</p>
        </sec>
        <sec id="sec-4-3-2">
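          <table-wrap>
            <label>Table 3</label>
            <table>
              <thead>
                <tr><th>Level</th><th>Model</th></tr>
              </thead>
              <tbody>
                <tr><td>1</td><td>Regression model based on ANN</td></tr>
                <tr><td>1</td><td>Regression model based on DR</td></tr>
                <tr><td>1</td><td>Resulting boosting layer</td></tr>
                <tr><td>2</td><td>Regression model based on MLR</td></tr>
                <tr><td>2</td><td>Regression model based on KNN</td></tr>
                <tr><td>2</td><td>Regression model based on SVM</td></tr>
                <tr><td>2</td><td>Regression model based on RF</td></tr>
                <tr><td>2</td><td>Multivariate adaptive regression splines (MARS)</td></tr>
                <tr><td>2</td><td>Resulting stacking layer</td></tr>
              </tbody>
            </table>
          </table-wrap>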
          <p>This variant of ensemble combination worked particularly effectively due to the specificity of
the data set and the nature of the dependencies in it, as well as the features of the regression
problem. The data set contains a variety of interdependencies between variables, which creates
complex but significant patterns that need to be properly detected and taken into account for
accurate construction of the regression model. The undertrained models, the decision tree and
the ANN, are not powerful enough to reveal these dependencies individually, but combining them
through boosting helps to better generalize the main, stable patterns in the data without being
prone to overtraining.</p>
          <p>Overtrained models, thanks to their processing algorithms, can better adapt to the complex
nonlinear dependencies that are present in the data set, but are prone to overfitting the training data.
Stacking helps to combine their predictions effectively, smoothing out the errors caused by
overtraining and increasing the generalization ability of the ensemble on the test data.</p>
          <p>Therefore, it is the specificity of the data, which combines both stable and complex
nonlinear dependencies, together with the requirements of the regression problem, that justifies the
effectiveness of this combination of boosting for undertrained models and stacking for overtrained
ones. This approach reveals significant relationships in the data and ensures accurate
predictions through better generalization to new data.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>The paper investigated an approach to improving ensemble architectures for solving machine
learning problems. Features that affect the choice of a combination of ensemble methods and the
construction of a heterogeneous ensemble architecture were
identified. An improved ensemble architecture was developed in which the boosting method is
used at the first level of ensemble learning to gradually improve the solutions of the base models. At
the second level, the stacking method is used to aggregate the solutions of the base models using a
metamodel. In the experimental part of the study, the problem of predicting electricity generation
indicators was solved. The base models selected were a multiple linear regression model,
a decision tree model, a random forest model, a support vector regression model, a KNN model, a model based
on an artificial neural network, and a multivariate adaptive regression spline model. These
models were divided into undertrained and overtrained groups. For the models of the first group,
ensemble methods reduced the bias, and for the models of the second group, they reduced the
variance. Thus, the forecasting results were improved by reducing the overall error. This made it
possible to increase the forecast accuracy of the improved two-level
heterogeneous ensemble architecture compared to other ensemble architectures and to solutions based on any of the
base models. The proposed approach is effective in solving machine learning problems.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <sec id="sec-6-1">
        <title>The authors have not employed any Generative AI tools.</title>
        <p>[11] H. Liu, A. Gegov, M. Cocea, Ensemble learning approaches, in: Rule-Based Systems for Big
Data, vol. 13 of Studies in Big Data, Springer, Cham, 2016, pp. 63–73. doi:
10.1007/978-3-31923696-4_6.
[12] P. A. Flach, T. De Bie, N. Cristianini, Machine Learning and Knowledge Discovery in
Databases: European conference, ECML PKDD 2012, vol. 7523–7524 of Lecture Notes in
Artificial Intelligence, Springer, Berlin; New York, NY, 2012,
URL: http://hdl.handle.net/1854/LU-7009607
[13] O. Sagi, L. Rokach, Ensemble learning: A survey, Wiley interdisciplinary reviews: data mining
and knowledge discovery, vol. 8(4): e1249, (2018), doi: 10.1002/widm.1249.
[14] M. Liang, et al., A stacking ensemble learning framework for genomic prediction, Frontiers in
Genetics, sec. Statistical genetics and methodology, vol. 12 (2021) 600040,
doi: 10.3389/fgene.2021.600040.
[15] T. M. Hospedales, P. M. Antoniou, A. J. Storkey, Meta-learning in neural networks: A survey,
IEEE Transactions on Pattern Analysis and Machine Intelligence, pp (99):1-1 (2021).
doi: 10.1109/TPAMI.2021.3079209.
[16] Z. Lu, X. Wu, J. C. Bongard, Active learning through adaptive heterogeneous ensembling, IEEE
Transactions on Knowledge and Data, vol. 27, issue 2 (2015) pp. 368–381, doi:
10.1109/TKDE.2014.2304474.
[17] J. M. de Oliveira, E. M. dos Santos, J. R. H. Carvalho, L. A. de Vasconcelos Marques, Ensemble
of heterogeneous classifiers applied to lithofacies classification using logs from different wells,
in conference: Neural Networks (IJCNN), International Joint, Dallas, TX, USA, (2013) pp. 1–6,
doi: 10.1109/IJCNN.2013.6707013.
[18] L. Nanni, S. Brahnam, S. Ghidoni, A. Lumini, Toward a general-purpose heterogeneous
ensemble for pattern classification, computational intelligence and neuroscience, vol. 5 (2015)
pp. 1-10, doi: 10.1155/2015/909123.
[19] R. Caruana, A. Niculescu-Mizil, G. Crew, A. Ksikes, Ensemble selection from libraries of
models, in: proceedings of the 21 st International Conference on Machine Learning, Banff,
Canada, (ICML ’04), New York, USA, (2004) p. 18.
[20] I. Partalas, G. Tsoumakas, I. Vlahavas, An ensemble uncertainty aware measure for directed
hill climbing ensemble pruning, Machine Learning, vol. 81(3) (2010) pp. 257–282, doi:
10.1007/s10994-010-5172-0.
[21] M. N. Haque, N. Noman, R. Berretta, P. Moscato, Heterogeneous ensemble combination search
using genetic algorithm for class imbalanced data classification, PLoS One 11(1) (2016)
e0146116, doi: 10.1371/journal.pone.0146116.
[22] G. Tsoumakas, I. Katakis, I. Vlahavas, Effective voting of heterogeneous classifiers, in
conference: 15th European Conference on Machine Learning/8th, European conference on
volume: 3201, Springer, Berlin, (2004) pp. 465–476, doi: 10.1007/978-3-540-30115-8_43.
[23] G. Tsoumakas, I. Partalas, I. Vlahavas, An ensemble pruning primer, in book: Applications of
supervised and unsupervised ensemble methods, Springer, Berlin, (1970) pp. 1–13, doi:
10.1007/978-3-642-03999-7_1.
[24] G. Martínez-Muñoz, D. Hernández-Lobato, A. Suárez, An analysis of ensemble pruning
techniques based on ordered aggregation, IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 31(2) (2009) pp. 245–259, doi: 10.1109/TPAMI.2008.78.
[25] E. Alshdaifat, M. Al-Hassan, A. Aloqaily, Effective heterogeneous ensemble classification: An
alternative approach for selecting base classifiers, ICT Express, South Korea (2020), ICT
Express 7(3), pp. 1-8, doi: 10.1016/j.icte.2020.11.005
[26] M. N. Haque, M. N. Noman, R. Berretta, P. Moscato, Optimising weights for heterogeneous
ensemble of classifiers with differential evolution, in: Proc. IEEE Congress on Evolutionary
Computation (CEC), Vancouver, BC, Canada. 2016, pp. 233–240, doi:
10.1109/CEC.2016.7743800.
[27] Machine learning repository [electronic resource], URL:
https://archive.ics.uci.edu/ml/datasets/combined+cycle+power+plant
[28] I. Kalinina, A. Gozhyj, V. Vysotska, E. Malakhov, V. Gozhyj, I. Tregubova, System
methodology of data analysis and preprocessing for solving classification problems,
conference: 2024 IEEE 19th International Conference on Computer Science and Information
Technologies (CSIT). Lviv, Ukraine (2024) doi:10.1109/CSIT65290.2024.10982630. URL:
https://ieeexplore.ieee.org/document/10982630
[29] I.Kalinina, P. Bidyuk, A. Gozhyj, V. Gozhyi, V. Nechakhin, Approach to identification of
anomalous values in analysis tasks and data pre-processing, in: Babichev, S., Lytvynenko, V.
(eds) Lecture Notes in Data Engineering, Computational Intelligence, and Decision-Making,
Volume 2. ISDMCI 2024. Lecture Notes on Data Engineering and Communications
Technologies, vol 244. (2025). Springer, Cham. doi: 10.1007/978-3-031-88483-2_6.
[30] I. Kalinina, A. Gozhyj, P. Bidyuk, V. Gozhyi, M. Korobchynskyi, V. Nadraga, A systematic
approach to data normalization and standardization in machine learning problems, in:
Babichev, S., Lytvynenko, V. (eds) Lecture Notes in Data Engineering, Computational
Intelligence, and Decision-Making, Volume 2. ISDMCI 2024. Lecture Notes on Data
Engineering and Communications Technologies, (2025), vol 244. Springer, Cham.
https://doi.org/10.1007/978-3-031-88483-2_11.
[31] I. Kalinina, A. Gozhyj, P. Bidyuk, V. Gozhyj, Multilevel ensemble approach in classification
problems, conference: 2024 IEEE 19th International Conference on Computer Science and
Information Technologies (CSIT). Lviv, Ukraine (2024) doi: 10.1109/CSIT65290.2024.10982625,
URL: https://ieeexplore.ieee.org/document/10982625.
[32] P. Bidyuk , I. Kalinina, O. Zhebko, A. Gozhyj, T. Hannichenko, Classification system based on
ensemble methods for solving machine learning tasks, CEUR- WS. (2023), vol. 3426. Pp. 1-11.
CEUR-WS.org/Vol-3426/paper5.pdf. (ISSN 1613-0073).</p>
      </sec>
    </sec>
  </body>
  <back>
  </back>
</article>