<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>The Eighth International Workshop on Computer Modeling and Intelligent Systems</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Optimisation of Training Samples with KLE and Mutual Information</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Denys Symonov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleksandr Palagin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yehor Symonov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bohdan Zaika</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>V.M. Glushkov Institute of Cybernetics of the National Academy of Sciences (NAS) of Ukraine</institution>
          ,
          <addr-line>Akademika Glushkova Avenue 40, Kyiv, 03187</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>5</volume>
      <issue>2025</issue>
      <abstract>
        <p>One of the key challenges in modern machine learning is reducing the dimensionality of the feature space in training samples while preserving essential information for classification and forecasting tasks. This study proposes a methodologically grounded approach that integrates the Kozachenko-Leonenko entropy (KLE) method with mutual information to enhance feature selection, thereby improving model accuracy and reducing computational complexity. A comparative analysis on a real-world dataset confirms the effectiveness of the proposed method in selecting informative features and improving classification performance.</p>
      </abstract>
      <kwd-group>
        <kwd>Kozachenko-Leonenko entropy (KLE) method</kwd>
        <kwd>mutual information</kwd>
        <kwd>machine learning</kwd>
        <kwd>feature selection</kwd>
        <kwd>dimensionality reduction</kwd>
        <kwd>training sample</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The quality of machine learning models largely depends on the quality of training samples that are
formed at the data preparation stage and directly affect the accuracy, generalisability and stability of
the models. Effective data preparation involves data cleaning, transformation, selection of relevant
features, and elimination of outliers. However, these processes are complex and require automation.</p>
      <p>The concept of entropy is one of the most powerful mathematical tools for performing such
operations objectively and formally. Entropy methods make it possible to estimate the degree of
uncertainty in training data, identify the most relevant features, and find optimal strategies for their
processing.</p>
      <p>
        The use of entropy for data processing is not a novel concept, but its relevance to modern machine
learning tasks is only growing. One of the best-known areas of its application is the
discretisation of continuous features. For example, the Fayyad-Irani method is based on entropy
minimisation to determine optimal partitioning thresholds, which yields compact and
informative value intervals [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This technique is effective for improving trained models, as
confirmed by empirical studies [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. Such approaches not only reduce the dimensionality of
the feature space but also improve the generalisation ability of the models.
      </p>
      <p>
        Another important aspect is the selection of features based on entropy criteria. Methods such as
information gain [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] and Gini impurity [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6-8</xref>
        ] identify the extent to which each feature contributes to
class recognition. This enables the elimination of redundant or insignificant features, thereby
increasing the efficiency of classification algorithms. In particular, algorithms such as SelectKBest [
        <xref ref-type="bibr" rid="ref10 ref9">9,
10</xref>
        ] and Recursive Feature Elimination (RFE) [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ] are effectively used to select relevant features,
even for large and unbalanced samples. Methods based on conditional and
mutual information make it possible to create more flexible and adaptive feature selection
strategies for further use in machine learning algorithms.
      </p>
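      <p>Both selectors are available in scikit-learn; the short sketch below (with illustrative parameters and synthetic data) pairs SelectKBest with a mutual-information score and RFE with a random forest.</p>
      <preformat>
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

# Filter approach: keep the k features with the highest mutual information
kbest = SelectKBest(score_func=mutual_info_classif, k=5).fit(X, y)

# Wrapper approach: recursively drop the least important features
rfe = RFE(RandomForestClassifier(random_state=0), n_features_to_select=5).fit(X, y)

print(kbest.get_support(indices=True), rfe.get_support(indices=True))
      </preformat>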
      <p>In addition to working with features, entropy is used to select the most informative training
samples. In the context of active learning, one of the most common approaches is Entropy Sampling,
where priority is given to samples for which the model has the highest uncertainty in its predictions.
This makes it possible to significantly reduce the size of the training set without degrading the classification quality.</p>
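      <p>As a simple illustration, prediction-entropy ranking takes only a few lines; the sketch below (with illustrative names) assumes a matrix of predicted class probabilities from a fitted classifier and is not drawn from the cited studies.</p>
      <preformat>
import numpy as np

def entropy_sampling(proba, n_queries=10):
    """Rank unlabelled samples by the Shannon entropy of their predicted
    class probabilities; return indices of the most uncertain ones."""
    eps = 1e-12                                 # guard against log(0)
    H = -np.sum(proba * np.log(proba + eps), axis=1)
    return np.argsort(H)[-n_queries:]           # n_queries highest-entropy samples

proba = np.array([[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]])
print(entropy_sampling(proba, n_queries=1))     # -> [1], the most uncertain row
      </preformat>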
      <p>
        Studies in computer vision and text analytics confirm the effectiveness of this strategy [
        <xref ref-type="bibr" rid="ref13">13, 14</xref>
        ]. It
should also be noted that entropy analysis helps to assess the balance of classes in the training set. A
low value of the entropy of the class distribution signals a significant imbalance, which can negatively
affect the performance and accuracy of the model. Diagnostic criteria based on entropy help to
identify such problems in time and apply appropriate corrective strategies, such as sample
rebalancing or weighting.
      </p>
      <p>Therefore, the use of entropy methods in the processing and analysis of training samples opens up
wide opportunities to improve the efficiency of machine learning models. These approaches not only
optimise the structure of the sample but also improve its information content, which directly affects
the accuracy and stability of the built models.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Problem statement</title>
      <p>One of the key challenges of modern machine learning is to reduce the dimensionality of the feature
space without significant loss of the information required to solve forecasting or classification tasks. The
growing amount of data used in models leads to increased computational complexity, model
overfitting, and reduced generalisation ability. It is therefore important to develop
effective methods for selecting informative features and reducing the dimensionality of the space
while retaining relevant information. Traditional approaches, such as principal component analysis
(PCA) [<xref ref-type="bibr" rid="ref15">15</xref>] or linear discriminant analysis (LDA) [<xref ref-type="bibr" rid="ref16">16</xref>], are effective only under certain assumptions
about the data distribution. In the case of complex, non-linear relationships between
features, these methods may not be effective. An alternative is entropy-based methods for assessing
the information content of features, which require no prior assumptions about the data
distribution.</p>
      <p>Therefore, the problem statement is to develop a methodologically sound approach based on
entropy methods, which will improve the quality of training samples and, consequently, the accuracy
of machine learning models.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Goal and objectives of the study</title>
      <p>The purpose of this study is to analyse and justify the effectiveness of using entropy methods for
processing and analysing training samples in order to improve the quality of machine learning
models. To achieve this goal, the following objectives are considered.</p>
      <p>1. Analyse the capabilities of the Kozachenko-Leonenko entropy (KLE) method for assessing the
informativeness of features in N-dimensional space.
2. Develop a combined approach using KLE and mutual information for feature selection
to improve the quality of models and reduce the resources required to solve classification or
prediction problems.</p>
      <p>The accomplishment of these objectives will contribute to the development of a methodological
framework for machine learning and provide an effective approach to the preparation of training
samples, which in turn will increase the accuracy and generalisation of models.</p>
    </sec>
    <sec id="sec-4">
      <title>4. A combined approach using KLE and mutual information for feature selection</title>
      <p>This section presents a novel approach to feature selection that combines KLE entropy and mutual
information. By integrating these two methods, the proposed approach aims to improve the
evaluation of feature relationships, decrease the dimensionality of the training sample and, as a result,
enhance both classification accuracy and generalisation performance. The first part of the section
discusses the use of the KLE method in N-dimensional space for preparing the training sample, while
the second part demonstrates how KLE and mutual information are combined to perform feature
selection.</p>
      <sec id="sec-4-1">
        <title>4.1. KLE method in N-dimensional space</title>
        <p>The KLE method is an effective approach for estimating differential entropy in N-dimensional
space. Unlike its parametric counterparts, this nonparametric method does not require any prior
assumptions about the data distribution and works well even with complex, nonlinear distributions
[<xref ref-type="bibr" rid="ref17">17</xref>].</p>
        <p>Assume that the task of training sample preparation involves the following steps:
1. Remove or reduce the influence of noisy data (outliers).
2. Select or transform a subset of features so as to ensure the highest informativeness
with respect to the output variable.
3. Ensure satisfactory accuracy of the machine learning model, even under conditions of
incomplete information.</p>
        <p>Denote the given dataset by \(X = \{(x_i, y_i)\}_{i=1}^{M}\), where \(x_i \in \mathbb{R}^N\) is a feature vector and \(y_i\) is the
target variable (\(y_i \in \mathbb{R}\) for regression, \(y_i \in \{C_1, C_2, \ldots, C_k\}\) for classification). The KLE method
estimates the differential entropy of the feature space \(X\), which is useful for analysing the
informativeness of features and their relationship with the target variable \(Y\).</p>
        <p>The differential entropy of a random variable \(X \in \mathbb{R}^N\) is defined as</p>
        <p>\[ H(X) = -\int_{\mathbb{R}^N} f_X(x) \log f_X(x) \, dx, \tag{1} \]</p>
        <p>where \(f_X(x)\) is the probability density of the feature distribution.</p>
        <p>The KLE algorithm for N-dimensional space is as follows. First, for each point \(x_i\) the distance to its
k-th nearest neighbour is found (for example, using the Euclidean distance):</p>
        <p>\[ \rho_k(x_i) = \min \{ \rho \mid |\{ x_j \in X : \lVert x_j - x_i \rVert \le \rho \}| \ge k + 1 \}, \tag{2} \]</p>
        <p>where \(\rho_k(x_i)\) is the radius containing \(k+1\) points, including the point \(x_i\) itself; \(x_j\) is a point among
which the k-th nearest neighbour is searched; \(\lVert x_j - x_i \rVert\) is the distance between points \(x_i\) and \(x_j\)
according to the selected distance metric; \(\rho\) is the radius value, varied until the minimum value
satisfying the condition is found; and \(k\) is the number of nearest neighbours taken into account.</p>
        <p>The number \(k\) of nearest neighbours used when estimating the distribution of the noise component
can be selected with the pseudocode in Figure 1, which evaluates the KLE entropy with a fixed \(k\) on
each sample \(X_b\); here \(B\) is the number of samples (e.g., bootstrap or cross-validation samples) and
\(X_b\) is one of the \(B\) samples.</p>
        <p>Next, the volume of the unit ball in N-dimensional space is determined by the chosen norm. For
example, for the Euclidean norm, the volume of the unit ball is</p>
        <p>\[ V_N = \frac{\pi^{N/2}}{\Gamma\!\left(\frac{N}{2} + 1\right)}, \tag{3} \]</p>
        <p>where \(\Gamma(\cdot)\) is the gamma function.</p>
        <p>The last step is to calculate the KLE entropy estimate</p>
        <p>\[ H_{KLE} = \Psi(M) - \Psi(k) + \log V_N + \frac{N}{M} \sum_{i=1}^{M} \ln \rho_k(x_i) + \gamma, \tag{4} \]</p>
        <p>where \(\Psi(\cdot)\) is the digamma function, which satisfies \(\Psi(x + 1) = \Psi(x) + \frac{1}{x}\); \(M\) is the number of
sample points; and \(\gamma \approx 0.5772\) is the Euler-Mascheroni constant.</p>
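        <p>As a rough illustration, the estimate (4) can be computed with a k-nearest-neighbour query. The following sketch assumes a NumPy/SciPy environment; the function name and parameter values are illustrative rather than taken from the paper, and the trailing Euler-Mascheroni term follows formula (4) above.</p>
        <preformat>
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kle_entropy(X, k=3):
    """Kozachenko-Leonenko differential entropy estimate (nats), cf. Eq. (4)."""
    M, N = X.shape
    tree = cKDTree(X)
    # query k+1 neighbours because the nearest neighbour of each point is itself
    dist, _ = tree.query(X, k=k + 1)
    rho_k = dist[:, -1]                     # distance to the k-th neighbour, Eq. (2)
    log_VN = (N / 2) * np.log(np.pi) - gammaln(N / 2 + 1)   # log of V_N, Eq. (3)
    return (digamma(M) - digamma(k) + log_VN
            + (N / M) * np.sum(np.log(rho_k)) + np.euler_gamma)

rng = np.random.default_rng(0)
print(kle_entropy(rng.normal(size=(1000, 3)), k=3))   # 1000 points in R^3
        </preformat>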
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Combining KLE with mutual information</title>
        <p>Mutual information is a powerful tool for detecting non-linear dependencies between features
and the target variable, facilitating the construction of a dataset that is both balanced and
information-rich [<xref ref-type="bibr" rid="ref18">18</xref>]. Mutual information measures the extent to which information about
\(X_b = \{(x_i, y_i)\}_{i=1}^{M}\) helps to determine \(Y\). If the values of \(X_b\) are conditionally independent, then the
mutual information is</p>
        <p>\[ I(X_b, Y) = \sum_{j=1}^{N} I(x_{ij}, y_i), \tag{5} \]</p>
        <p>provided that</p>
        <p>\[ p(x_{i1}, x_{i2}, \ldots, x_{iN} \mid y_i) = \prod_{j=1}^{N} p(x_{ij} \mid y_i). \tag{6} \]</p>
        <p>Accordingly, the mutual information for \(x_i\) is defined as</p>
        <p>\[ I(x_i; y_i) = \sum_{i=1}^{N} \sum_{y_i \in Y} p(x_i, y_i) \log \frac{p(x_i, y_i)}{p(x_i) \, p(y_i)}, \tag{7} \]</p>
        <p>where \(p(x_i, y_i)\) is the joint probability of events \(x_i\) and \(y_i\); \(p(x_i)\) and \(p(y_i)\) are the corresponding
marginal probabilities.</p>
        <p>Based on the integral mutual information scores computed on the sample \(X_b\) from the dataset,
insignificant features can be eliminated, so that the sample for model training takes the form</p>
        <p>\[ X_T = \{X_b \mid I(x_i; y_i) \ge \alpha\}, \tag{8} \]</p>
        <p>where \(\alpha\) is the cut-off threshold.</p>
        <p>This approach makes it possible to reduce the dimensionality of the feature space while
maintaining a high level of relevance to the target variable.</p>
        <p>If the model performance at the validation stage is insufficient, the \(\alpha\) threshold can be adjusted
iteratively through the combined criterion \(\phi(X_T)\), which is maximised by varying the \(\alpha\) parameter
and the structure of \(X_T\):</p>
        <p>\[ \phi(X_T) = \mathrm{Score}_{CV}(X_T) - \lambda \, |X_T|, \tag{9} \]</p>
        <p>where \(\mathrm{Score}_{CV}(X_T)\) is the average model quality score based on cross-validation; \(|X_T|\) is the
number of selected features; and \(\lambda \ge 0\) is the penalty factor.</p>
        <p>To sum up, the integration of the KLE entropy and mutual information methods makes it possible to
significantly reduce the dimensionality of the initial feature set while maintaining sufficient
information potential for efficient model training. This improves not only the performance of machine
learning algorithms but also their stability and interpretability in real classification and regression tasks.</p>
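        <p>A minimal sketch of the selection rule (8) and criterion (9) follows, assuming scikit-learn's mutual_info_classif as the per-feature estimator of (7) (it uses a k-NN estimator related to KLE); the threshold grid and penalty value are illustrative.</p>
        <preformat>
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=4, random_state=0)

mi = mutual_info_classif(X, y, random_state=0)    # per-feature MI, cf. Eq. (7)

def phi(alpha, lam=0.01):
    """Combined criterion of Eq. (9): CV score minus a penalty on |X_T|."""
    mask = mi >= alpha                            # selection rule of Eq. (8)
    if not mask.any():
        return -np.inf, mask
    score = cross_val_score(RandomForestClassifier(random_state=0),
                            X[:, mask], y, cv=5).mean()
    return score - lam * mask.sum(), mask

# Iterate over candidate thresholds and keep the alpha that maximises phi
best_alpha = max(np.linspace(0.0, mi.max(), 20), key=lambda a: phi(a)[0])
print(best_alpha, phi(best_alpha)[1])
        </preformat>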
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Comparative analysis of modelling results</title>
      <p>An important stage in the development of effective machine learning models is the validation of the
results on a real data set, which makes it possible to objectively assess the impact of preprocessing on
the accuracy and stability of classification. To verify the quality of the proposed entropy methods, a
comparative analysis of classification results using pre-processed and unprocessed training samples was carried out.</p>
      <sec id="sec-5-1">
        <title>5.1. Dataset</title>
        <p>To evaluate the effectiveness of entropy-based methods for processing training samples to improve
the quality of machine learning models, the Gas Sensor Array Low-Concentration dataset [<xref ref-type="bibr" rid="ref19">19</xref>] is used.
Table 1 shows a snapshot of the Gas Sensor Array Low-Concentration dataset. The full dataset
contains 90 gas samples collected by 10 semiconductor sensors. The studied gases include ethanol,
acetone, toluene, ethyl acetate, isopropanol and n-hexane at three concentrations: 50 ppb, 100 ppb and
200 ppb. For each gas and concentration combination, five samples were collected to provide a variety
of data for modelling. Each sample consists of 9000 data points representing the sensors' response to
the gas. Each sensor generates 900 data points, allowing for detailed analysis of their response to
different gases and concentrations. The data was collected in three stages: baseline (5 minutes), gas
injection (10 minutes), and purification (15 minutes) with a sampling rate of 1 Hz.</p>
        <p>The presence of data for several types of gases, concentrations, and time phases (see Table 1)
makes it possible to form a representative training set for building classification models in real-world
conditions. Such a sample is optimal for testing the effectiveness of feature space reduction methods,
in particular those based on entropy and mutual information.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Description of the experiments</title>
        <p>The base machine learning model is the ensemble method Random Forest, where multiple
independent decision trees are combined to enhance accuracy and stability. The implementation of
machine learning algorithms and data analysis is conducted in Python, utilising libraries such as
NumPy, pandas, matplotlib, scikit-learn, time, and psutil. These libraries support tasks including
classification, dataset splitting (train_test_split), learning curve analysis, model training with
RandomForestClassifier, performance evaluation metrics, t-SNE, PCA, and resource and execution
time monitoring.</p>
        <p>The dataset is initially divided into training and test sets, followed by model training based on a
predefined target vector. To assess sensitivity to missing features, a mechanism is employed that
retains only a fixed number of significant features, replacing the remaining ones with mean values
computed from the training set. Classification quality is evaluated using accuracy metrics, ROC AUC,
MAE, and MSE, while the model’s performance dependency on training set size is examined through a
learning curve analysis.</p>
        <p>For feature space analysis and dimensionality reduction, t-SNE (a non-linear projection) and PCA
(a linear projection onto principal components) are applied. In addition to classification performance,
computational efficiency and resource consumption are assessed by measuring execution time and
CPU load.</p>
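        <p>Both projections can be produced directly with scikit-learn; the snippet below is a minimal sketch with a random stand-in feature matrix and illustrative hyperparameters.</p>
        <preformat>
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = np.random.default_rng(0).normal(size=(90, 10))   # stand-in feature matrix

emb_tsne = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X)
emb_pca = PCA(n_components=2).fit_transform(X)
print(emb_tsne.shape, emb_pca.shape)                 # (90, 2) (90, 2)
        </preformat>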
        <p>All computational experiments presented in the paper were conducted on a laptop equipped with
an Intel Core i7-13620H processor (13th generation, 10 cores: 6 performance and 4 efficiency cores,
base frequency 2.40 GHz) and 16 GB of RAM. The system operates on a 64-bit Windows operating
system with x64 architecture. Parallel computations were automatically handled through CPU
multithreading using libraries such as scikit-learn, NumPy, and joblib, which support task
parallelization via the n_jobs parameter. GPU acceleration was not employed, as the main
computations involved tabular data processing and ensemble modeling (Random Forest), which are
efficiently executed on modern CPUs.</p>
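        <p>A condensed sketch of the evaluation loop described above is given below; the synthetic data stands in for the gas-sensor features, and the masking rule (keeping the first n_keep columns) is an illustrative assumption rather than the paper's exact procedure.</p>
        <preformat>
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=90, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                          random_state=0)

model = RandomForestClassifier(n_jobs=-1, random_state=0).fit(X_tr, y_tr)

# Sensitivity to missing features: keep n_keep features and replace the rest
# with mean values computed from the training set
n_keep = 7
X_te_masked = X_te.copy()
X_te_masked[:, n_keep:] = X_tr[:, n_keep:].mean(axis=0)

t0 = time.perf_counter()
pred = model.predict(X_te_masked)
elapsed = time.perf_counter() - t0

print(f"accuracy={accuracy_score(y_te, pred):.3f}",
      f"MAE={mean_absolute_error(y_te, pred):.3f}",
      f"MSE={mean_squared_error(y_te, pred):.3f}",
      f"time={elapsed:.4f}s")
        </preformat>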
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Results of the experiments</title>
        <p>The results of model testing presented in this section demonstrate a comparative analysis of the
effectiveness of classification methods under conditions of incomplete input data and different
approaches to feature preprocessing. Particular attention is paid to the quality of classification,
stability of models, their ability to generalise, and computational efficiency.</p>
        <p>Figure 2 a) (left) shows the effect of the available features on the classification accuracy in the
absence of training set preprocessing (¬KLE). There is a gradual increase in classification accuracy
with the number of available features, but this increase is non-linear and has some fluctuations. The
initial accuracy values are low, and the maximum value does not reach one, which indicates a
significant loss of information. These results indicate that even with an increase in available features,
the classifier cannot achieve ‘perfect’ accuracy due to the influence of noisy or unrepresentative data.
In Figure 2 b) (right), where the training set was pre-processed using the Kozachenko-Leonenko
entropy (KLE) method, a much faster increase in classification accuracy is observed. With a small
number of features, the accuracy values are almost the same as in the first graph, but after reaching a
certain threshold (approximately at 7 features), the accuracy increases sharply and approaches one.
This indicates a significant improvement in classification quality due to data preprocessing, which
likely eliminated the influence of noisy or irrelevant features, making the model more robust to
incomplete data.</p>
        <p>Figure 3 a) (left) shows that for the ¬KLE model, the training accuracy remains relatively stable as
the training sample size increases, while the cross-validation accuracy gradually increases but
remains below the training accuracy. This may indicate a certain level of overfitting, as the model
demonstrates higher accuracy on training data than on cross-validation data. The difference between
the two curves indicates the presence of noise and uneven distribution of information in the training
sample. In Figure 3 b) (right), the KLE model shows a much better balance between training and
cross-validation accuracy. Even with relatively small amounts of data, the model achieves high accuracy,
and the difference between the two curves is much smaller, indicating better model generalisation and
reduced overfitting. This confirms the effectiveness of pre-processing, which reduces the influence of
irrelevant or noisy features and improves the quality of training.</p>
        <p>In Figure 4 a) (left), there is significant chaos and high density of points for the t-SNE test of the
¬KLE model, indicating a weak structure in the data. The classes overlap significantly, which can
make classification difficult. Such a distribution indicates the presence of noise and irrelevant
information in the features, which can reduce the accuracy of the model and its ability to generalise
patterns in the data. In Figure 4 b) (right), a more structured distribution of points is observed for the
KLE model. Clusters are clearer, indicating improved differentiation between classes. This confirms
the effectiveness of pre-processing in reducing noise and identifying hidden patterns in the data.</p>
        <p>Figure 5 a) (left) for the ¬KLE model shows that the data are unevenly distributed and have some
clusters, but the structure remains blurred. The classes overlap to a large extent, which can make
classification difficult, as there is no clear boundary between the groups. Such a distribution
indicates that the original features contain a significant amount of noise or irrelevant information,
which reduces the quality of model training. In Figure 5 b) (right), the KLE model shows a clearer
separation between the groups, the data looks more clustered and has distinct directions in the
principal component space. This indicates effective noise removal and improved differentiation
between classes, which can improve classification accuracy.</p>
        <p>Figure 6 a) (top row) shows that without preprocessing, the prediction time and the CPU usage
increase from 0.45 sec (3 features) to 0.67 sec (10 features), indicating the high computational
complexity of the model. In Figure 6 b) (bottom row), after processing with the Kozachenko-Leonenko
entropy method, the prediction time increases only from 0.077 sec to 0.096 sec, while CPU Time
stabilises at 0.094 sec after 6 features. This confirms the effectiveness of the processing in reducing
computational costs and improving performance.</p>
        <p>Table 2 shows that without pre-processing (¬KLE), all metrics deteriorate sharply as the number of
features decreases: AUC-ROC drops from 0.999 (10/10 features) to 0.747 (3/10 features), MAE and MSE
increase significantly, and Log Loss increases from 0.525 to 5.427, indicating a loss of model stability.
This indicates that without preprocessing, the model becomes very sensitive to a decrease in the
number of features, which impairs its ability to generalise patterns. On the other hand, with KLE, the
classification accuracy remains consistently high even with incomplete information. For example, the
AUC-ROC changes less sharply (from 1.000 to 0.805), and the MSE and Log Loss remain at lower levels
than in the case of ¬KLE. This shows that entropy processing improves model generalisability and
reduces the impact of missing features, making the algorithm more robust to incomplete data.
(* Machine learning algorithm without using KLE for preparing the initial sample.)</p>
        <p>The results in Table 2 demonstrate that data preprocessing using KLE not only improves
classification accuracy, but also ensures the stability of the metrics while reducing the amount of input
information. This approach is effective both in terms of model quality and computational
performance.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>The primary goal of this study was to develop and substantiate an effective method for the optimisation of
training samples, based on entropy theory and combining the Kozachenko-Leonenko entropy (KLE) and
mutual information. The declared objectives included analysing the potential of KLE in N-dimensional
feature space and constructing a hybrid approach for feature selection to enhance model quality and
reduce computational cost. The findings fully reflect the achievement of these objectives.</p>
      <p>The proposed method offers a non-parametric evaluation of differential entropy, capable of
detecting noise and selecting informative features without relying on prior distributional
assumptions. The integration with mutual information enables identification of features most
relevant to the target variable, contributing to the creation of a compact yet expressive feature space.</p>
      <p>Empirical validation on the Gas Sensor Array Low-Concentration dataset confirmed the practical
effectiveness of the method with the following results.</p>
      <p>1. The AUC-ROC metric under preprocessing with KLE remained high even with partial data
(1.000 with the full feature set; 0.805 with only 3 out of 10 features), whereas in the unprocessed
baseline it dropped to 0.747.
2. The Mean Squared Error (MSE) remained low (ranging from 0.05 to 427.91 depending on the
number of features) for the proposed method, indicating improved noise resilience.
3. The Macro-F1 score remained consistently higher (ranging from 0.98 to 0.06 for KLE vs. 0.91
to 0.05 for the baseline) under feature removal scenarios.
4. Prediction time decreased from 0.67 seconds (baseline) to 0.096 seconds (with KLE) for 10
features, demonstrating enhanced computational efficiency.
5. Visualisation techniques such as t-SNE and PCA further confirmed improved class
separability and reduced noise.</p>
      <p>The analysis shows that the use of KLE entropy allows for an objective assessment of the
informativeness of features, reducing their number without losing relevance, which significantly
increases the accuracy and stability of models. The use of mutual information in combination with
KLE facilitates the selection of the most significant features, which minimises the influence of noise
factors and allows optimising the feature space for training. The results also show a significant
reduction in model overfitting and computational costs by removing redundant information.</p>
      <p>Therefore, this research presents a theoretically grounded and empirically validated approach to
entropy-based preprocessing. The alignment between the initially defined objectives and the achieved
results has been demonstrated through both qualitative and quantitative analysis. This work provides
a foundation for the further integration of entropy-driven techniques into advanced machine learning
pipelines, particularly in domains characterised by complex or imbalanced datasets.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Directions for further research</title>
      <p>The use of entropy criteria in combination with deep learning methods can significantly improve the
quality of training samples, especially in high-dimensional spaces. In particular, a promising area is
the adaptation of the KLE method to analyse the relationship between features in deep neural
networks, which will not only reduce the feature space but also determine their informativeness in the
context of multilevel data representations.</p>
      <p>Special attention should be paid to the integration of entropy-based approaches with active
learning methods, which will allow for dynamic sample adjustment in the process of model training.
The use of strategies similar to Entropy Sampling will allow optimising the balance of classes and
selecting the most informative examples for training. Further development of such approaches may
include the creation of adaptive algorithms that combine estimates of differential entropy and mutual
information to optimise the learning process in real time. This will not only reduce computational
costs, but also improve the generalisation capability of the models, ensuring their stability even in
circumstances of high variability in input data.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>The work was supported by the state budget research project “Develop methods for modelling the
processes of targeted management of complex multi-component information systems for various
purposes” (state registration number 0123U100754) of the V.M. Glushkov Institute of Cybernetics of
the National Academy of Sciences (NAS) of Ukraine.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly in order to check grammar and
spelling. After using this tool, the authors reviewed and edited the content as needed and take full
responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>A feature discretization method for classification of highresolution remote sensing images in coastal areas</article-title>
          ,
          <source>IEEE Transactions on Geoscience and Remote Sensing</source>
          <volume>59</volume>
          (
          <year>2021</year>
          )
          <fpage>8584</fpage>
          -
          <lpage>8598</lpage>
          . doi:
          <volume>10</volume>
          .1109/TGRS.
          <year>2020</year>
          .
          <volume>3016526</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ma</surname>
          </string-name>
          , J. Zhai,
          <article-title>Big data decision tree for continuous-valued attributes based on unbalanced cut points</article-title>
          ,
          <source>Journal of Big Data</source>
          <volume>10</volume>
          (
          <year>2023</year>
          )
          <article-title>135</article-title>
          . doi:
          <volume>10</volume>
          .1186/s40537-023-00816-2.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Suppa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Asci</surname>
          </string-name>
          , G. Saggio,
          <string-name>
            <given-names>P.</given-names>
            <surname>Di Leo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zarezadeh</surname>
          </string-name>
          , G. Ferrazzano, G. Costantini,
          <article-title>Voice analysis with machine learning: one step closer to an objective diagnosis of essential tremor</article-title>
          ,
          <source>Movement Disorders</source>
          <volume>36</volume>
          (
          <year>2021</year>
          )
          <fpage>1401</fpage>
          -
          <lpage>1410</lpage>
          . doi:
          <volume>10</volume>
          .1002/mds.28508.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Son</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. W.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. H.</given-names>
            <surname>Hyun</surname>
          </string-name>
          ,
          <article-title>CreativeSearch: Proactive design exploration system with Bayesian information gain and information entropy, Automation in Construction (</article-title>
          <year>2022</year>
          ). doi:
          <volume>10</volume>
          .1016/j.autcon.
          <year>2022</year>
          .
          <volume>104502</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Reddy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chittineni</surname>
          </string-name>
          ,
          <source>Entropy based C4</source>
          .
          <article-title>5-SHO algorithm with information gain optimization in data mining</article-title>
          ,
          <source>PeerJ Computer Science</source>
          <volume>7</volume>
          (
          <year>2021</year>
          ). doi:
          <volume>10</volume>
          .7717/peerj-cs.
          <volume>424</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Deng</surname>
          </string-name>
          , H. Xu,
          <article-title>Group feature screening based on Gini impurity for ultrahighdimensional multi-classification</article-title>
          ,
          <source>AIMS Mathematics</source>
          (
          <year>2023</year>
          ). doi:
          <volume>10</volume>
          .3934/math.2023216.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Mali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Motiyani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Sameed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <article-title>Hyper spectral image clustering and local feature selection using Gini impurity</article-title>
          ,
          <source>in: Proceedings of the 7th International Conference on Trends in Electronics and Informatics (ICOEI)</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>1629</fpage>
          -
          <lpage>1634</lpage>
          . doi:
          <volume>10</volume>
          .1109/ICOEI56765.
          <year>2023</year>
          .
          <volume>10125605</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Disha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Waheed</surname>
          </string-name>
          ,
          <article-title>Performance analysis of machine learning models for intrusion detection system using Gini Impurity-based Weighted Random Forest (GIWRF) feature selection technique</article-title>
          ,
          <source>Cybersecurity</source>
          <volume>5</volume>
          (
          <year>2022</year>
          ).
          <source>doi: 10.1186/s42400-021-00103-8.</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Maftoun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Joloudari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Zare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Khademi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Atashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Nematollahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Alizadehsani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Górriz</surname>
          </string-name>
          ,
          <article-title>Improving prediction of mortality in ICU via fusion of SelectKBest with SMOTE method and Extra Tree classifier</article-title>
          , in: J.
          <string-name>
            <surname>M. Ferrández Vicente</surname>
            ,
            <given-names>M. Val</given-names>
          </string-name>
          <string-name>
            <surname>Calvo</surname>
          </string-name>
          , H. Adeli (Eds.),
          <source>Artificial Intelligence for Neuroscience and Emotional Systems</source>
          , vol.
          <volume>14674</volume>
          , Lecture Notes in Computer Science, Springer, Cham,
          <year>2024</year>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>031</fpage>
          -61140-
          <issue>7</issue>
          _
          <fpage>7</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Jamei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Afzaal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Karbasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Malik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Farooque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Haydar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. U.</given-names>
            <surname>Zaman</surname>
          </string-name>
          ,
          <article-title>Accurate monitoring of micronutrients in tilled potato soils of eastern Canada: Application of an explainable inspired-adaptive boosting framework coupled with SelectKBest, Comput</article-title>
          . Electron. Agric.
          <volume>216</volume>
          (
          <year>2024</year>
          )
          <article-title>108479</article-title>
          . doi:
          <volume>10</volume>
          .1016/j.compag.
          <year>2023</year>
          .
          <volume>108479</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kollem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sirigiri</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Peddakrishna</surname>
          </string-name>
          ,
          <article-title>A novel hybrid deep CNN model for breast cancer classification using Lipschitz‐based image augmentation and recursive feature elimination</article-title>
          ,
          <source>Biomed. Signal Process. Control</source>
          .
          <volume>95</volume>
          (
          <year>2024</year>
          ) 106406, doi: 10.1016/j.bspc.
          <year>2024</year>
          .
          <volume>106406</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Awad</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Fraihat</surname>
          </string-name>
          ,
          <article-title>Recursive Feature Elimination with Cross-Validation with Decision Tree: Feature Selection Method for Machine Learning-</article-title>
          <source>Based Intrusion Detection Systems, J. Sens. Actuator Netw</source>
          .
          <volume>12</volume>
          (
          <issue>5</issue>
          ) (
          <year>2023</year>
          ) 67, doi:10.3390/jsan12050067.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13] S.-l. Li,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ding</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Entropy-driven Sampling and Training Scheme for Conditional Diffusion Generation</article-title>
          ,
          <source>in European Conference on Computer Vision</source>
          ,
          <year>2022</year>
          , doi:10.48550/arXiv.2206.11474.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>