<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Machine learning-based information technology for analyzing energy peaks in power grid balancing</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dmytro Tymoshchuk</string-name>
          <email>dmytro.tymoshchuk@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrii Voloshchuk</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andriy Sverstiuk</string-name>
          <email>andriy.voloschuk30@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Halyna Osukhivska</string-name>
          <email>osukhivska@tntu.edu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oksana Bahrii-Zaiats</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>I. Horbachevsky Ternopil National Medical University</institution>
          ,
          <addr-line>Maidan Voli St., 1, Ternopil, 46002</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ternopil Ivan Puluj National Technical University</institution>
          ,
          <addr-line>Ruska str. 56, Ternopil, 46001</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>The article presents an information technology based on machine learning methods for detecting energy peaks and automatically balancing the power grid by activating storage installations or renewable energy sources. The study uses one month of hourly electricity consumption data, described by nine statistical descriptors of amplitude variability and the LEC indicator with two balance classes. Five machine learning models (SVM, kNN, Random Forest, MLP, XGBoost) were compared, with hyperparameters selected by Grid Search with 5-fold cross-validation and the F1-score as the target metric. The best results were obtained for the XGBoost model (Accuracy ≈ 0.961), which indicates its high ability to recognize balanced (class 1) and unbalanced (class 2) power consumption modes. Permutation Feature Importance analysis confirmed that variability descriptors (Range, Std_Dev, Max) are crucial for classifying energy anomalies. The approach provides timely detection of unstable regimes and reduces false alarms, increasing the stability and reliability of the power system.</p>
      </abstract>
      <kwd-group>
        <kwd>energy peaks</kwd>
        <kwd>machine learning</kwd>
        <kwd>power grid balancing</kwd>
        <kwd>information technology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Modern energy systems are becoming increasingly complex, as they combine various sources of
generation, control systems and consumers with dynamic operating modes. Such increasing
structural complexity makes them more vulnerable to external influences and fluctuations in
electricity consumption. The level of load on the electricity grid is determined by a set of factors
that directly affect the behavior of consumers and may depend on standard daily, weekly and
seasonal cycles, changes in weather conditions and other factors. Periods of extreme temperatures
lead to a significant increase in electricity consumption due to increased use of heating or air
conditioning systems. In addition, consumption is influenced by socio-economic factors, holiday
periods, mass events and changes in industrial production, emergencies, etc., which create peak
loads and affect the stability of the operation of the energy system [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ].
      </p>
      <p>Ensuring stable operation of the power grid in conditions of such fluctuations requires the
implementation of modern approaches to energy balancing and load forecasting. The use of
renewable energy sources (solar, wind generation) and energy storage installations allows
smoothing peak loads, compensating for short-term power shortages and increasing the flexibility
of the system. In this context, the implementation of intelligent energy metering systems becomes
an important step towards effective monitoring of energy consumption and distribution at the level
of cities and regions [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3-5</xref>
        ]. For their effective management, the accuracy of input data and the
adequacy of models are critically important, which is a fundamental aspect in complex technical
systems [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>ORCID: 0000-0003-0246-2236 (D. Tymoshchuk); 0009-0007-1478-1601 (A. Voloshchuk); 0000-0001-8644-0776 (A. Sverstiuk);
0000-0003-0132-1378 (H. Osukhivska); 0000-0002-5533-3561 (O. Bahrii-Zaiats)</p>
      <p>
        To solve these complex problems, it is worth using modern technologies, in particular artificial
intelligence (AI). Machine learning (ML) methods have become widespread in various fields - from
medicine [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], finance [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and materials science [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] to transport [10] and cybersecurity [11]. They
make it possible to automate data analysis processes, identify hidden patterns, increase the accuracy of
forecasts and make informed decisions based on large amounts of information. In the energy
sector, these technologies play an important role in increasing the efficiency of energy solutions.
They are used to forecast consumption, detect anomalies, optimize equipment operating modes and
increase network stability. The integration of ML algorithms into peak load analysis processes
opens up new opportunities for more accurate prediction of system behavior and timely detection
of instability risks.
      </p>
      <p>Unlike traditional statistical methods, modern machine learning models are able to recognize
hidden patterns in data, which allows developing more effective strategies for managing energy
systems, ensuring stable operation of the power grid, and reducing energy supply costs. In the
authors’ previous studies [12], a computer system for energy distribution under electricity shortage
conditions was developed using AI.</p>
      <p>Modern approaches to load forecasting, including meta-learning frameworks for selecting
optimal models [13], contribute to improving the reliability of power systems. Comprehensive
reviews of the application of deep learning for intelligent demand management and load balancing
in smart grids [14] confirm the growing role of ML in improving the reliability of power systems.</p>
      <p>To assess the relevance of research on ML methods for increasing the efficiency of
energy solutions, the analytical query TITLE-ABS-KEY(("energy peak" OR "power system stability"
OR "load forecasting" OR "power grid balancing" OR "energy management system") AND
("machine learning" OR "Random Forest" OR "XGBoost" OR "MLP" OR "neural network" OR
"LSTM" OR "Transformer" OR "GNN")) was run against the Scopus scientometric database.
The query returned 14,372 scientific papers, of which 9,137 were published in the 10 years from 2015 to 2024
(Figure 1).</p>
      <p>The largest number of publications on the topic has appeared in the last three
years: 1,328 in 2022, 1,675 in 2023 and 2,187 in 2024. This confirms the relevance of
the problem and the steady growth of worldwide interest in it.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>The current state of research in power system stabilization is characterized by the rapid
development of AI and ML methods for load forecasting, anomaly detection, and control
optimization. The presented review covers key scientific achievements that demonstrate a variety
of approaches: from hybrid statistical models [15] to innovative deep learning architectures such as
graph neural networks and transformers.</p>
      <p>A hybrid approach to long-term forecasting with hourly resolution is proposed, namely,
combining classical statistical regression models to describe the underlying data structure (taking
into account temperature and calendar factors) with a Long Short-Term Memory (LSTM) network
for modeling and correction of residual error [16]. Developing the idea of time series analysis, a
hybrid architecture combining convolutional neural networks (CNN) and LSTM for predicting
electricity consumption in residential buildings is presented [17]. This approach allows for the
effective extraction of both local patterns and temporal dependencies in consumption data. An
approach based on XGBoost and factorization machine is proposed to assess the transient stability
of power systems [18]. This method allows efficient processing of high-dimensional system state
data and provides fast and accurate classification of power grid stability in real time. The use of
graph neural networks for modeling the topological properties of power grids [19] makes it possible to take
into account spatial relationships between different nodes of the system and to detect hidden
patterns in load distribution, which is especially important for the analysis of cascading failures
and network development planning. An approach to detecting anomalies in distributed power grids
based on autoencoders and federated learning is proposed, which provides decentralized learning
without transferring private data [20]. In this way, LSTM recurrent networks that model temporal
dynamics and CNN for extracting local patterns in the data are combined. In addition, a data fusion
technique is used, which provides the ability to combine consumption information with
meteorological and other external factors. In this case, higher forecast accuracy is provided
compared to classical models such as ARIMA or Random Forest. Recent research demonstrates a
wide range of ML approaches for power system analysis. Deep learning, in particular
Transformer-based architectures combined with generative adversarial networks, shows high efficiency for
detecting anomalies in load time series [21]. Comparative analysis of the performance of Random
Forest and XGBoost under different class imbalance conditions shows that gradient boosting
generally outperforms traditional ensemble methods in power system classification problems [22].
A review of AI methods for assessing the dynamic stability of power systems, including deep
learning-based approaches that allow classifying different types of disturbances and predicting the
behavior of the system in critical modes, is presented in [23]. To further improve the accuracy of
anomaly detection, the Transformer-GAN model was developed [24]. The architecture combines
the Transformer module, which uses a self-attention mechanism to capture long-term
dependencies, and a generative adversarial network (GAN), where the generator learns normal
data patterns. A systematic review of the application of deep learning for intelligent demand
response [25] demonstrates the effectiveness of DL methods for real-time load forecasting and
demand management, which is critical for smart grid balancing and renewable energy integration.
The application of ML for real-time load management demonstrates the potential of intelligent
systems to improve the efficiency of smart grids [26].</p>
      <p>These studies demonstrate the significant potential of using AI to improve the reliability of
power systems through peak load balancing, in particular with the aim of using alternative sources
of electricity or energy storage facilities.</p>
      <p>The aim of this work is to develop information technology based on ML methods to detect
energy peaks and the need to automatically connect additional renewable energy sources or energy
storage facilities to prevent failure of energy nodes and increase the stability and balance of power
grids.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>To ensure effective real-time monitoring and automatic balancing of the power grid, a
comprehensive information technology is proposed that integrates three functional levels into a
unified architecture (Figure 2).</p>
      <p>The first level performs continuous acquisition of electricity consumption data
from a distributed network of metering devices. Data sources include
smart energy meters at residential and industrial facilities, voltage and current sensors at
substations, telemetry from renewable energy sources (RES), and monitoring systems of energy
storage facilities (ESF). Data are collected with hourly resolution and transmitted over secure VPN
channels using industrial communication protocols to the central node of the system, where they
are stored in specialized databases.</p>
      <p>The second level performs intelligent information
processing. Based on validation results and the evaluation of accuracy metrics, the system selects a
machine learning model that is integrated into the decision-making core. This model subsequently
analyzes energy consumption patterns in real time and classifies the current state as “balanced”
(absence of critical peaks) or “unbalanced” (presence of anomalies). This approach ensures that
power system control is carried out by the most effective algorithm, providing high accuracy in
threat detection and minimizing false alarms.</p>
      <p>The third level is the decision-making level, which implements the physical balancing of the grid.
Depending on the classification result, an automatic control scenario for distributed
resources is executed. When a balanced state is detected, the system switches to
accumulating surplus energy in storage units and performs preventive monitoring of potential load
peaks. When an unbalanced state with critical consumption peaks is detected, the system
automatically initiates compensatory actions: increasing generation from renewable energy
sources and discharging energy storage units to rapidly cover load peaks. Such an architecture
ensures energy balance management with a minimal response time to critical changes in the power
grid.</p>
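      <p>The decision-making level described above can be pictured as a small dispatch step that maps the classifier output to a control scenario. The following is a minimal sketch only: the names GridState, ControlScenario and select_scenario are hypothetical and do not come from the authors' system, and real actuation of RES and ESF equipment is outside its scope.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass
from enum import Enum


class GridState(Enum):
    """Classifier output, using the class numbering from the text."""
    BALANCED = 1    # class 1: no critical peaks
    UNBALANCED = 2  # class 2: critical consumption peaks


@dataclass(frozen=True)
class ControlScenario:
    """Hypothetical set of actions for the distributed resources."""
    charge_storage: bool
    discharge_storage: bool
    boost_res_generation: bool
    note: str


def select_scenario(state: GridState) -> ControlScenario:
    """Map a classified grid state to a control scenario (illustrative only)."""
    if state is GridState.BALANCED:
        # Surplus energy is accumulated; monitoring of potential peaks continues.
        return ControlScenario(True, False, False,
                               "accumulate surplus energy and keep monitoring")
    # Critical peak: raise RES generation and discharge storage to cover it.
    return ControlScenario(False, True, True,
                           "cover the load peak from RES and storage")


scenario = select_scenario(GridState(2))
print(scenario.note)
```
]]></preformat>
      <p>Keeping the mapping in one pure function makes the control logic easy to test independently of the model that produces the class label.</p>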
      <p>The study is based on electricity consumption data from a regional energy company, which are
presented in the form of hourly measurements. Figure 3 shows a portion of the data obtained for
the period from April 20 to 26, 2025, with the peak amplitude values indicated (red). The dataset
formed from the maximum (peak) values of electricity consumption is used for the analysis.</p>
      <p>Nine statistical descriptors of the amplitude variability of electricity consumption were used as
input parameters for the ML model: Mean (arithmetic mean), Median, Min / Max
(minimum / maximum), Range, Std_Dev (standard deviation), SE (standard error), Sk
(skewness) and Kurt (kurtosis). The LEC (Level of Electric Consumption) indicator completes the
set as the target variable. Mean is a measure of central
tendency, reflecting the average level of load or frequency deviation. Median is a robust
characteristic of central tendency, resistant to the presence of outliers. Min / Max define the
boundaries of the operating range during operation. Range is the difference between the maximum
and minimum values, a measure of the total magnitude of fluctuations. Std_Dev quantifies the
volatility or dispersion of load/frequency. SE reflects the stability of the mean value within the
window. Sk characterizes the asymmetry of the distribution, which can indicate sudden
increases or decreases in indicators. Kurt is a measure of the peakedness of the distribution, which
identifies the presence of extreme outliers (sharp spikes or dips).</p>
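      <p>The descriptors listed above can be computed for one window of measurements with the Python standard library alone. This is a sketch under stated assumptions: the text does not say which estimators were used, so the sample standard deviation (n - 1), moment-based skewness and excess kurtosis are choices made here, and the function name describe_window and the sample values are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
import statistics


def describe_window(values):
    """Return the nine amplitude-variability descriptors for one window.

    Assumes at least two values that are not all identical
    (otherwise the variance is zero and Sk / Kurt are undefined).
    """
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    # Central moments used for moment-based skewness and excess kurtosis.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "Mean": mean,
        "Median": statistics.median(values),
        "Min": min(values),
        "Max": max(values),
        "Range": max(values) - min(values),
        "Std_Dev": std,
        "SE": std / math.sqrt(n),
        "Sk": m3 / m2 ** 1.5,
        "Kurt": m4 / m2 ** 2 - 3.0,
    }


# Hypothetical consumption values in MW with one pronounced spike.
window = [0.061, 0.064, 0.059, 0.072, 0.091, 0.066]
descriptors = describe_window(window)
for name, value in descriptors.items():
    print(f"{name}: {value:.4f}")
```
]]></preformat>
      <p>For this window the single spike at 0.091 MW produces a positive Sk and widens Range, which is the kind of signal the variability descriptors are meant to expose.</p>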
      <p>The output parameter in the study is LEC, an indicator that reflects the level of balance of the
electrical network. It characterizes the current state of the system, which is divided
into two classes: balanced (class 1) and unbalanced (class 2). The first class describes an operating
mode with low variability of consumption, in which electricity
consumption is stable and low and corresponds to the predicted indicators; in this mode,
excess electricity is redirected to an energy storage facility to ensure stable operation of the
electrical network. The second class corresponds to a mode with high variability, characterized by
significant fluctuations between the minimum and maximum consumption values within an hour,
uneven load or signs of instability, which may indicate that the permissible parameters of the
power system are exceeded. When the state is classified as "need for balancing" (class 2),
additional renewable energy sources or ESF energy resources must be connected to balance the
load of the power grid as smoothly as possible and keep its operation stable.</p>
      <p>The distribution of electricity consumption data by class is shown in Figure 4.</p>
      <p>The generated dataset contained 2400 samples, evenly distributed between two classes: 1200
samples of class 1 and 1200 samples of class 2. The classes were formed using a
threshold value of electricity consumption, which for this implementation was 0.0749 MW.
Consumption values that exceeded this
threshold, which was derived from the average value over the studied period, were
considered "peak". To build and evaluate the ML models, the generated dataset was
divided into training and test samples in a ratio of 70/30 while preserving the proportions of the
target variable. The split was stratified
with a fixed parameter random_state = 32, which guarantees the reproducibility of the results and
uniform representation of each class in both subsamples.</p>
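      <p>The labelling rule and the stratified 70/30 split can be sketched with scikit-learn as follows. The threshold, the dataset size, the class balance and random_state = 32 are taken from the text; the feature matrix is synthetic noise standing in for the real descriptor table, and the helper label_sample is a hypothetical name.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np
from sklearn.model_selection import train_test_split

THRESHOLD_MW = 0.0749  # threshold value reported in the text


def label_sample(consumption_mw):
    """Return 2 (peak, unbalanced) above the threshold, otherwise 1 (balanced)."""
    return 2 if consumption_mw > THRESHOLD_MW else 1


rng = np.random.default_rng(0)
# Synthetic stand-in for the real table: 2400 rows, 9 descriptors per row.
X = rng.normal(size=(2400, 9))
# Balanced labels as described: 1200 samples of class 1, 1200 of class 2.
y = np.repeat([1, 2], 1200)

# Stratified 70/30 split with the fixed seed named in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=32
)

print(X_train.shape, X_test.shape)
print("test samples per class:", np.bincount(y_test)[1:])
```
]]></preformat>
      <p>With these sizes the test subset holds 720 samples, 360 from each class, so per-class percentages in a confusion matrix translate directly into sample counts.</p>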
      <p>The work uses five ML algorithms: Support Vector Machine (SVM) [27], k-Nearest Neighbors
(kNN) [28], Random Forest (RF) [29], Multilayer Perceptron (MLP) neural network [30], and
Extreme Gradient Boosting (XGBoost) [31]. Random Forest provides high reliability when
analyzing interrelated parameters by using an ensemble of independent decision trees. Each tree is
trained on a random subset of features and data, which reduces the impact of multicollinearity and
random noise. This approach increases the stability and generalization ability of the model.
XGBoost is optimized for fast prediction and is able to work effectively in conditions where the
system needs to respond quickly to changing modes. The algorithm is based on the sequential
construction of decision trees, which gradually reduce the error of previous models. Due to its high
performance, parallel computing, and efficient memory usage, XGBoost is often used for tasks that
require fast real-time decision-making. Due to its architecture and nonlinear activation functions,
MLP can reproduce complex dependencies that are not detected by traditional methods. kNN is
considered a baseline method based on the principle of similarity between samples. Each new
object is classified depending on the classes of its nearest neighbors in the feature space. This
approach makes it possible to assess how clearly the classes are separated in the given feature set
and how well the constructed descriptors reflect the characteristics of the system’s energy states.
SVM is a classic method for constructing an optimal separating hyperplane that maximizes the
distance between classes in the feature space. Comparative analysis of different ML algorithms is
standard practice for selecting the optimal model, especially important when solving problems for
critical infrastructure, such as the power system, where the reliability of classification is of
paramount importance.</p>
      <p>To solve the problem of classifying the balance states of the power grid, a software solution was
developed in Python, which uses the scikit-learn and XGBoost ML libraries. StandardScaler was
also used to normalize the input data before training the kNN, SVM, and MLP models, which
ensured a single scale of features and increased the stability of the training process. For the
ensemble models Random Forest and XGBoost, data normalization was not performed, since these
algorithms are insensitive to the scales of the input parameters. To understand the decision-making
mechanisms of the model and determine the most informative statistical descriptors, the
Permutation Feature Importance (PFI) global analysis method was used [32]. This approach
quantifies the contribution of each feature to the prediction by
measuring the change in model accuracy after the association between a
specific descriptor and the target variable is randomly broken. The main idea is that if a feature has a significant
impact on the result, then randomly shuffling its values will lead to a noticeable decrease in classification
performance. The PFI method belongs to the global explainability methods and does not depend on
the type of model. Its advantages are simplicity of implementation, intuitive interpretation of
results and the ability to compare the influence of different features. The main limitation is the
increased computational complexity associated with the need for multiple predictions. Despite this,
PFI remains one of the most effective methods for assessing the informativeness of descriptors in
explainable ML tasks.</p>
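      <p>Scaling inside a pipeline and the PFI procedure can be sketched with scikit-learn's permutation_importance. The data here are synthetic and built so that only one feature carries the class signal; the three feature names are an arbitrary subset chosen for illustration, and kNN is used simply because it is one of the scale-sensitive models named above. This is not the authors' code.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
feature_names = ["Range", "Mean", "Kurt"]  # illustrative subset only

# Synthetic data: only the first column ("Range") determines the class.
X = rng.normal(size=(600, 3))
y = np.where(X[:, 0] > 0, 2, 1)  # 2 = unbalanced, 1 = balanced

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=32
)

# kNN is scale-sensitive, so the scaler is fitted inside the pipeline
# on training data only.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)

# Shuffle each feature 20 times on the held-out data and record the
# resulting drop in accuracy.
result = permutation_importance(
    model, X_test, y_test, scoring="accuracy", n_repeats=20, random_state=0
)
ranking = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda item: item[1],
    reverse=True,
)
for name, drop in ranking:
    print(f"{name}: mean accuracy drop = {drop:.3f}")
```
]]></preformat>
      <p>Because every feature is shuffled n_repeats times and the model is re-scored each time, the cost grows with the number of features and repeats, which is the computational limitation noted above.</p>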
      <p>The performance of the models was assessed through the analysis of the confusion matrix,
which systematizes the results of predictions into four categories. True positive results (TP) record
cases of correct detection of electricity consumption peaks, while true negative (TN) reflect the
correct identification of normal modes. Type I errors (FP) are false signals about
peaks, and Type II errors (FN) are missed critical states of the power system. From these basic
indicators, key performance metrics are formed: Accuracy, Recall, Specificity, Precision, F1-Score
and geometric mean G-Mean [33].</p>
      <p>The performance metrics of classification models play a critical role in the tasks of monitoring electrical networks,
since grid balance depends not only on the accuracy of diagnostics, but also on
the timeliness of the response of the control system. In the case of recognizing peak power
consumption states (class 2) and stable modes (class 1), these indicators are directly related to the
reliability of automatic connection of RES or ESF, as well as to the prevention of overloads and
failure of critical network elements. Accuracy reflects the proportion of correctly classified states
among all forecasts. In the context of energy systems, a high Accuracy value indicates the model’s
ability to correctly recognize both normal and peak modes, which ensures the reliability of the
overall monitoring system. However, this metric by itself may not be informative enough in the
case of imbalanced data, when the number of normal states significantly exceeds the number of
peak states. Recall is a key metric for this task, as it characterizes the model’s ability to detect all
peak load cases. High Recall minimizes the number of missed critical situations (FN). From the
point of view of operational security of power grids, Recall is of priority importance, since even
one missed peak can cause cascading failures or blackouts. Specificity reflects the model’s ability to
correctly identify normal operating modes of the system. High Specificity prevents false activations
of balancing systems, which reduces the number of unnecessary RES or ESF switching cycles. This
is especially important for the economic efficiency of the power system, since each unnecessary
operation entails additional energy costs and accelerates equipment wear. Precision characterizes
the proportion of real peak states among all those that the model has identified as critical. High
Precision means that the system reacts only to real threats, and not to random fluctuations in
consumption. Thus, it reduces the number of false positive states (FP), optimizes the use of
balancing resources, and maintains the efficiency of energy flow management. F1-Score is the
harmonic mean of Precision and Recall, which provides a generalized assessment of the
balance between detecting all peak states and minimizing false alarms. For automatic grid
balancing systems, a high F1-Score ensures that the algorithm is both sensitive to real threats and
stable with respect to noise in the data. This metric is especially important at the stage of selecting
the optimal model, when a compromise between security and efficiency of network operation must
be found. G-Mean (geometric mean of Recall and Specificity) is used to assess the balance of the
classification. A high G-Mean value indicates that the model recognizes both peak and normal
modes equally well, without favoring any class. For power system control tasks, this means stable
operation of the algorithm under different load conditions, including non-standard situations or
variable consumption profiles.</p>
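      <p>The six metrics described above follow directly from the four confusion-matrix counts. The sketch below treats the peak state (class 2) as the positive class; the function name classification_metrics and the counts in the example are hypothetical and chosen only to make the arithmetic easy to follow.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute the metrics from confusion-matrix counts (peak = positive)."""
    recall = tp / (tp + fn)        # share of real peaks that were detected
    specificity = tn / (tn + fp)   # share of normal modes recognized as normal
    precision = tp / (tp + fp)     # share of raised alarms that were real peaks
    return {
        "Accuracy": (tp + tn) / (tp + fp + tn + fn),
        "Recall": recall,
        "Specificity": specificity,
        "Precision": precision,
        "F1": 2 * precision * recall / (precision + recall),
        "G-Mean": math.sqrt(recall * specificity),
    }


# Hypothetical counts: 100 peak windows and 100 normal windows.
metrics = classification_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```
]]></preformat>
      <p>In this example Recall (0.90) exceeds Specificity (0.80), so the model misses few peaks at the price of more false activations; G-Mean summarizes that balance in one number.</p>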
      <p>In addition to the basic metrics, the integral indicators Area Under the ROC Curve (AUC) and
Precision–Recall (PR) curve were used to comprehensively assess the effectiveness of the
classification models. AUC reflects the ability of the model to distinguish between balanced and
unbalanced power consumption modes at different decision thresholds. A high AUC value (close to
1) indicates a high discriminative ability of the algorithm. In the context of power systems, this
means that the model is able to timely recognize the approach to critical network operating modes
and prevent accidents by early load balancing. Precision–Recall curve provides a more detailed
picture of the model’s behavior in conditions of class imbalance, when the number of peak states is
relatively small compared to normal ones. Analysis of the area under the PR curve allows us to
assess the trade-off between Precision and Recall. A high value of Average Precision (AP), which
numerically corresponds to the area under the PR curve, indicates the model’s ability not only to
effectively detect peaks, but also to minimize the number of false signals. This is particularly
important in the context of automatic connection of renewable energy sources or energy storage
installations, where false triggering can lead to unnecessary energy losses and reduced control
system efficiency.</p>
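      <p>Both integral indicators can be written out in a few lines, which makes their meaning concrete. In this sketch AUC is computed as the share of (peak, normal) pairs in which the peak sample receives the higher score, and AP as the mean of the precision values at the rank of each true peak. The labels use 1 for a peak and 0 for a normal state, the scores are made up for the example, and the AP function assumes there are no tied scores.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AP as the mean precision at the rank of each positive sample.

    Assumes no tied scores.
    """
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` samples
    return total / hits


# Made-up model scores: 1 = peak state, 0 = normal state.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(f"AUC = {auc:.4f}, AP = {ap:.4f}")  # AUC = 0.8889, AP = 0.9167
```
]]></preformat>
      <p>One normal sample (score 0.7) outranks one peak (score 0.6), which costs one of the nine pairs in AUC and lowers the precision at the third peak to 0.75.</p>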
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <p>Five ML models were used to solve the problem of classifying power grid operating modes. The
models were tuned by hyperparameter optimization before the training stage to achieve maximum
classification accuracy. The search for optimal combinations of hyperparameters was carried out
using the Grid Search method in combination with 5-fold cross-validation, which provided a
reliable assessment of the generalization ability of the models on the training data set. F1-Score was
chosen as the target optimization metric, since it takes into account both false positive (FP) and
false negative (FN) results, providing a balanced ratio between Precision and Recall. The use of this
metric is more appropriate compared to Accuracy, since it avoids the bias of the model towards the
dominant class and better reflects the ability of the algorithm to correctly detect both stable and
peak consumption modes. The results of tuning the main hyperparameters for each of the five ML
models are given in Appendix A, which presents the optimal parameter values obtained during the
grid search process.</p>
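      <p>The tuning procedure described above corresponds to scikit-learn's GridSearchCV with a 5-fold stratified split and an F1 scorer. The sketch below is illustrative: the data are synthetic, the parameter grid is hypothetical and is not the grid from Appendix A, and Random Forest is used so the example depends only on scikit-learn. The F1 scorer is pointed at class 2 on the assumption that the peak class is the positive one.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, StratifiedKFold

rng = np.random.default_rng(0)
# Synthetic training data: 9 descriptors, class driven by two of them.
X_train = rng.normal(size=(300, 9))
y_train = np.where(X_train[:, 4] + 0.5 * X_train[:, 5] > 0, 2, 1)

# Hypothetical grid, kept small so the example runs quickly.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

# F1 for the peak class (label 2) as the optimization target.
peak_f1 = make_scorer(f1_score, pos_label=2)

search = GridSearchCV(
    RandomForestClassifier(random_state=32),
    param_grid=param_grid,
    scoring=peak_f1,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=32),
)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("mean cross-validated F1:", round(search.best_score_, 3))
```
]]></preformat>
      <p>The search is fitted on the training subset only, so the held-out test subset remains untouched until the final evaluation.</p>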
      <p>All five algorithms demonstrated the ability to classify energy regimes, but the models showed
different levels of effectiveness.</p>
      <p>Figure 5 shows the normalized confusion matrices (%) for the kNN and SVM models, which
reflect the quality of the classification of power consumption states into two classes.</p>
      <p>High agreement between actual and predicted labels is observed for the kNN model: the
proportion of correct classifications is 92.78% for class 1 and 93.61% for class 2, indicating a
balanced ability of the algorithm to identify both steady and peak consumption modes. In contrast,
the SVM model shows lower accuracy for class 1 (73.89%) and a slight decrease in efficiency in
recognizing peak states for class 2 (92.50%).</p>
      <p>Figure 6 shows the normalized confusion matrices (%) for the Random Forest and MLP models.</p>
      <p>The Random Forest model demonstrated high performance, providing 93.89% correct
predictions for class 1 and 93.33% for class 2, indicating the stable ability of the ensemble of
decision trees to recognize both normal and peak modes. The MLP model showed even better
results: 94.44% correct classifications for class 1 and 95.28% for class 2.</p>
      <p>Figure 7 presents the normalized confusion matrix (%) for the XGBoost model, which
demonstrates the highest classification accuracy among all the algorithms considered.</p>
      <p>The correct prediction rate is 95.00% for class 1 and 97.22% for class 2, indicating an
exceptionally high ability of the model to recognize both stable and peak power consumption
modes. Compared to previous models (kNN, SVM, Random Forest, MLP), XGBoost provides the
lowest number of false classifications. The results confirm the superiority of gradient boosting in
power consumption data analysis tasks, where high classification accuracy is important.</p>
      <p>The performance results of the studied models for classifying electricity consumption peaks are
summarized in Table 1.</p>
      <p>Analyzing the data in Table 1, the highest overall performance was demonstrated by the
XGBoost model, achieving Accuracy = 0.9611, Recall = 0.9500–0.9722, and F1-Score ≈ 0.961. This
indicates the high ability of the model to distinguish between steady-state and peak power
consumption modes. The high and balanced Recall and Specificity values (0.9500–0.9722) confirm
that the model is equally effective in detecting both normal and critical load states. Such accuracy
is especially valuable for real-time monitoring systems, where missing or false detection of peaks
can lead to overloading of nodes and power system failures. Ensemble (XGBoost, Random Forest)
and neural approaches (MLP) outperform methods based on metric distances or hyperplane
separation (kNN, SVM), confirming their ability to more effectively account for nonlinear
multivariate relationships between electricity consumption parameters. This makes such models
suitable for tasks such as automatic detection of load imbalances, activation of renewable energy
sources or energy storage systems, and ensuring real-time grid stability.</p>
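      <p>The relationship between these metrics can be checked with a short calculation. The sketch below assumes hypothetical confusion-matrix counts (180 samples per class, with class 2, the peak mode, treated as the positive class); these counts are not taken from the paper and were chosen only because they are consistent with the reported percentages. The function name <monospace>binary_metrics</monospace> is likewise illustrative.</p>

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Standard binary metrics for the positive class from confusion-matrix counts."""
    recall = tp / (tp + fn)          # sensitivity: share of true peaks detected
    specificity = tn / (tn + fp)     # share of normal modes kept as normal
    precision = tp / (tp + fp)       # share of predicted peaks that are real peaks
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * precision * recall / (precision + recall)
    g_mean = math.sqrt(recall * specificity)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "g_mean": g_mean,
    }


# Hypothetical counts (assumption): 175 of 180 peaks and 171 of 180 normal modes correct.
metrics = binary_metrics(tp=175, fn=5, fp=9, tn=171)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

      <p>Under these assumed counts the accuracy is about 0.9611 and the F1-score of the peak class about 0.9615, which agrees with the values reported for XGBoost.</p>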
      <p>Figure 8 shows the main performance curves of the XGBoost model, illustrating its ability to
effectively classify power consumption modes and accurately distinguish between steady and peak
system states.</p>
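      <p>Figure 8. XGBoost performance curves: (a) ROC curve; (b) Precision–Recall curve; (c) F1-score versus classification threshold.</p>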
      <p>The ROC curve for the XGBoost model with a 95% confidence interval (Figure 8a) characterizes
the model’s ability to distinguish between balanced (class 1) and unbalanced (class 2) power
consumption modes. The area under the ROC curve (AUC = 0.9617) is close to 1, which indicates
that XGBoost effectively separates stable and critical operating modes of the power system. The
narrow 95% confidence interval suggests that the result is robust to changes in the data and varies
little across repeated estimations. Thus, the ROC curve shows that XGBoost not only achieves high
classification accuracy but also detects power grid states reliably and evenly in both classes.</p>
      <p>The Precision–Recall (PR) curve (Figure 8b) reflects the trade-off between Precision and Recall
when classifying peak power consumption modes. The area under the curve (Average Precision,
AP = 0.942) indicates high classification quality: the model maintains high values of both metrics
over a wide range of thresholds. A narrow 95% confidence interval again indicates stable results
and low variability of predictions. Thus, the PR curve shows that XGBoost detects peak power
consumption modes effectively while keeping the number of false positives low.</p>
      <p>The curve of the F1-score as a function of the classification threshold (Figure 8c) shows how
the balance between Precision and Recall changes with the selected decision threshold. The
maximum F1-score of 0.9615 is achieved at a threshold of 0.42, which gives the best ratio between
correct detection of peak states and minimization of false alarms. At low thresholds, high Recall
prevails at the expense of reduced Precision, while overly high thresholds cause the model to lose
sensitivity to critical states. The threshold of 0.42 is therefore the best compromise between the
two metrics for detecting peak power consumption modes.</p>
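      <p>The threshold selection described above can be sketched as a simple sweep: for each candidate threshold, predicted peak probabilities are converted into labels and the F1-score is computed, and the threshold with the highest F1-score is kept. The labels, scores, and function names below are hypothetical and serve only to illustrate the procedure; they are not the data or code used in the study.</p>

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 for the positive (peak) class when scores >= threshold are labelled positive."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, scores, thresholds):
    """Return (threshold, f1) with the highest F1; ties keep the lowest threshold."""
    best_t, best_f1 = None, -1.0
    for t in thresholds:
        f1 = f1_at_threshold(y_true, scores, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical labels (1 = peak) and predicted peak probabilities.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.05, 0.20, 0.35, 0.60, 0.40, 0.55, 0.80, 0.95]
thresholds = [i / 100 for i in range(5, 100, 5)]  # 0.05, 0.10, ..., 0.95
threshold, f1 = best_f1_threshold(y_true, scores, thresholds)
print(threshold, round(f1, 4))
```

      <p>In this toy example, lowering the threshold admits more false alarms and raising it misses true peaks, which mirrors the behaviour of the F1 curve discussed above.</p>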
      <p>To assess the contribution of each feature to the model’s predictions and to better understand
the decision-making mechanisms of the XGBoost model, feature importance was analyzed using the
Permutation Importance method (Figure 9).</p>
      <p>The feature importance analysis showed that the key role in the classification is played by
descriptors of the variability and range of electricity consumption. The most significant predictor
of network balance was Range, which is logical, since energy peaks are characterized by large
swings between minimum and maximum load values. Of secondary importance are the standard
deviation (Std_Dev) and the maximum value (Max), which also reflect the variability of energy
parameters. Kurt, Min and SE have a moderate impact. The central characteristics (Med, Mean, Sk),
which describe the position and symmetry of the distribution of energy consumption, turned out
to be less informative, since they reflect mainly the average load level, whereas peak states are
determined by amplitude and variation indicators that record rapid and significant changes in
electricity consumption.</p>
      <p>The results obtained are consistent with the physical nature of electrical peaks and support
the hypothesis that dynamic characteristics of power consumption (how quickly and how much the
load changes) are more informative for detecting anomalies than static characteristics (for
example, the absolute level of power consumption). These results can be used not only to optimize
classification algorithms, but also to configure power consumption monitoring sensors and data
acquisition systems, where the frequency and accuracy of measurements can be increased
specifically for parameters that characterize load variability.</p>
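      <p>The idea behind permutation importance can be shown with a minimal, self-contained sketch: each feature column is shuffled in turn, and the resulting drop in accuracy is averaged over several repeats. The two-feature dataset, the threshold rule standing in for a trained model, and the function names are all hypothetical; the study itself applied the method to the trained XGBoost model and the full feature set.</p>

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model predicts the true label."""
    hits = sum(1 for row, label in zip(X, y) if model(row) == label)
    return hits / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical data: columns are [Range, Mean]; label 1 marks a peak window.
X = [[0.5, 3.0], [0.8, 3.1], [1.1, 2.9], [0.9, 3.2],
     [3.0, 3.0], [2.6, 3.1], [3.4, 2.8], [2.9, 3.3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def model(row):
    """Toy stand-in for a trained classifier: flags a peak when Range is large."""
    return 1 if row[0] >= 2.0 else 0


importances = permutation_importance(model, X, y)
print(dict(zip(["Range", "Mean"], importances)))
```

      <p>Because the toy rule ignores Mean, shuffling that column leaves accuracy unchanged and its importance is zero, whereas shuffling Range degrades accuracy. This is the same pattern, in miniature, as the dominance of amplitude-variation features reported above.</p>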
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>The paper proposes an information technology for detecting energy peaks and supporting
automatic balancing of the power grid by connecting renewable energy sources or energy storage
facilities. The hourly electricity consumption indicators of a regional energy company, which
reflect daily load fluctuations, were used as the input data. Five ML models with hyperparameter
optimization (Grid Search, 5-fold CV) were compared using the F1-Score metric. The best results
were demonstrated by the XGBoost model (Accuracy = 0.9611, high F1-Score and G-Mean), which
confirms its ability to consistently recognize both peak and normal modes of electricity
consumption. The Permutation Feature Importance analysis showed that the key contribution to
the classification is made by amplitude-variation features (Range, Std_Dev, Max), which reflect the
intensity and dynamics of changes in electricity consumption. The proposed approach provides
reliable differentiation between stable and peak states, reduces the number of false positives, and
increases the timeliness of the control system response. The developed technology is suitable for
integration into automated energy management systems (EMS) as an intelligent module, which
increases the stability of the power system and reduces the risk of unplanned outages.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly to check grammar and
spelling and to improve the readability of the text. After using the tool, the authors reviewed and
edited the content as needed and take full responsibility for the publication’s content.
</p>
    </sec>
    <sec id="sec-7">
      <title>A. Hyperparameter Settings</title>
      <table-wrap>
        <caption><title>XGBoost</title></caption>
        <table>
          <thead><tr><th>Parameter</th><th>Value</th><th>Description/Purpose</th></tr></thead>
          <tbody>
            <tr><td/><td/><td>Histogram-based tree construction method that accelerates training and saves memory</td></tr>
            <tr><td/><td/><td>L2 regularization parameter used to reduce overfitting and stabilize learning</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <caption><title>kNN</title></caption>
        <table>
          <thead><tr><th>Parameter</th><th>Value</th><th>Description/Purpose</th></tr></thead>
          <tbody>
            <tr><td/><td/><td>Number of nearest neighbors considered when classifying a new instance</td></tr>
            <tr><td/><td>distance</td><td>Weighting scheme: each neighbor’s contribution is inversely proportional to its distance from the query point</td></tr>
            <tr><td/><td>minkowski</td><td>Distance metric used to measure similarity between samples</td></tr>
            <tr><td>p</td><td>2</td><td>Power parameter for the Minkowski metric; p = 2 corresponds to Euclidean distance</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <caption><title>Random Forest</title></caption>
        <table>
          <thead><tr><th>Parameter</th><th>Value</th><th>Description/Purpose</th></tr></thead>
          <tbody>
            <tr><td/><td/><td>Number of decision trees in the ensemble</td></tr>
            <tr><td/><td/><td>Split criterion measuring node impurity based on the Gini index</td></tr>
            <tr><td/><td/><td>Maximum tree depth is unrestricted; each tree grows until all leaves are pure</td></tr>
            <tr><td/><td>sqrt(n_features)</td><td>Feature selection method for splitting</td></tr>
            <tr><td/><td/><td>Automatically adjusts class weights based on their frequency in each bootstrap subsample to handle imbalance</td></tr>
            <tr><td/><td/><td>Enables bootstrap sampling (random sampling with replacement) for tree construction</td></tr>
            <tr><td/><td/><td>Minimum number of samples required to split an internal node</td></tr>
            <tr><td/><td/><td>Minimum number of samples required to be at a leaf node</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <caption><title>MLP</title></caption>
        <table>
          <thead><tr><th>Parameter</th><th>Value</th><th>Description/Purpose</th></tr></thead>
          <tbody>
            <tr><td/><td/><td>Network architecture: three hidden layers with 16, 8, and 16 neurons, respectively</td></tr>
            <tr><td/><td/><td>Nonlinear activation function</td></tr>
            <tr><td/><td/><td>Adaptive optimization algorithm (Adam) used for updating network weights</td></tr>
            <tr><td/><td>adaptive</td><td>Dynamically adjusts the learning rate based on validation error trends</td></tr>
            <tr><td/><td/><td>Initial learning rate for the optimizer</td></tr>
            <tr><td>alpha</td><td/><td>L2 regularization parameter preventing overfitting</td></tr>
            <tr><td/><td>True</td><td>Stops training when validation performance no longer improves</td></tr>
            <tr><td>validation_fraction</td><td>0.1</td><td>Fraction of data reserved for validation during training</td></tr>
            <tr><td/><td>1300</td><td>Maximum number of training iterations</td></tr>
            <tr><td>n_iter_no_change</td><td>50</td><td>Number of epochs with no improvement before early stopping is triggered</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <caption><title>SVM</title></caption>
        <table>
          <thead><tr><th>Parameter</th><th>Value</th><th>Description/Purpose</th></tr></thead>
          <tbody>
            <tr><td>kernel</td><td>rbf</td><td>Kernel type: Radial Basis Function (RBF)</td></tr>
            <tr><td>C</td><td>1.0</td><td>Regularization parameter controlling the trade-off between maximizing the margin and minimizing classification errors</td></tr>
            <tr><td>gamma</td><td>scale</td><td>Kernel coefficient defining the influence radius of individual training samples; automatically scaled as 1 / (n_features × Var(X))</td></tr>
            <tr><td/><td>balanced</td><td/></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
  </body>
</article>