<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>A. Araveeporn, An estimating parameter of nonparametric regression model based on
smoothing techniques, Statistical Journal of the IAOS</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1016/j.asoc.2021.107797</article-id>
      <title-group>
        <article-title>Training neural network method modification for forward error propagation based on adaptive components</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Victoria Vysotska</string-name>
          <email>victoria.a.vysotska@lpnu.ua</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vasyl Lytvyn</string-name>
          <email>vasyl.v.lytvyn@lpnu.ua</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mariia Nazarkevych</string-name>
          <email>mariia.a.nazarkevych@lpnu.ua</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Serhii Vladov</string-name>
          <email>serhii.vladov@univd.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ruslan Yakovliev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexey Yurko</string-name>
          <email>yurkoalexe@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kremenchuk Flight College of Kharkiv National University of Internal Affairs</institution>
          ,
          <addr-line>Peremohy Street 17/6 39605 Kremenchuk</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kremenchuk Mykhailo Ostrohradskyi National University</institution>
          ,
          <addr-line>University Street 20 39600 Kremenchuk</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>Stepan Bandera Street 12 79013 Lviv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>35</volume>
      <issue>2</issue>
      <fpage>19911</fpage>
      <lpage>19931</lpage>
      <abstract>
        <p>The work is devoted to the development of a training algorithm for forward propagation neural networks, based on the backpropagation algorithm, through the use of adaptive elements, such as an adaptive training rate, adaptive initialization of neural network weights, adaptive regularization, an adaptive neuron activation function, adaptive change in neural network architecture, and adaptive mini-batch resizing. Using the example of solving the task of helicopter turboshaft engine parameters debugging, it is shown that the developed algorithm made it possible to achieve almost 100% accuracy of neural network training on both the training and validation data sets with a minimum number of iterations. The work experimentally substantiates the optimal value of the training rate coefficient, the number of neurons in the hidden layer of the neural network, and the optimal number of iterations when training a neural network by determining the smallest value of the final total standard deviation per epoch. It has been established that the use of L2-regularization in the developed method of training a feed-forward neural network with adaptive elements raises the regulation curve (or a similar dependence), increasing its values by the amount of regularization and bringing it closer to unity. This led to an improvement in the accuracy of setting the gas-generator rotor r.p.m. in the task of helicopter turboshaft engine parameters debugging by half compared to the use of the well-known Delta-Bar-Delta neural network training algorithm. Using the developed training algorithm for forward propagation neural networks with adaptive elements reduces the error coefficient by 1.89 times and slightly increases the accuracy of determining gas-generator rotor r.p.m. boundary values, by 1.01 times, compared to the Delta-Bar-Delta algorithm in helicopter turboshaft engine parameters debugging.</p>
      </abstract>
      <kwd-group>
        <kwd>Neural network</kwd>
        <kwd>helicopter turboshaft engines</kwd>
        <kwd>training algorithm</kwd>
        <kwd>parameters debugging</kwd>
        <kwd>adaptive elements</kwd>
        <kwd>adaptive training rate</kwd>
        <kwd>gas-generator rotor r.p.m.</kwd>
        <kwd>L2-regularization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Feedforward neural networks are one of the most widely used classes of artificial neural
networks. They comprise neurons organized into layers, with each neuron connected to
neurons in the next layer. Direct propagation means that signals are transmitted in only one
direction, from input nodes to output units [1, 2].</p>
      <p>In feedforward neural networks, adaptive elements play a key role. These elements allow
the network to train from the data provided and adapt its weights and parameters to
achieve the desired output. One of the most common methods for adapting elements in
neural networks is the backpropagation algorithm, which uses gradient descent to adjust
the weights [3, 4].</p>
      <p>Development of a neural network begins with defining its architecture, which includes
the number of layers, the number of neurons in each layer, and the choice of activation
functions. Then it is necessary to initialize the neuron weights with random values. The
training process involves passing data forward through the network (forward propagation),
estimating the error between the predicted and expected output, and then backpropagating
the error to adjust the weights using gradient descent. Once training is completed, the
network is tested on a separate dataset to evaluate its performance. This process is repeated
until a satisfactory level of neural network performance is achieved [5, 6].</p>
      <p>Important aspects of neural network development are the correct choice of network
architecture, optimization of training parameters, and accurate data processing.
Feedforward neural networks with adaptive elements provide a powerful tool for modeling
complex relations in data and solving a variety of tasks in the fields of machine learning and
artificial intelligence [7, 8].</p>
      <p>A critical drawback of the element adaptation method in feedforward neural networks,
namely the backpropagation algorithm, is its tendency to get stuck in local minima and
saddle points of the loss function, especially in the case of complex and non-smooth
functions. This can limit the network's ability to reach an optimal solution and slow down
the training process, requiring careful selection of hyperparameters and the use of
additional methods to avoid getting stuck [9, 10].</p>
      <p>The aim of this work is to research and develop new methods for optimizing the
backpropagation algorithm in feedforward neural networks to improve its resistance to
getting stuck in local minima and saddle points of the loss function. This includes analyzing
problem situations, developing new gradient optimization methods and algorithms, and
experimentally testing and comparing their effectiveness on different datasets and network
architectures. The result should be innovative approaches that can increase the speed and
accuracy of neural network training, reduce the likelihood of getting stuck in local minima,
and provide more stable convergence to the optimal solution.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>It is known that a feed-forward neural network consists of interacting adaptive elements
called neurons, each of which carries out a certain functional transformation of input signals
[11, 12].</p>
      <p>The authors of [13] were the first to propose representing the error backpropagation process using a
functional diagram known as a system backpropagation diagram. This diagram serves as a
visual tool to explain the operation of the backpropagation algorithm. The authors use it as
an aid to simplify the derivation of necessary expressions when analyzing dynamic neural
networks designed to process time-dependent signals. This method has also been used by
other authors, for example in [14, 15], as a visual way to represent backpropagation rules
when studying neural networks.</p>
      <p>In [16], the approach proposed in [13] was expanded and streamlined by constructing a
neural network based on adaptive components, which must remain independent of each
other during the construction of a mathematical model of the network. Bidirectional
connections are established between the components, forming two combined graphs to
describe the transmission of signals in both directions. Each component performs signal
processing in both forward and backward directions and also adjusts its adaptive
parameters during training using the Delta-Bar-Delta method [17]. Unlike plain gradient descent
and the momentum method, this method assigns each adaptive parameter its own
training rate coefficient. At the end of each training epoch, both the adaptable parameters
and the training rate coefficients are corrected.</p>
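      <p>The per-parameter rate idea described above can be illustrated with a minimal sketch of the commonly described Delta-Bar-Delta rule: a rate grows additively while the current gradient agrees in sign with a running average of past gradients, and shrinks multiplicatively when the signs disagree. The function name and the constants kappa, phi, and theta are illustrative assumptions, not values taken from [16, 17].</p>
      <preformat><![CDATA[
```python
def delta_bar_delta_step(weights, grads, rates, bar_deltas,
                         kappa=0.01, phi=0.1, theta=0.7):
    """One Delta-Bar-Delta update over flat lists of parameters.

    kappa, phi, and theta are illustrative constants (assumptions).
    """
    new_w, new_r, new_bar = [], [], []
    for w, g, r, bar in zip(weights, grads, rates, bar_deltas):
        agreement = bar * g
        if agreement > 0:        # same sign as the running average: speed up
            r = r + kappa
        elif agreement < 0:      # sign flipped: slow down
            r = r * (1.0 - phi)
        new_w.append(w - r * g)  # gradient step with the per-parameter rate
        new_r.append(r)
        new_bar.append((1.0 - theta) * g + theta * bar)  # update running average
    return new_w, new_r, new_bar
```
]]></preformat>
      <p>Each parameter carries its own rate and its own running average, which is exactly the bookkeeping overhead discussed in the next paragraph.</p>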
      <p>A critical disadvantage of the approaches in [16, 17] is the increased complexity of model control and tuning
due to the need to track and adjust individual training rate coefficients for each adaptive
parameter. This requires additional computational resources and time to conduct training
since each parameter must be separately configured according to the training dynamics,
which can slow down the process and complicate network configuration. In addition, there
is an increased likelihood of incorrectly selecting training rate coefficients, which can lead
to instability and poor model performance.</p>
      <p>Thus, the relevance of the research is emphasized by the need to overcome the
difficulties associated with managing and tuning neural networks due to the increased
complexity of adaptive parameters that require individual adjustment of training rate
coefficients. This limits the training efficiency and stability of models, increasing the
likelihood of instability and slower training. In the context of the desire to improve the
performance and accuracy of neural networks, the development of new optimization
methods is becoming an urgent task aimed at improving the stability of training, reducing
setup time, and increasing the stability of models when converging to the optimal solution.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods and materials</title>
      <p>One possible optimal adaptive element to improve the backpropagation algorithm could be
the “Adaptive Training Rate” (ATR). This element will dynamically change the training rate
depending on the gradients obtained at each training step (Table 1). The paper proposes an
algorithm for training a forward propagation neural network using an adaptive element in
the form of an "Adaptive Training Rate" by combining the backpropagation algorithm with
ATR.</p>
      <table-wrap>
        <label>Table 1</label>
        <table>
          <thead>
            <tr>
              <th>Aspect</th>
              <th>Description</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>… of training speed</td>
              <td>ATR allows the training rate to be adapted at each step based on gradient information. If the gradients are small, which could indicate that the network is near a local minimum or saddle point, ATR will automatically reduce the training rate to prevent the weights from changing too much and possibly getting stuck at local minima or saddle points.</td>
            </tr>
            <tr>
              <td>Quick adaptation to changing conditions</td>
              <td>ATR allows you to quickly adapt to changes in data structure or task complexity. For example, if some model parameters require more intensive training, ATR can increase the training rate for those parameters, providing more efficient training.</td>
            </tr>
            <tr>
              <td>Preventing divergence and increasing training stability</td>
              <td>An adaptive training rate can help prevent the backpropagation algorithm from diverging by controlling the rate at which the weights change. This provides more stable training and improves the overall convergence of the neural network.</td>
            </tr>
            <tr>
              <td>Improving training efficiency</td>
              <td>ATR allows for more efficient use of training resources because it allows the training rate to be tailored to the specific conditions of each training step, reducing the likelihood of overfitting and accelerating convergence to the optimal solution.</td>
            </tr>
            <tr>
              <td>Conclusion</td>
              <td>The introduction of an adaptive element in the form of an "Adaptive Training Rate" can significantly improve the training process of neural networks, making it more stable, efficient, and resistant to various conditions and problems associated with the backpropagation algorithm.</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-3-4">
        <p>At the initial stage, adaptive initialization of the neural network weights is carried out by
calculating the average value μ of the input data and the variance σ<sup>2</sup> of the input data according
to the expressions:</p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>\mu = \frac{1}{N} \cdot \sum_{i=1}^{N} x_i,</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(2)</label>
          <tex-math>\sigma^2 = \frac{1}{N} \cdot \sum_{i=1}^{N} (x_i - \mu)^2,</tex-math>
        </disp-formula>
        <p>where N is the number of training examples, x<sub>i</sub> is the input data.</p>
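        <p>Using weight initialization methods (for example, He's method [18] or the Xavier method [19]),
the initial values of the weights are set, taking into account the obtained statistical
characteristics of the input data (Table 2).</p>
        <table-wrap>
          <label>Table 2</label>
          <table>
            <thead>
              <tr>
                <th>Method</th>
                <th>Expression</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>He's method [18]</td>
                <td><inline-formula><tex-math>W \sim N\left(0, \sqrt{\frac{2}{n_{in}}}\right)</tex-math></inline-formula></td>
              </tr>
              <tr>
                <td>Xavier method [19]</td>
                <td><inline-formula><tex-math>W \sim U\left(-\frac{\sqrt{6}}{\sqrt{n_{in} + n_{out}}}, \frac{\sqrt{6}}{\sqrt{n_{in} + n_{out}}}\right)</tex-math></inline-formula></td>
              </tr>
            </tbody>
          </table>
          <table-wrap-foot>
            <p>where N(0, s) is a normal distribution with zero mean and standard deviation s, U(a, b) is a uniform distribution on
the interval [a, b], n<sub>in</sub> is the number of input neurons, n<sub>out</sub> is the number of
output neurons.</p>
          </table-wrap-foot>
        </table-wrap>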
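        <p>The initialization stage described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names are hypothetical, the He and Xavier rules are written in their standard textbook form, and since the text does not state exactly how the input statistics enter the initialization, the sketch only computes them alongside the weights.</p>
        <preformat><![CDATA[
```python
import math
import random

def input_statistics(xs):
    """Mean and (population) variance of the input data."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var

def he_normal(n_in, n_out, rng):
    """He-style initialization: zero-mean Gaussian with std sqrt(2 / n_in)."""
    std = math.sqrt(2.0 / n_in)
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]

def xavier_uniform(n_in, n_out, rng):
    """Xavier-style initialization: uniform on [-limit, limit]."""
    limit = math.sqrt(6.0) / math.sqrt(n_in + n_out)
    return [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]
```
]]></preformat>
        <p>Passing an explicit random.Random instance keeps the initialization reproducible between runs.</p>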
        <p>Let W<sup>(l)</sup><sub>ij</sub> be the weight connecting the i-th neuron in the l-th layer with the j-th neuron
in the next (l + 1)-th layer. For each training example x, the output ŷ of the neural network
is calculated according to the expressions:</p>
        <disp-formula>
          <label>(3)</label>
          <tex-math>z^{(l)} = W^{(l)} \cdot a^{(l-1)} + b^{(l)},</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(4)</label>
          <tex-math>a^{(l)} = \sigma(z^{(l)}),</tex-math>
        </disp-formula>
        <p>where z<sup>(l)</sup> is the weighted sum of inputs for the l-th layer, b<sup>(l)</sup> is the bias of the l-th layer, a<sup>(l)</sup> is the activation of the l-th layer,
σ is the activation function of the l-th layer.</p>
        <p>Next, the error of the neural network is estimated using the loss function L and the
expected value of y according to the expression:</p>
        <disp-formula>
          <label>(5)</label>
          <tex-math>L = \frac{1}{2} \cdot \sum_{i=1}^{N} (y_i - \hat{y}_i)^2.</tex-math>
        </disp-formula>
        <p>Next, the gradient of the loss function with respect to the neural network
weights is calculated by backpropagation, where δ<sup>(l)</sup> is the error on the l-th layer and ⊙ denotes element-wise multiplication.</p>
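        <p>A minimal sketch of the forward pass and of standard backpropagation for a small fully connected network is given here for orientation. It assumes tanh hidden units and a linear output layer purely for brevity (the activation actually selected in this work is discussed later), a half-sum-of-squares loss, and hypothetical function names; it is not the authors' code.</p>
        <preformat><![CDATA[
```python
import math

def matvec(W, v):
    """Matrix-vector product for nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def forward(layers, x):
    """Return pre-activations z and activations a for every layer.

    layers is a list of (W, b); hidden layers use tanh, the last one is linear.
    """
    a, zs, acts = x, [], [x]
    for idx, (W, b) in enumerate(layers):
        z = [s + bi for s, bi in zip(matvec(W, a), b)]
        a = z if idx == len(layers) - 1 else [math.tanh(v) for v in z]
        zs.append(z)
        acts.append(a)
    return zs, acts

def backward(layers, zs, acts, y):
    """Gradients (dW, db) per layer for L = 1/2 * sum((y - y_hat)^2)."""
    delta = [a - t for a, t in zip(acts[-1], y)]          # output error (linear output)
    grads = []
    for l in range(len(layers) - 1, -1, -1):
        a_prev = acts[l]                                  # input of layer l
        dW = [[d * ap for ap in a_prev] for d in delta]   # dL/dW = delta * a_prev^T
        grads.append((dW, list(delta)))                   # dL/db = delta
        if l > 0:
            W = layers[l][0]
            back = [sum(W[j][i] * delta[j] for j in range(len(delta)))
                    for i in range(len(a_prev))]          # W^T * delta
            delta = [bk * (1.0 - math.tanh(z) ** 2)       # element-wise product with tanh'(z)
                     for bk, z in zip(back, zs[l - 1])]
    return grads[::-1]
```
]]></preformat>
        <p>The returned list holds one (dW, db) pair per layer in forward order, so it can be fed directly into a layer-wise weight update.</p>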
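        <p>After calculating the gradient of the loss function with respect to the neural network weights, the
weights are updated taking into account the gradient and the adaptive training rate
according to the expressions:</p>
        <disp-formula>
          <label>(9)</label>
          <tex-math>W^{(l)} = W^{(l)} - \alpha^{(l)} \cdot \frac{\partial L}{\partial W^{(l)}},</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(10)</label>
          <tex-math>b^{(l)} = b^{(l)} - \alpha^{(l)} \cdot \frac{\partial L}{\partial b^{(l)}},</tex-math>
        </disp-formula>
        <p>where α<sup>(l)</sup> is the adaptive training rate for the l-th layer.</p>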
        <p>In this case, the training rate α<sup>(l)</sup> at each step is updated as a function of the
initial training rate α<sub>0</sub>, the adaptation coefficient β, and the squared norm of the
gradient ‖∇L(θ)‖<sup>2</sup>, where L(θ) is the loss function and θ is the model parameters vector.</p>
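        <p>The following sketch shows one way such an adaptive training rate could be wired into a weight update. The exact rule is an assumption made for illustration: the rate is taken as the initial rate divided by one plus the adaptation coefficient times the squared gradient norm, which mirrors the form used for the adaptation coefficient later in this section. The function names and default constants are hypothetical.</p>
        <preformat><![CDATA[
```python
def squared_grad_norm(grads):
    """Sum of squared partial derivatives over all parameters."""
    return sum(g * g for g in grads)

def adaptive_rate(alpha0, beta, grads):
    """Assumed rule: alpha = alpha0 / (1 + beta * ||grad||^2)."""
    return alpha0 / (1.0 + beta * squared_grad_norm(grads))

def sgd_step(params, grads, alpha0=0.1, beta=0.5):
    """Apply one update with the adaptive rate to a flat parameter list."""
    alpha = adaptive_rate(alpha0, beta, grads)
    return [p - alpha * g for p, g in zip(params, grads)], alpha
```
]]></preformat>
        <p>Under this assumed rule the step shrinks when gradients are large, which limits abrupt weight changes, and approaches the initial rate as the gradient vanishes.</p>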
        <p>To control overfitting of the neural network, adaptive regularization is introduced
into the proposed training algorithm. Overfitting occurs when a model fits the training
data too closely and begins to lose its ability to generalize to new, previously unseen data. Adaptive
regularization allows you to dynamically adjust the level of regularization during training
depending on the current state of the network, which can improve its generalization ability
and prevent overfitting [20, 21]. For a training algorithm that already includes an
adaptive training rate and other gradient control techniques, L2-regularization may be
preferable to Dropout, as it controls overfitting by penalizing large weights while
keeping all neurons active during training. L2-regularization for a loss function L(θ) with
weights W and regularization coefficient λ is defined as:</p>
        <disp-formula>
          <label>(12)</label>
          <tex-math>L_{L2} = L + \lambda \cdot \sum_{l} \|W^{(l)}\|^2,</tex-math>
        </disp-formula>
        <p>where the value of the regularization coefficient λ is chosen according to the
performance on the validation dataset. The value can range from 10<sup>−6</sup> to 10<sup>−2</sup> depending on
the size of the data set and the complexity of the model. Thus, the initial value of the
constant λ can be chosen, for example, equal to 10<sup>−4</sup>, and then adjusted during the
training process depending on the effectiveness of regularization and preventing
overfitting.</p>
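        <p>A minimal sketch of how an L2 penalty of this kind enters the loss and the weight gradients is shown below. The function names are hypothetical, weight matrices are plain nested lists, and the default coefficient of 1e-4 simply mirrors the starting value suggested above.</p>
        <preformat><![CDATA[
```python
def l2_penalty(weight_matrices):
    """Sum of squared entries over all weight matrices (squared Frobenius norms)."""
    return sum(w * w for W in weight_matrices for row in W for w in row)

def regularized_loss(data_loss, weight_matrices, lam=1e-4):
    """Data loss plus lam times the L2 penalty."""
    return data_loss + lam * l2_penalty(weight_matrices)

def add_l2_gradient(dW, W, lam=1e-4):
    """Gradient of the penalised loss for one weight matrix: dW + 2 * lam * W."""
    return [[g + 2.0 * lam * w for g, w in zip(g_row, w_row)]
            for g_row, w_row in zip(dW, W)]
```
]]></preformat>
        <p>Because the penalty only adds a term proportional to each weight, it leaves every neuron active and simply pulls large weights toward zero at each step.</p>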
        <p>To improve the resistance of the training algorithm to getting stuck in local minima and
saddle points of the loss function, it is advisable to use a loss function, which contributes to
smoother and more predictable optimization. One option would be to use a smooth loss
function such as cross-entropy [22, 23] for classification tasks, and mean squared error for
regression tasks [24, 25]. In addition, you can consider using a loss function that takes into
account the distribution of the data and penalizes large deviations of the predicted values
from the actual values, for example, the Huber loss function [26] or the K-quantile loss
function [27].</p>
        <p>A smooth loss function allows for smoother gradient changes and helps avoid sharp
jumps, which can lead to better convergence to a global minimum and prevent getting stuck
at local minima and saddle points. Choosing a smooth loss function allows the training
algorithm to adapt to different types of problems and data, allowing the neural network to
train more efficiently while minimizing the risk of getting stuck in local minima or saddle
points.</p>
        <p>Loss functions such as Huber or K-quantile take into account the data distribution and
impose a more balanced error penalty without allowing large variations in the value of the
loss function, resulting in more stable optimization. However, a key disadvantage of the Huber
or K-quantile functions compared with a smooth loss function is their less smooth nature, which can
lead to more complex optimization and slower neural network training.</p>
        <p>One smooth loss function that is used here is a smooth version of the mean squared error
known as Smooth Mean Squared Error (SMSE) [28], which uses a smooth function
instead of the squared difference between the predicted and actual output. The SMSE
analytical expression (13) is written in terms of smooth(y<sub>i</sub> − ŷ<sub>i</sub>), a smooth function that
replaces the absolute value in the squared error.</p>
        <p>Applying (13) improves the resistance of the training algorithm to getting
stuck in local minima and saddle points of the loss function, since the smooth function
smooth(y<sub>i</sub> − ŷ<sub>i</sub>) ensures a smooth change in the gradient even in the vicinity of points
where the loss function has sharp changes. This avoids sudden jumps and allows gradient
descent to more efficiently find paths to the global minimum of the loss function, improving
the overall convergence of the training algorithm and preventing it from getting stuck at
local minima or saddle points.</p>
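        <p>Since the concrete form of the smooth function is not given in this section, the sketch below uses the pseudo-Huber function purely as a stand-in to show how a smoothed error term could be averaged over the training examples. Both the choice of pseudo-Huber and the averaging are assumptions, and the function names are hypothetical.</p>
        <preformat><![CDATA[
```python
import math

def pseudo_huber(r, delta=1.0):
    """Smooth stand-in penalty: quadratic near 0, roughly linear for large |r|."""
    return delta * delta * (math.sqrt(1.0 + (r / delta) ** 2) - 1.0)

def smooth_mse(y_true, y_pred, smooth=pseudo_huber):
    """Average of smooth(y_i - y_hat_i) over the examples (assumed form)."""
    return sum(smooth(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```
]]></preformat>
        <p>Any other differentiable penalty can be passed through the smooth argument without changing the surrounding code.</p>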
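        <p>Thus, the squared norm of the gradient is defined as:</p>
        <disp-formula>
          <label>(14)</label>
          <tex-math>\|\nabla L(\theta)\|^2 = \sum_{i=1}^{n} \left( \frac{\partial L(y, \hat{y})}{\partial \theta_i} \right)^2,</tex-math>
        </disp-formula>
        <p>where ∂L(y, ŷ)/∂θ<sub>i</sub> is the partial derivative of the loss function L with respect to the i-th parameter
θ<sub>i</sub>, and n is the number of parameters.</p>
        <p>The adaptation coefficient for the proposed training algorithm is defined as:</p>
        <disp-formula>
          <label>(15)</label>
          <tex-math>\beta = \frac{\beta_0}{1 + \gamma \cdot \|\nabla L(\theta)\|^2},</tex-math>
        </disp-formula>
        <p>where β<sub>0</sub> is the initial value of the adaptation coefficient, γ is the adaptation coefficient for
the adaptation coefficient.</p>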
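        <p>As a small worked example of the squared gradient norm and the adaptation coefficient defined above, assume purely illustrative values: a two-parameter model with gradient components 3 and 4, an initial adaptation coefficient of 0.1, and γ = 0.01. None of these numbers come from the experiments.</p>
        <disp-formula>
          <tex-math>\|\nabla L(\theta)\|^2 = 3^2 + 4^2 = 25, \qquad \beta = \frac{0.1}{1 + 0.01 \cdot 25} = \frac{0.1}{1.25} = 0.08.</tex-math>
        </disp-formula>
        <p>A larger gradient norm therefore lowers β, while β approaches its initial value as the gradient vanishes.</p>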
        <p>The initial value of the adaptation coefficient β<sub>0</sub> and the adaptation coefficient for the
adaptation coefficient γ are usually set at the initialization stage of the training algorithm.
They are hyperparameters that are selected experimentally or using optimization
techniques such as cross-validation.</p>
        <p>A small positive number, for example, 0.1 or 0.01, is usually selected as the initial value
of the adaptation coefficient β0. This initial value determines how quickly training rate
adaptation will begin. The lower the value, the faster adaptation will begin. The adaptation
coefficient γ for the adaptation coefficient is also chosen experimentally and depends on the specific
task and network architecture. Typically, it is selected in the range from 0.9 to 0.999. This
coefficient controls the adaptation speed of the adaptation coefficient itself: the closer to 1,
the slower the adaptation occurs.</p>
        <p>In the proposed training algorithm, it is important to select an adaptive activation
function for the l-th layer, which will ensure stable and efficient transfer of gradients during
backpropagation. Given this goal, it is advisable to choose an activation function that has a
smooth gradient and reduces the likelihood of gradients vanishing or exploding in deep
networks. Activation functions such as Mish, Swish or ELiSH [29, 30] may be preferable as
they not only provide a smooth gradient but also show high efficiency in optimizing and
generalizing neural network models. This choice of activation function is important to
ensure the stability and speed of convergence of the training algorithm, which in turn helps
to achieve better results in practice.</p>
        <p>From these activation functions (Mish, Swish and ELiSH), it is advisable to select the Mish
function for the proposed training algorithm. The Mish function is a smooth and
continuously differentiable function that adapts well to different data and
reduces the likelihood of gradients vanishing during backpropagation. Due to its shape and
unique properties, Mish demonstrates high efficiency in both optimization and
generalization of neural network models. Its use in this algorithm promotes more stable
and efficient training, which can ultimately lead to better results in practice. The adaptive
activation function Mish is described by the expression:</p>
        <disp-formula>
          <label>(16)</label>
          <tex-math>\mathrm{Mish}(x) = x \cdot \tanh(\mathrm{softplus}(x)),</tex-math>
        </disp-formula>
        <p>where x is the input signal, tanh is the hyperbolic tangent, softplus is the softplus activation
function, defined as softplus(x) = ln(1 + e<sup>x</sup>).</p>
        <p>Thus, the adaptive Mish function is a combination of a linear function x and a hyperbolic
tangent, which provides smoothness and continuous differentiability while maintaining
useful activation properties.</p>
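        <p>For reference, the Mish function and its derivative can be sketched in a few lines of plain Python. The softplus and sigmoid terms are written in numerically stable forms, which is an implementation choice of this sketch and not something prescribed by the text; the function names are hypothetical.</p>
        <preformat><![CDATA[
```python
import math

def softplus(x):
    """ln(1 + e^x), written to avoid overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x):
    """Mish(x) = x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))

def mish_grad(x):
    """Derivative of Mish, using d softplus / dx = sigmoid(x)."""
    t = math.tanh(softplus(x))
    sig = 0.5 * (1.0 + math.tanh(0.5 * x))   # numerically stable sigmoid
    return t + x * (1.0 - t * t) * sig
```
]]></preformat>
        <p>The derivative is what backpropagation multiplies element-wise into the layer error when Mish is used as the layer activation.</p>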
        <p>Adding adaptive training rate variation over time helps improve the stability and
training speed of the model, which in turn can lead to higher quality and more efficient
training. To add an adaptive change in the training rate over time in this algorithm, you can
use methods such as Learning Rate Schedulers or Learning Rate Decay (Table 3) [31, 32].</p>
        <p>Learning Rate Schedulers allow you to dynamically change the training rate during
training depending on a specific schedule. For example, you can start with a higher training
rate and gradually decrease it as you progress in training or after a certain number of
epochs. This approach allows you to better adjust the training rate in accordance with the
training progress and the dynamics of changes in gradients.</p>
        <p>Learning Rate Decay involves reducing the training rate after each epoch or a certain
number of training steps. This can be implemented by multiplying the current training rate
by a factor that decreases over time or with each epoch.</p>
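        <p>For example, after each epoch, you can reduce the training rate by a fixed percentage or
multiply it by a coefficient that depends on the quality indicator of the model on the
validation data set.</p>
        <table-wrap>
          <label>Table 3</label>
          <table>
            <thead>
              <tr>
                <th>Method</th>
                <th>Expression and notation</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Step-based update</td>
                <td>α<sub>new</sub> represents the new value of parameter α, α<sub>old</sub> is the current value of parameter α, "factor" is the constant multiplier by which the parameter is adjusted, "epoch" refers to the current iteration or epoch in the process, "step size" is the number of epochs after which the parameter is updated.</td>
              </tr>
              <tr>
                <td>Cosine-based update</td>
                <td><inline-formula><tex-math>\alpha_{new} = \alpha_0 \cdot \frac{1 + \cos\left(\pi \cdot \frac{epoch}{\max(epoch)}\right)}{2}</tex-math></inline-formula>, where max(epoch) is the total number of training epochs.</td>
              </tr>
              <tr>
                <td>Exponential Decay</td>
                <td><inline-formula><tex-math>\alpha_{new} = \alpha_{old} \cdot e^{-\text{decay rate} \cdot epoch}</tex-math></inline-formula>, where "decay rate" is the decay coefficient that determines the rate at which the training rate decreases with each epoch.</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>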
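        <p>The schedules discussed above can be sketched as three small functions. The step rule is written in its common textbook form (multiply by a constant factor once every fixed number of epochs), which is an assumption here; the cosine and exponential rules follow their usual definitions, expressed relative to the initial rate. All function names are hypothetical.</p>
        <preformat><![CDATA[
```python
import math

def step_decay(alpha0, factor, epoch, step_size):
    """Assumed step rule: multiply by factor once every step_size epochs."""
    return alpha0 * factor ** (epoch // step_size)

def cosine_schedule(alpha0, epoch, max_epoch):
    """alpha0 * (1 + cos(pi * epoch / max_epoch)) / 2."""
    return alpha0 * (1.0 + math.cos(math.pi * epoch / max_epoch)) / 2.0

def exponential_decay(alpha0, decay_rate, epoch):
    """alpha0 * exp(-decay_rate * epoch)."""
    return alpha0 * math.exp(-decay_rate * epoch)
```
]]></preformat>
        <p>All three start at the initial rate for epoch 0 and differ only in how quickly they move away from it.</p>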
        <p>The use of adaptive modification of the neural network architecture in the proposed
training algorithm can help improve the efficiency of the model by optimizing its structure
during the training process. This allows the model to adapt more quickly and accurately to
changing task conditions and requirements, which can ultimately lead to higher
performance and generalization ability. To adaptively change the architecture of a neural
network, automatic architecture differentiation (AutoML) is proposed, which allows the
structure of the neural network to be optimized during the training process using
optimization algorithms such as gradient descent. A neural network can automatically
change its architecture by adding or removing layers, adjusting their parameters, etc. to
improve performance based on training data [33, 34].</p>
        <p>To optimize the neural network architecture, an optimization algorithm is used, for
example, gradient descent, according to which the task of optimizing the neural network
architecture is represented as:</p>
        <disp-formula>
          <label>(17)</label>
          <tex-math>\theta^{*} = \arg\min_{\theta} L(\theta),</tex-math>
        </disp-formula>
        <p>where θ<sup>∗</sup> are the optimal parameters of the model.</p>
        <p>To calculate gradients with respect to the model parameters, the backpropagation algorithm
is used, which calculates the gradients of the loss function with respect to the network parameters,
∇<sub>θ</sub>L(θ). In the case of AutoML, gradients can also be calculated with respect to model
hyperparameters such as number of layers, number of neurons, etc. This allows us to
optimize the network architecture during the training process. Hyperparameter gradients
can be computed using hyperparameter differentiation methods or approximate methods
such as REINFORCE or truncated backpropagation through time (TBPTT) algorithms. After
computing the gradients across the model's parameters and hyperparameters, we can use
an optimization algorithm such as stochastic gradient descent (SGD) to update the
parameters and hyperparameters according to the resulting gradients. These steps form the
basis of the automatic architecture differentiation algorithm (AutoML), which allows a
neural network to change its structure during training to optimize its performance and
generalization ability.</p>
        <p>The use of adaptive mini-batch resizing allows you to more flexibly manage the training
process and improve its efficiency. For example, if a model faces the problem of rapidly
changing gradients or computational inefficiency, increasing the mini-batch size can help
smooth out gradients and speed up training. Conversely, reducing the mini-batch size can
be useful to improve the generalization ability of the model or improve convergence in case
of overfitting [35]. Mathematically, the adaptive change in the mini-batch size is
implemented according to the expression:
          <disp-formula>
            <label>(18)</label>
            <tex-math>N_{new} = \lfloor N_{old} \cdot \eta \rfloor,</tex-math>
          </disp-formula>
where N<sub>old</sub> is the current mini-batch size, N<sub>new</sub> is the new mini-batch size, η is the adaptation
coefficient, ⌊∙⌋ is the rounding down function.</p>
        <p>The adaptation coefficient η is selected based on certain criteria or conditions. For
example, you can choose η such that the new mini-batch size increases or decreases
depending on the rate of model convergence or the dynamics of the gradients.</p>
        <p>Once the new mini-batch size is calculated, it is applied to the next iteration of model
training. A new mini-batch is formed from training examples taking into account the new
size.</p>
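        <p>The resizing rule can be sketched as follows. The clamping bounds and the rule for choosing the adaptation coefficient are hypothetical illustrations: the text only requires that the new size be the rounded-down product of the current size and the coefficient, and that the coefficient reflect convergence behaviour.</p>
        <preformat><![CDATA[
```python
import math

def resize_mini_batch(n_old, eta, n_min=8, n_max=512):
    """N_new = floor(N_old * eta), clamped to [n_min, n_max] (bounds are assumptions)."""
    return max(n_min, min(n_max, math.floor(n_old * eta)))

def choose_eta(prev_loss, curr_loss, grow=1.5, shrink=0.75):
    """Hypothetical rule: grow the batch when the loss stalls, shrink it otherwise."""
    return grow if curr_loss >= prev_loss else shrink
```
]]></preformat>
        <p>Clamping keeps the batch size within fixed limits, in the spirit of the bounded mini-batch condition stated in the theorem below.</p>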
        <p>The proposed algorithm for training feedforward neural networks allowed us to
formulate the following theorem: the training algorithm for a feedforward neural network with
adaptive initialization of weights, adaptive training rate, adaptive regularization, smooth
loss function, adaptive activation function, adaptive change in training rate over time,
adaptive change in neural network architecture, and adaptive mini-batch resizing
converges to an optimal solution to the training task with probability 1 if the following
conditions are met:</p>
        <p>1. Limited training set: the training data set X consists of N independent and
identically distributed examples, where N → ∞.</p>
        <p>2. Boundedness of the parameter space: the parameter space Θ of the model is
limited by the compact set K ⊂ ℝ<sup>d</sup>, where d is the dimension of the parameter space.</p>
        <p>3. Smoothness of the loss function: the loss function L(θ) is twice continuously
differentiable on K.</p>
        <p>4. Convexity of the loss function: the loss function L(θ) is convex on K.</p>
        <p>5. Strong convexity of the loss function: the loss function L(θ) is strongly
convex on K with a strong convexity constant m &gt; 0.</p>
        <p>6. Training rate adaptability: the training rate α(t) adapts over time in such a
way that it satisfies the following conditions: α(t) &gt; 0 ∀t &gt; 0, ∑<sub>t=1</sub><sup>∞</sup> α(t) = ∞,
∑<sub>t=1</sub><sup>∞</sup> (α(t))<sup>2</sup> &lt; ∞.</p>
        <p>7. Adaptability of regularization: the regularization coefficient λ adapts over
time in such a way that it satisfies the following condition: 0 &lt; λ(t) &lt; λ<sub>max</sub> ∀t &gt; 0.</p>
        <p>8. Adaptability of the activation function: the activation function σ(x) is
continuously differentiable and monotonically increasing.</p>
        <p>9. Adaptability of mini-batch size: the mini-batch size N(t) adapts over time in
such a way that it satisfies the following condition: N<sub>min</sub> &lt; N(t) &lt; N<sub>max</sub> ∀t &gt; 0.</p>
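        <p>As a quick check of the training rate condition, consider the illustrative choice of a rate that decays as the initial rate divided by the step number (an assumed example, not a schedule prescribed here). The rate is positive for every step, its sum diverges as a harmonic series, and the sum of its squares is finite:</p>
        <disp-formula>
          <tex-math>\alpha(t) = \frac{\alpha_0}{t}: \qquad \sum_{t=1}^{\infty} \frac{\alpha_0}{t} = \infty, \qquad \sum_{t=1}^{\infty} \left(\frac{\alpha_0}{t}\right)^2 = \alpha_0^2 \cdot \frac{\pi^2}{6} &lt; \infty.</tex-math>
        </disp-formula>
        <p>By contrast, a constant rate fails the second requirement, because the sum of its squares diverges.</p>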
        <p>Proof of theorem. To prove this theorem, the stochastic gradient descent (SGD) method
is used in combination with parameters that change adaptively over time according to the specified
conditions. Let the loss function L(θ) be given, where θ are the parameters of the neural
network model. The aim is to minimize the loss function L(θ). For this, SGD is used, which
updates the parameters as θ<sub>t+1</sub> = θ<sub>t</sub> − α(t) ∙ ∇L(θ<sub>t</sub>), where α(t) is the training rate at step
t and ∇L(θ<sub>t</sub>) is the gradient of the loss function with respect to the parameters θ at step t. This
approach is generalized to take the adaptive parameters into account: for adaptive initialization of
weights, the neural network weights are initialized randomly, but taking into account
the size of the input layer and the number of neurons in the next layer; for the adaptive training
rate, a sequence α(t) is used that satisfies the adaptability conditions; for adaptive
regularization, a sequence λ(t) is used that satisfies the adaptability conditions; for the adaptive
activation function, a continuously differentiable and monotonically increasing activation
function is used; for the adaptive change in the size of the mini-batch, a sequence N(t) is
used that satisfies the adaptability conditions. When N → ∞, the training set covers the
entire data space, which allows the algorithm to train from a variety of examples, which
determines the boundedness of the training set. The compact parameter space ensures that
changes in the model parameters are limited, which is important for the convergence of the
algorithm. A twice continuously differentiable loss function ensures a smooth loss surface,
which simplifies optimization, while a convex loss function ensures that every local
minimum is global, and strong convexity ensures that this minimum is unique and that the algorithm quickly
converges to it.</p>
        <p>The convergence of the algorithm to the optimal solution is ensured by the convergence
of gradient descent and adaptive parameters. Provided that α(t) &gt; 0 for all t &gt; 0 and
$\sum_{t=1}^{\infty} \alpha(t) = \infty$, as well as $\sum_{t=1}^{\infty} (\alpha(t))^2 &lt; \infty$, gradient descent converges to a local minimum
of the loss function L(θ) (which, for a convex function, is also a global minimum) with probability 1 under the conditions of smoothness and
convexity of L(θ). Because the training rate α(t) and the regularization
coefficient λ(t) change adaptively according to the conditions of the algorithm, these parameters can adapt to the
characteristics of the loss function and ensure stable convergence of the algorithm.</p>
        <p>Thus, by applying the stochastic gradient descent method to the loss function L(θ) with
adaptive training and regularization parameters, taking into account constraints on the data
and model parameters, the algorithm converges to the optimal solution with probability 1.</p>
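        <p>The update rule used in the proof can be illustrated by a minimal Python sketch. It assumes a one-dimensional strongly convex loss L(θ) = (θ − 3)² with zero-mean Gaussian gradient noise and the training rate α(t) = 0.5/t, which is positive, has a divergent sum, and has a convergent sum of squares. The function names, the loss, and all constants are illustrative assumptions and are not taken from the work.</p>
        <preformat>
```python
import random


def robbins_monro_rate(t, a=0.5):
    """Training rate alpha(t) = a / t (assumed schedule, not the authors')."""
    return a / t


def noisy_quadratic_grad(theta, rng, target=3.0, noise=1.0):
    """Gradient of L(theta) = (theta - target)^2 plus zero-mean Gaussian noise."""
    return 2.0 * (theta - target) + rng.gauss(0.0, noise)


def sgd(grad_sample, theta0, steps, seed=0):
    """Iterate theta_{t+1} = theta_t - alpha(t) * g_t with a stochastic gradient g_t."""
    rng = random.Random(seed)
    theta = theta0
    for t in range(1, steps + 1):
        theta = theta - robbins_monro_rate(t) * grad_sample(theta, rng)
    return theta


theta_final = sgd(noisy_quadratic_grad, theta0=-10.0, steps=20000)
print(theta_final)  # ends close to the minimizer theta = 3
```
        </preformat>
        <p>In this toy setting the iterate settles near the minimizer despite the gradient noise, because the decaying training rate averages the noise out while its divergent sum still lets the iterate travel any required distance.</p>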
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment</title>
      <p>The proposed algorithm for training a feedforward neural network with many adaptive
components finds wide practical applications in various fields of machine learning and
artificial intelligence. For example, in image processing, it can be used to train a neural
network to recognize objects in images with high accuracy, thanks to a smooth loss function
and an adaptive activation function, allowing it to efficiently process different types of data
and situations. Adaptive initialization of weights and adaptive training rates ensure fast model
convergence, while adaptive regularization helps avoid overfitting. In addition, adaptive changes
in the architecture and size of the mini-batch make it possible to optimize the training process according to
the requirements of a specific task and the available computing resources. This approach
can be successfully applied in the fields of computer vision, natural language processing,
medical data analysis, and others where precise adaptation of the model to a variety of
conditions and data is required [36–40].</p>
      <p>In [41], the use of feedforward neural networks in the problem of debugging the
parameters of helicopter turboshaft engines (TE) is shown. It is based on a
universal mathematical model for debugging the parameters of a helicopter TE and the
operating algorithm of the control device (Fig. 1), which eliminates the
inconsistencies that are calculated for each engine control element.</p>
      <p>Fig. 1. Block labels: GTE model along the r.p.m. control loop; model of an electronic governor
along the r.p.m. control loop.</p>
      <p>Using a universal approach based on Lyapunov functions, universal tuning equations were
obtained in [41]:</p>
      <p>$\dot{A}_M = \varepsilon_1 \cdot |K \cdot \Psi(\alpha)|$ or $A_M = \int \varepsilon_1 \cdot |K \cdot \Psi(\alpha)|\,dt$, (19)</p>
      <p>$\dot{B}_M = \varepsilon_1 \cdot |L \cdot \Phi(I)|$ or $B_M = \int \varepsilon_1 \cdot |L \cdot \Phi(I)|\,dt$, (20)</p>
      <p>$\dot{C}_M = \varepsilon_2 \cdot |M \cdot Q(G_T)|$ or $C_M = \int \varepsilon_2 \cdot |M \cdot Q(G_T)|\,dt$, (21)</p>
      <p>$\dot{D}_M = \varepsilon_2 \cdot |N \cdot U(\alpha)|$ or $D_M = \int \varepsilon_2 \cdot |N \cdot U(\alpha)|\,dt$, (22)
where AM, BM, CM, DM are the tunable coefficients, which after the end of the identification
process are equal to the coefficients of the equations describing the fuel dispenser; Ψ(α), Φ(I), Q(GT),
U(α) are the nonlinear functions; ε1, ε2 are the residual signals; K, L, M, N are the positive
definite diagonal matrices of given constant coefficients [41].</p>
      <p>The identified values of the coefficients AM, BM, CM, DM, which describe a real fuel
dispenser, are compared with the values AE, BE, CE, DE of the reference model of the
dispenser. The signals of the differences between the identified and reference coefficients,
$\Delta A = A_M - A_E$, $\Delta B = B_M - B_E$, $\Delta C = C_M - C_E$, $\Delta D = D_M - D_E$, are used to debug the fuel dispenser. The
amount of movement of the actuators is determined by the sensitivity of the fuel dispenser
to the movement of the engine control element.</p>
      <p>To demonstrate the use of a feedforward neural network using adaptive elements to
solve the task of helicopter TE parameters debugging at flight modes, a two-dimensional
classification scenario was researched in [41], which consists in the fact that one of two
random narrow-band processes is observed using a quadrature demodulator. In this case,
the probability density function of each of these processes is described by the following
expression:</p>
      <p>$p(I, Q) = \frac{1}{\sqrt{2 \cdot \pi} \cdot \sigma_I \cdot \sigma_Q} \cdot \exp\left(-\left(\frac{(I - m_I)^2}{2 \cdot \sigma_I^2} + \frac{(Q - m_Q)^2}{2 \cdot \sigma_Q^2}\right)\right)$, (23)
where $\sigma_I^2$, $\sigma_Q^2$ are the dispersions, mI, mQ are the mathematical expectations of components I
and Q, I corresponds to the values of the gas-generator rotor r.p.m. nTC, Q corresponds to the
values of specific fuel consumption Ce.</p>
      <p>As a solution to this task, the data distribution area of two classes (I and Q) and
boundary lines at levels 0.1, 0.5, 0.9 were obtained in [41]; these show the permissible and
unacceptable values of the gas-generator rotor r.p.m. nTC according to the specific fuel
consumption Ce.</p>
      <p>In this work, a corresponding computational experiment is conducted to
solve the same problem with a feed-forward neural network trained with the proposed
training algorithm. The computational experiment was run on a personal computer
with an AMD Ryzen 5 5600 processor (Zen 3 architecture, 6 cores, 12
threads, 3.5 GHz, 32 MB third-level cache) and 32 GB of DDR4 RAM.</p>
      <p>To solve the task of helicopter TE parameters debugging (using the TV3-117
turboshaft engine as an example), the training sample consists of the values of the gas generator rotor
r.p.m. nTC at the takeoff mode, reduced to absolute values [41, 42] and given in Table 4; the
parameters of the average engine fleet are $\bar{n}_{TC} = 0.994$, $\bar{C}_e = 0.977$.</p>
      <p>In the input signal approximation task, according to [41], the dependence of the specific
fuel consumption Ce on the gas generator rotor r.p.m. nTC for the TV3-117 turboshaft engine
(which represents an element of the engine throttle characteristic) is presented. Fig. 2
shows the input data, indicated by points, which are approximated by broken lines for
clarity.</p>
      <p>At the stage of training sample pre-processing, its homogeneity is checked, it is divided into
control and test samples, and their representativeness is assessed using cluster
analysis. To assess the homogeneity of the training set, the Fisher-Pearson
criterion [43] is calculated from the observed frequencies and compared with the critical
value of χ2 with r – k – 1 = 13 degrees of freedom at the significance level
α = 0.01. At this level, a result is accepted as statistically significant only if the
probability of obtaining these or more extreme results under the null hypothesis is less than
1 %.</p>
      <p>The resulting value χ2 = 18.388 does not exceed the critical value of 30.577, which
confirms the consistency of the samples and the hypothesis of normal distribution.</p>
      <p>To confirm homogeneity, the Fisher-Snedecor [44] criterion is adopted, which is the
ratio of the values of the larger and smaller dispersion with degrees of freedom r – k –1 =
13 and the significance level α = 0.01.</p>
      <p>The resulting value of F = 3.393 does not exceed the critical value of 3.61, which confirms
the consistency of the samples and the hypothesis of normal distribution.</p>
      <p>The representativeness of the training and test samples was assessed using cluster
analysis, the aim of which is to divide the set of input data X (Table 4) into k disjoint clusters,
where k is a predetermined number of clusters. Each cluster is a group of objects that are
considered more similar to each other than to objects from other clusters. The work uses
the k-means cluster analysis method, which is based on minimizing the sum of squared
distances between cluster objects and their centroids. Each object xi of set X is assigned to
the nearest centroid according to $C_i = \arg\min_j \|x_i - \mu_j\|^2$, where μj are the initial centroids and
$\|x_i - \mu_j\|^2$ is the squared Euclidean distance between object xi and centroid μj. After this, the
centroids are recalculated as the average value of objects within each cluster according to
$\mu_j = \frac{1}{|C_j|} \cdot \sum_{x_i \in C_j} x_i$, where $|C_j|$ is the number of objects in the j-th cluster. The calculations
of Ci and μj are repeated until changes in the cluster distribution are minimal. The algorithm
terminates when none of the centroids changes significantly or the specified number of
iterations is completed [45]. The results of the cluster analysis of the training sample data
(Table 4) identified 8 classes (classes I…VIII). After random selection, training and test
samples were compiled in a 2:1 ratio (67 and 33 %, respectively). The cluster analysis of
both samples revealed the presence of eight groups in them, which indicates the similarity
of the composition of both training and test samples. The distances between groups are
almost the same in both samples, which confirms the similarity of their composition (Fig. 3).
Thus, the optimal sample size was obtained: training – 256 elements (100 %), control – 172
elements (67 % of the training sample), test – 84 elements (33 % of the training sample).</p>
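      <p>The assignment and centroid-update steps described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the six two-dimensional points are hypothetical values chosen for the example (they are not the data of Table 4), the initial centroids are drawn at random from the points, and the stopping rule is "no assignment changed or the iteration limit is reached".</p>
      <preformat>
```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialization: k random points
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(x, centroids[j]))
            for x in points
        ]
        if new_assignment == assignment:
            break  # the cluster distribution no longer changes
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its cluster members.
        for j in range(k):
            members = [x for x, c in zip(points, assignment) if c == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


# Hypothetical (nTC, Ce)-like pairs forming two well-separated groups.
points = [
    (0.980, 0.970), (0.981, 0.972), (0.979, 0.969),
    (0.995, 0.990), (0.996, 0.991), (0.994, 0.989),
]
centroids, labels = k_means(points, k=2)
print(labels)
```
      </preformat>
      <p>Running the same routine on the training and test samples separately and comparing the resulting centroids is one simple way to check that both samples have a similar cluster composition.</p>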
      <p>As part of the computational experiment, a forward propagation neural network was
used (Fig. 4), the inputs of which are the parameters of the gas generator rotor r.p.m. nTC
and specific fuel consumption Ce, and the outputs are their optimal values nTCopt and Ceopt.
During its training with the proposed algorithm, the dependences of the accuracy (Fig. 5)
and losses (Fig. 6) of the neural network on the number of iterations (100 iterations were
used in the work) were obtained, in which the “blue curve” means training on the training
sample, the “orange curve” means validation on a control sample. Fig. 5 shows
that the limiting value of accuracy reaches 1, and Fig. 6 shows that the maximum loss
value does not exceed 0.025. This indicates a high degree of efficiency in training the model
on the provided data and the ability of the model to generalize to new data with high
accuracy, which makes it potentially suitable for solving the task of helicopter TE parameter
debugging.</p>
      <p>Figs. 4–6. Feedforward neural network with inputs nTC and Ce; accuracy and loss diagrams
(author's research).</p>
      <p>In this case, the loss function was determined according to (13), and the accuracy
function according to the expression:
$\mathrm{Accuracy} = \frac{1}{N} \cdot \sum_{i=1}^{N} \mathbb{1}(y_i, \hat{y}_i)$, (24)
where N is the total number of examples, yi is the true value of the target variable for the i-th
example, $\hat{y}_i$ is the predicted value of the target variable for the i-th example, and $\mathbb{1}(y_i, \hat{y}_i)$ is an
indicator function that returns 1 if the predicted value matches the true one, $y_i = \hat{y}_i$, and 0
otherwise.</p>
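      <p>As a small worked illustration of the accuracy function, the following Python sketch counts exact matches between true and predicted values and divides by the number of examples. The function name and the four-element label lists are assumptions made only for this example.</p>
      <preformat>
```python
def accuracy(y_true, y_pred):
    """Share of examples for which the predicted value equals the true value."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")
    # The indicator contributes 1 for a match and 0 otherwise.
    matches = sum(1 for y, y_hat in zip(y_true, y_pred) if y == y_hat)
    return matches / len(y_true)


# Hypothetical labels: three of the four predictions match.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```
      </preformat>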
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>The results of the computational experiment comprise both a partial study of the proposed
neural network training algorithm and the boundary lines at levels 0.1, 0.5, 0.9, which show the
permissible and unacceptable values of the gas-generator rotor r.p.m. nTC according to the
specific fuel consumption Ce and which must be compared with the corresponding results
obtained in [41].</p>
      <p>The adequacy of the resulting diagram of the area of distribution of data of two classes
(I and Q), reconstructed by a neural network, directly depends on the training process.
According to [46], a number of parameters are identified that affect the quality of training:
the training rate coefficient (assumed to be $10^{-4}$); the number of neurons in the hidden layer
(assumed to be 10); the number of training epochs completed (assumed to be 100).</p>
      <p>As a criterion for assessing the quality of training, the final total standard deviation for
the epoch was used, which is determined according to the expression:
$\sigma_{h} = \frac{1}{N} \cdot \sum_{i=1}^{N} \left(\frac{1}{M} \cdot \sum_{j=1}^{M} (y_{ij} - \hat{y}_{ij})^2\right)^{\frac{1}{2}}$. (25)</p>
      <p>The results of the study are given in Tables 5–7 and in Figs. 7–9, where Fig. 7 is the
diagram of the influence of the training rate on the final standard deviation, Fig.
8 is the diagram of the influence of the number of hidden neurons on the final
standard deviation, and Fig. 9 is the diagram of the influence of the number of epochs
passed on the final standard deviation.</p>
      <p>Table 5. Influence of the training rate coefficient on the resulting error (author's research).
Columns: Number; Training rate coefficient; Final standard deviation. Rows 1–9 (recovered training
rate coefficient values: 0.0005, 0.001).</p>
      <p>From the results obtained it follows that the minimum final total standard deviations per
epoch were obtained with a training rate coefficient of $10^{-4}$ and
10 neurons in the hidden layer. It is worth noting that in [41], the optimal number of
neurons in the hidden layer is 3. Increasing the number of neurons in the hidden layer from
3 to 10 leads to a noticeable improvement in the generalization ability of the model and a
reduction in the risk of overfitting. Increasing the number of neurons to 10 allows the model
to adapt more flexibly to complex relations in the data, which helps improve the accuracy
of predictions on new, previously unseen data. This is because more neurons allow the
model to learn more complex features and data structures, which is especially important
in the case of high-dimensional and complex data. Thus, increasing the number of neurons
to 10 in the hidden layer is a promising step to improve the quality of the neural network.</p>
      <p>It is also worth noting that, starting from 100 training epochs, the final total
standard deviation reaches its minimum and stays constant at 3.358, which indicates that the model has
achieved optimal accuracy on this data set and further training does not lead to a significant
improvement in results. This may indicate that the model has learned to predict the target
variable with high accuracy and additional training epochs do not bring a significant increase
in the quality of predictions. A constant value of the minimum total standard deviation
after 100 epochs therefore indicates the convergence of the model and its readiness to be used for
solving practical tasks. As a result, the proposed forward propagation neural network for solving
the task of helicopter TE parameters debugging (Fig. 4) is transformed into the form
presented in Fig. 10.</p>
      <p>Fig. 10. Refined neural network; recovered labels: inputs nTC, Ce; weights w1(11), w1(21), w01.</p>
      <p>At the next stage of the computational experiment, the control curve $C_e = f(\bar{n}_{TC})$ is
studied, which, according to [41], is presented in the form:
$C_e(\bar{n}_{TC}) = 0.0016 \cdot \bar{n}_{TC}^4 - 0.0195 \cdot \bar{n}_{TC}^3 + 0.0864 \cdot \bar{n}_{TC}^2 - 0.1774 \cdot \bar{n}_{TC} + 0.4083$, (26)
where $\bar{n}_{TC} = \frac{n_{TC}}{n_{TC}^{\max}}$ is the relative value of the gas-generator rotor r.p.m. nTC.</p>
      <p>Fig. 11 shows the dependence of the objective function $C_e(\bar{n}_{TC}) \to \min$ on the
gas generator rotor r.p.m. nTC value, where the “blue curve” shows the original
dependence obtained in [41] and the “orange curve” shows the dependence obtained in this work
using L2-regularization (11). In this case, the objective function takes the updated form:
$C_e(\bar{n}_{TC})_{L2} = C_e(\bar{n}_{TC}) + \frac{\lambda}{2 \cdot N} \cdot \sum_{k=1}^{5} W(k)$, (27)
or
$C_e(\bar{n}_{TC})_{L2} = C_e(\bar{n}_{TC}) + \frac{\lambda}{2 \cdot N} \cdot (W(1) + W(2) + W(3) + W(4) + W(5))$, (28)
where W(1), W(2), W(3), W(4), W(5) are the model weights corresponding to each of the five
terms in the original function $C_e(\bar{n}_{TC})$ (26).</p>
      <p>Fig. 11. Dependence of the objective function on the gas generator rotor r.p.m. value
(author's research).</p>
      <p>As can be seen from Fig. 11, adding L2-regularization to the objective function made it
possible to raise the adjustment curve $C_e = f(\bar{n}_{TC})$ by the regularization value, bringing
it closer to 1, by adding a term to the original function that increases its values. This allows the
model to take the complexity of the data into account more effectively and reduces the risk
of overfitting through a penalty for large values of the weighting coefficients. A raised curve
provides a more stable and robust optimization of the model, which can lead to improved
generalization ability and predictive accuracy on new data. In this case, the objective function
minimum of 0.40 is reached at the r.p.m. value 0.992. Thus, the correction of the mean value
of nTC is $\Delta n_{TC} = 0.994 - 0.991 = 0.003$, whereas the value $\Delta n_{TC} = 0.006$ was obtained in
[41]. Thus, the addition of L2-regularization made it possible to adjust the gas generator rotor
r.p.m. nTC value more accurately (2 times compared to [41]) and bring it closer to the
average value for the engine fleet $\bar{n}_{TC} = 0.994$.</p>
      <p>The results obtained made it possible to obtain a refined area of distribution of data of
two classes (I and Q) with boundary values of nTC, respectively, lines at levels 0.1, 0.5, 0.9
(Fig. 12).</p>
      <p>Fig. 12 allows you to determine the areas in which each of the classes is most likely to be
found. Refined limit values of nTC on lines at levels 0.1, 0.5, 0.9 make it possible to more
accurately determine the optimal gas generator rotor r.p.m nTC values to achieve the
required levels of specific fuel consumption Ce. As can be seen from Fig. 12, the region of
unacceptable values of nTC and Ce (red region) includes their values located at the
boundaries of this region. This indicates that it is inadmissible to regulate the nTC parameter
to obtain the maximum permissible value of Ce. “Level 0.1” means the lower level of
permissible Ce values, “Level 0.5” – optimal Ce values, “Level 0.9” – maximum permissible Ce
values. The inadmissibility of adjusting the nTC parameter to obtain the maximum
permissible value of Ce in helicopter flight mode is explained by the fact that in this context
there is a certain connection between the gas-generator rotor r.p.m. nTC and the specific fuel
consumption Ce, which is determined by the optimal operating conditions of the engine.
When adjusting the gas-generator rotor r.p.m. nTC to achieve the maximum permissible
value of specific fuel consumption Ce located on the border of the red area in the figure, the
system may go beyond the permissible parameters of engine operation. This can lead to
undesirable consequences such as engine overheating, loss of flight stability, or even a
crash. To ensure the safety and normal operation of the helicopter at flight mode, it is
important to maintain optimal engine operating parameters, including those related to the
gas-generator rotor r.p.m., to avoid going beyond the permissible range of specific fuel
consumption values. Thus, Fig. 12 provides important information for regulating system
operation parameters, as it allows you to determine the optimal nTC values to achieve the
desired specific fuel consumption indicators.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>A comparative analysis was carried out of the solutions to the task of helicopter TE
parameters debugging based on a feed-forward neural network with adaptive elements
trained with the algorithm proposed in this work and with the Delta-Bar-Delta algorithm
used in [41]. Type I and type II errors were calculated when obtaining the gas-generator rotor
r.p.m. nTC boundary values to achieve the required levels of specific fuel consumption Ce
(Table 8).</p>
      <p>A type I error occurs when the null hypothesis H0 is rejected when it is in fact true, and
is defined as:
$\alpha = P(\text{reject } H_0 \mid H_0 \text{ true})$. (29)</p>
      <p>A type II error occurs when the null hypothesis H0 is accepted when it is in fact false, and
is defined as:
$\beta = P(\text{accept } H_0 \mid H_0 \text{ false})$. (30)</p>
      <p>The null hypothesis H0 in the problem under consideration is that the use of a
feedforward neural network to determine the gas-generator rotor r.p.m. nTC boundary
values does not lead to statistically significant changes in achieving the required levels of
specific fuel consumption Ce.</p>
      <p>The significance level adopted in this work is 0.01, which means that when conducting a
statistical test with this level of significance, the probability of a type I error is 0.01. That is,
if the test results reject the null hypothesis at this significance level, then the probability of
making a type I error is 1 %, which is low enough to detect statistically
significant differences between groups or conditions.</p>
      <p>According to Table 8, training the feedforward neural network with adaptive elements with the
proposed algorithm reduces type I and type II errors by 1.65...1.71 times compared with the use of the
Delta-Bar-Delta algorithm for its training [41] at a significance level of 0.01.</p>
      <p>At the final stage of the comparative analysis, the efficiency coefficients and quality
coefficients of the feedforward neural network with adaptive elements were calculated for
both the proposed training algorithm and the Delta-Bar-Delta algorithm [41], according to
the expressions [47–50] (Table 9):</p>
      <p>$K_{error} = \frac{T_{error}}{T_0} \cdot 100\%$, (31)</p>
      <p>$K_{quality} = \left(1 - \frac{T_{error}}{T_0}\right) \cdot 100\%$, (32)
where Kerror and Kquality represent the erroneous and quality coefficients [51–53] for
determining the gas-generator rotor r.p.m. nTC boundary values by a feedforward neural
network with adaptive elements; Terror indicates the total time of segments associated with
misclassification [54], while T0 denotes the duration of the test sample [55] (in this work,
T0 = 5 s is assumed) [56–58].</p>
      <p>Table 9. Results of calculating the quality and efficiency coefficients (author's research).
Columns (gas-generator rotor r.p.m. nTC boundary values): Kerror; Kquality. Rows: feed-forward neural
network with adaptive elements using the training algorithm proposed in the work; feed-forward neural
network with adaptive elements using the Delta-Bar-Delta algorithm used in [41].</p>
      <p>From Table 9 it can be seen that the use of a forward propagation neural network with
adaptive elements, trained with the algorithm proposed in the work, made it possible
to reduce the erroneous coefficient by 1.89 times and slightly (1.01 times) increase the
quality coefficient for determining the gas-generator rotor r.p.m. nTC boundary values
compared with the use of the Delta-Bar-Delta algorithm [41].</p>
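      <p>The two coefficients Kerror and Kquality can be computed with a few lines of Python. The sketch below is illustrative only: the function names are assumptions, T0 = 5 s follows the value stated in the text, and the misclassification time of 0.5 s is a hypothetical input, not a result of this work.</p>
      <preformat>
```python
def erroneous_coefficient(t_error, t0):
    """K_error: share of the test duration spent in misclassified segments, in percent."""
    if not t0 > 0:
        raise ValueError("t0 must be positive")
    if not t0 >= t_error >= 0:
        raise ValueError("t_error must lie between 0 and t0")
    return t_error / t0 * 100.0


def quality_coefficient(t_error, t0):
    """K_quality = (1 - T_error / T0) * 100 %, the complement of K_error."""
    return 100.0 - erroneous_coefficient(t_error, t0)


T0 = 5.0        # duration of the test sample in seconds (value stated in the text)
T_ERROR = 0.5   # hypothetical total misclassification time in seconds

k_error = erroneous_coefficient(T_ERROR, T0)
k_quality = quality_coefficient(T_ERROR, T0)
print(k_error, k_quality)
```
      </preformat>
      <p>Because the two coefficients always sum to 100 %, a large relative reduction of a small Kerror corresponds to only a slight relative increase of Kquality.</p>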
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>1. For the first time, a training algorithm for forward propagation neural networks has
been developed on the basis of the backpropagation algorithm. Through the use of
adaptive elements (adaptive training rate, adaptive initialization of neural network
weights, adaptive regularization, adaptive neuron activation function, adaptive change in
neural network architecture, adaptive change in the size of the mini-batch), it made it possible
to achieve almost 100% accuracy on both the training and validation data
sets with a minimum number of iterations.</p>
      <p>2. The optimal value of the training rate coefficient, the number of neurons in the hidden
layer of the neural network, and the optimal number of iterations when training a neural
network were experimentally substantiated by determining the smallest value of the final
total standard deviation per epoch. A computational experiment on the
task of helicopter turboshaft engine parameters debugging, with 2 input neurons, 2
output neurons, and 256 elements of the training set, gave the optimal training rate
coefficient value of 0.0001, the optimal number of neurons in the hidden layer
of the neural network of 10, and the optimal number of iterations of 100, since they correspond
to the minimum values of the final total standard deviation for the epoch, which
amounted to 3.642, 4.317, and 3.358, respectively.</p>
      <p>3. It has been experimentally proven that the use of L2-regularization in the developed
feed-forward neural network training algorithm with adaptive elements raises the
adjustment curve (or a similar studied dependence) by the regularization value,
bringing it closer to 1, by adding a term to the original function that increases its values.
This made it possible, in the task of helicopter turboshaft engine parameters debugging, to
adjust the gas-generator rotor r.p.m. value 2 times more accurately than with the use
of the well-known Delta-Bar-Delta neural network training algorithm.</p>
      <p>4. An updated area of data distribution of two classes (gas-generator rotor r.p.m. and
specific fuel consumption) was obtained with gas-generator rotor r.p.m. boundary values,
respectively, lines at levels 0.1, 0.5, 0.9, which reduced errors of the first and second kind
by 1.65...1.71 times compared with the use of the Delta-Bar-Delta neural networks training
algorithm.</p>
      <p>5. It has been mathematically proven that the use of the developed training algorithm
for forward propagation neural networks with adaptive elements reduces the erroneous
coefficient by 1.89 times and slightly (1.01 times) increases the quality coefficient for
determining the gas-generator rotor r.p.m. boundary values in the task of helicopter
turboshaft engines parameters debugging compared with the use of Delta-Bar-Delta neural
network training algorithm.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>This research was supported by the Ministry of Internal Affairs of Ukraine “Theoretical and
applied aspects of the development of the aviation sphere” under Project No. 0123U104884.</p>
      <p>Woodhead Publishing, Sawston, England, 2023, pp. 67–116. doi:
10.1016/B978-0-44315252-8.00006-6.
[4] X. Zhu, M. Li, X. Liu, Y. Zhang, A backpropagation neural network-based hybrid energy
recognition and management system, Energy 297 (2024) 131264.
doi: 10.1016/j.energy.2024.131264
[5] A. Sachenko, V. Kochan, V. Turchenko, V. Tymchyshyn, N. Vasylkiv, Intelligent nodes for
distributed sensor network, in: Proceedings of the 16th IEEE Instrumentation and
Measurement Technology Conference (IMTC/99), Venice, Italy, 1999, pp. 1479–1484.
doi: 10.1109/IMTC.1999.776072
[6] A. Sachenko, V. Kochan, V. Turchenko, Intelligent distributed sensor network, in:
IMTC/98 Conference Proceedings. IEEE Instrumentation and Measurement
Technology Conference. Where Instrumentation is Going, St. Paul, MN, USA, 1998,
pp. 60–66. doi: 10.1109/IMTC.1998.679663
[7] S. Babichev, M. Korobchynskyi, O. Lahodynskyi, O. Korchomnyi, V. Basanets, V.
Borynskyi, Development of a technique for the reconstruction and validation of gene
network models based on gene expression, Eastern-European Journal of Enterprise
Technologies 1(4 (91)) (2018) 19–32. doi: 10.15587/1729-4061.2018.123634
[8] S. Babichev, V. Lytvynenko, J. Skvor, J. Fiser, Model of the objective clustering inductive
technology of gene expression profiles based on SOTA and DBSCAN clustering
algorithms, Advances in Intelligent Systems and Computing 689 (2018) 21–39.
doi: 10.1007/978-3-319-70581-1_2
[9] O. Ivanov, L. Koretska, V. Lytvynenko, Intelligent modeling of unified communications
systems using artificial neural networks, CEUR Workshop Proceedings 2623 (2020)
77–84.
[10] S. Vladov, R. Yakovliev, O. Hubachov, J. Rud, Neuro-Fuzzy System for Detection Fuel
Consumption of Helicopters Turboshaft Engines, CEUR Workshop Proceedings 3628
(2024) 55–72.
[11] L. Wang, W. Ye, Y. Zhu, F. Yang, Y. Zhou, Optimal parameters selection of back
propagation algorithm in the feedforward neural network, Engineering Analysis with
Boundary Elements 151 (2023) 575–596. doi: 10.1016/j.enganabound.2023.03.033
[12] H. Calvo-Pardo, T. Mancini, J. Olmo, Granger causality detection in high-dimensional
systems using feedforward neural networks, International Journal of Forecasting 37:2
(2021) 920–940. doi: 10.1016/j.ijforecast.2020.10.004
[13] K. S. Narendra, K. Parthasarathy, Identification and Control of Dynamical Systems Using
Neural Networks, IEEE Transactions on Neural Networks 1:1 (1990) 4–27.
doi: 10.1109/72.80202
[14] J. M. Maroli, Generating discrete dynamical system equations from input–output data
using neural network identification models, Reliability Engineering &amp; System Safety
235 (2023) 109198. doi: 10.1016/j.ress.2023.109198
[15] R. G. Ramirez-Chavarria, M. Schoukens, Nonlinear Finite Impulse Response Estimation
using Regularized Neural Networks, IFAC-PapersOnLine 54:7 (2021) 174–179.
doi: 10.1016/j.ifacol.2021.08.354
[45] D. Parnes, A. Gormus, Prescreening bank failures with K-means clustering: Pros and cons,
International Review of Financial Analysis 93 (2024) 103222.
doi: 10.1016/j.irfa.2024.103222
[46] S. Vladov, Y. Shmelov, R. Yakovliev, Y. Stushchankyi, Y. Havryliuk, Neural Network
Method for Controlling the Helicopters Turboshaft Engines Free Turbine Speed at
Flight Modes, CEUR Workshop Proceedings 3426 (2023) 89–108.
[47] M. Duhan, P. K. Bhatia, Hybrid Maintainability Prediction using Soft Computing
Techniques, International Journal of Computing 20(3) (2021) 350–356.
doi: 10.47839/ijc.20.3.2280
[48] M. Duhan, P. K. Bhatia, Software Reusability Estimation based on Dynamic Metrics
using Soft Computing Techniques, International Journal of Computing 21(2) (2022)
188–194. doi: 10.47839/ijc.21.2.2587
[49] S. Vladov, Y. Shmelov, R. Yakovliev, M. Petchenko, S. Drozdova, Helicopters Turboshaft
Engines Parameters Identification at Flight Modes Using Neural Networks, in:
Proceedings of the IEEE 17th International Conference on Computer Science and
Information Technologies (CSIT), Lviv, Ukraine, 2022, pp. 5–8.
doi: 10.1109/CSIT56902.2022.10000444
[50] S. Vladov, Y. Shmelov, R. Yakovliev, M. Petchenko, S. Drozdova, Neural Network Method
for Helicopters Turboshaft Engines Working Process Parameters Identification at
Flight Modes, in: Proceedings of the 2022 IEEE 4th International Conference on Modern
Electrical and Energy System (MEES), Kremenchuk, Ukraine, 2022, pp. 604–609.
doi: 10.1109/MEES58014.2022.10005670
[51] V. V. Morozov, O. V. Kalnichenko, O. O. Mezentseva, The method of interaction modeling
on basis of deep learning the neural networks in complex it-projects, International
Journal of Computing 19(1) (2020) 88–96. doi: 10.47839/ijc.19.1.1697
[52] S. Bezobrazov, V. Golovko, A. Sachenko, M. Komar, R. Dolny, V. Kasyanik, P. Bykovyy,
E. Mikhno, O. Osolinskyi, Deep multilayer neural network for predicting the winner of
football matches, International Journal of Computing 19(1) (2020) 70–77.
doi: 10.47839/ijc.19.1.1695
[53] E. M. Cherrat, R. Alaoui, H. Bouzahir, Score fusion of finger vein and face for human
recognition based on convolutional neural network model, International Journal of
Computing 19(1) (2020) 11–19. doi: 10.47839/ijc.19.1.1688
[54] K. Andriushchenko, V. Rudyk, O. Riabchenko, M. Kachynska, N. Marynenko, L. Shergina,
V. Kovtun, M. Tepliuk, A. Zhemba, O. Kuchai, Processes of managing information
infrastructure of a digital enterprise in the framework of the «Industry 4.0» concept,
Eastern-European Journal of Enterprise Technologies 1(3–97) (2019) 60–72.
doi: 10.15587/1729-4061.2019.157765
[55] T. E. Romanova, P. I. Stetsyuk, A. M. Chugay, S. B. Shekhovtsov, Parallel Computing
Technologies for Solving Optimization Problems of Geometric Design, Cybernetics and
Systems Analysis 55(6) (2019) 894–904. doi: 10.1007/s10559-019-00199-4
[56] S. Vladov, Y. Shmelov, R. Yakovliev, Modified Neural Network Method for Classifying
the Helicopters Turboshaft Engines Ratings at Flight Modes, in: Proceedings of the
2022 IEEE 41st International Conference on Electronics and Nanotechnology
(ELNANO), Kyiv, Ukraine, 2022, pp. 535–540. doi: 10.1109/ELNANO54667.2022.9927108
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Heidari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Moattar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ghaffari</surname>
          </string-name>
          ,
          <article-title>Forward propagation dropout in deep neural networks using Jensen-Shannon and random forest feature importance ranking</article-title>
          ,
          <source>Neural Networks</source>
          <volume>165</volume>
          (
          <year>2023</year>
          )
          <fpage>238</fpage>
          -
          <lpage>247</lpage>
          . doi: 10.1016/j.neunet.2023.05.044.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>El-Sharkawy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wael</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mashaly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Azab</surname>
          </string-name>
          ,
          <article-title>Re-configurable parallel Feed-Forward Neural Network implementation using FPGA</article-title>
          ,
          <source>Integration</source>
          <volume>97</volume>
          (
          <year>2024</year>
          )
          <elocation-id>102176</elocation-id>
          . doi: 10.1016/j.vlsi.2024.102176.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>W.-K.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <article-title>4 - Forward and backpropagation for artificial neural networks</article-title>
          , in: W.-K. Hong (Ed.),
          <source>Artificial Intelligence-Based Design of Reinforced Concrete Structures</source>
          ,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>