Hybrid Neural Network Identifying Complex Dynamic Objects: Comprehensive Modelling and Training Method Modification

Victoria Vysotska 1, Serhii Vladov 2, Ruslan Yakovliev 2 and Alexey Yurko 3

1 Lviv Polytechnic National University, Stepan Bandera Street 12, Lviv, 79013, Ukraine
2 Kremenchuk Flight College of Kharkiv National University of Internal Affairs, Peremohy Street 17/6, Kremenchuk, 39605, Ukraine
3 Kremenchuk Mykhailo Ostrohradskyi National University, University Street 20, Kremenchuk, 39600, Ukraine

Abstract
The article is devoted to the development of mathematical models of complex dynamic objects (using the example of helicopter turboshaft engines) in the form of recurrent neural networks and their use in complex modelling to identify the parameters of automatic control, monitoring, and diagnostic systems. For the first time, a concept has been created for constructing neural network models of complex dynamic objects in which, by increasing the robustness of the trained neural network, it becomes possible to increase the reliability of solving identification problems at its output. The use of a hybrid NARX neural network with a radial basis nonlinear layer is proposed. Introducing a radial basis nonlinear layer into the NARX neural network is an effective addition when working with unstructured data such as images, audio signals, or text, owing to its ability to extract and represent complex patterns and features in these data. This is confirmed by the modelled neural network losses, which remained stable over 500 training epochs and did not exceed 0.025 (2.5 %). A comprehensive modification of the Levenberg-Marquardt training method is proposed, which consists of applying the Broyden method to calculate the elements of the Hessian matrix, as well as an analytical description of the regularization parameter through control coefficients that increase or decrease its value in the event of a change in the neural network training error. The modified Levenberg-Marquardt method made it possible to reduce the average training error of the hybrid NARX neural network with a radial basis layer by 33 %, to the level of 0.025.

Keywords
Neural network, helicopter turboshaft engines, complex dynamic objects, hybrid NARX neural network, radial basis nonlinear layer, Levenberg-Marquardt method, training, Broyden method, loss

1. Introduction
Experimental research and modelling of complex dynamic objects, for example, helicopter turboshaft engines (TE) and their control systems, are constant elements of studying their behaviour throughout the entire life cycle, starting from the design, fine-tuning, and certification stages and ending with operation and disposal. Such studies require the creation of a special integrated modelling technology. It makes it possible to confirm the reliability, operability, and required characteristics of systems both before putting them into operation and in operating modes [1, 2]. Today, the development of the industry is based on technologies of digital manufacturing, computer modelling, machine learning, cloud computing, and cyber-physical systems. The digital twins concept is being fully implemented: a virtual representation of a physical object not only at the stages of design, development, and commissioning but also throughout the entire life cycle, including operation and disposal [3, 4].
The technology of semi-natural modelling of complex control objects has long been used in many industries, where real systems are coupled with mathematical models of control objects. However, the methods and tools for creating such models often remain the same [5, 6]. One of the most pressing and important problems is ensuring the adequacy of the model of a complex dynamic object in the automatic control, monitoring, and diagnostics system. The simultaneous operation of control, monitoring, and analysis algorithms can cause various collisions that need to be taken into account and modelled when developing and configuring a control system [7]. In addition, during operation, complex dynamic objects gradually exhaust their resources, and their characteristics begin to degrade [8, 9]. In the process of analysis and synthesis of automatic control systems, the need arises to correct and adapt the existing model of a complex dynamic object for its effective operation. To solve this problem, adaptive models are needed that are identified from the real characteristics of the object and its operating conditions. Intelligent technologies utilizing neural networks are widely applied in the research and development of sophisticated control and monitoring systems for complex dynamic objects [10, 11], including helicopter TE [12, 13]. However, the task remains of ensuring the adequacy and applicability of mathematical models of complex dynamic objects in operating modes, which are mostly presented in the form of fast-calculating piecewise linear dynamic models [14, 15]. The research aim is to increase the efficiency of complex modelling and testing of a real automatic control system (ACS), monitoring, and diagnostics for complex dynamic objects through the use of nonlinear dynamic mathematical models and their systems in the form of neural networks (using the example of helicopter TE). A scientific concept is proposed for constructing a neural network model of a complex dynamic object (using the example of helicopter TE), including algorithms for training and identifying a mathematical model of the engine using real data with a choice of the structure and size of the neural network.

2. Related works
Currently, neural networks are an effective means of solving a wide range of problems in identifying complex dynamic objects and their control systems [16, 17]. They are distinguished by the simplicity of their architecture and high representative power. The quality of operation of these networks largely depends on the efficiency of data clustering, as a result of which the centres of activation functions and their dispersions are determined. The works of individual domestic and foreign scientific schools are devoted to automating the selection of neural network architecture.
Also, this work focuses both on local modifications of training algorithms [18, 19] and on the use of bionic models [20, 21] to optimize the number of neurons in the hidden layer [22, 23]. The latter has good potential because it leads to growing interest in the use of distributed intelligent systems to optimize neural network architecture [24, 25]. An alternative to the described solutions is methods based on special approaches to density clustering, as a result of which the optimal number of neurons in the hidden layer is determined and their key characteristics are established [26, 27]. A common disadvantage of most known solutions is the requirement for the completeness of processed samples. This makes such methods ineffective in systems with dynamically changing data, for example in control systems, compared to specialized neuroarchitectures such as Jordan networks [28, 29], Elman networks [30, 31], or the recurrent multilayer perceptron [32, 33].

3. Methods and materials
To solve the above task of identifying complex dynamic objects and their control systems, an intelligent system (Fig. 1) can be used that implements the Fault Detection and Identification (FDI) method [34]. It is based on a neural network mathematical model of the research object and an identification block [35, 36]. Such a system makes it possible to detect and classify abnormal operating modes of the research object, measuring channels, and actuators under operating conditions. The output parameters of the mathematical model can be used to diagnose abnormal operating conditions of the research object based on a comparison of the computed parameters with the observed ones. Additionally, they can be used to restore lost data of measuring channels in the event of their failure being detected. Such a model should have several special properties, the most important of which are [37, 38]:
1. The model must describe the properties of the research object that determine the non-stationary nature of work processes. This means the need to use a dynamic model.
2. The structure of the mathematical model of the research object should provide the practical possibility of its functioning in combination with mathematical models of other elements of the system.

Figure 1: Structure of an intelligent system for identifying complex dynamic objects and their control systems based on the FDI method (author's research, based on [24–36])

The mathematical representation of the nonlinear dynamic model of complex dynamic objects, taking [39] into account, can be represented as a differential equations system:

$$\begin{cases} \dfrac{\partial y_1(t)}{\partial t} = a_{11} \cdot y_1(t) + \dots + a_{1n} \cdot y_n(t) + b_{11} \cdot x_1(t) + \dots + b_{1m} \cdot x_m(t), \\ \dfrac{\partial y_2(t)}{\partial t} = a_{21} \cdot y_1(t) + \dots + a_{2n} \cdot y_n(t) + b_{21} \cdot x_1(t) + \dots + b_{2m} \cdot x_m(t), \\ \cdots \\ \dfrac{\partial y_n(t)}{\partial t} = a_{n1} \cdot y_1(t) + \dots + a_{nn} \cdot y_n(t) + b_{n1} \cdot x_1(t) + \dots + b_{nm} \cdot x_m(t). \end{cases} \quad (1)$$
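As an illustration, the following minimal Python sketch integrates a linear state-space system of the form (1); the matrices A and B and the input signal x(t) are hypothetical placeholders, not engine data.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical system matrices for a model of form (1): dy/dt = A*y + B*x(t)
A = np.array([[-1.0, 0.3], [0.2, -0.8]])   # a_ij coefficients
B = np.array([[0.5], [0.1]])               # b_ij coefficients

def x(t):
    """Placeholder input influence vector x(t): a unit step."""
    return np.array([1.0])

def rhs(t, y):
    """Right-hand side of system (1)."""
    return A @ y + B @ x(t)

# Integrate from zero initial conditions over 10 s
sol = solve_ivp(rhs, t_span=(0.0, 10.0), y0=np.zeros(2))
print(sol.y[:, -1])  # response of y1, y2 at the final time
```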
Currently, neural network models built on linear neural networks, for example, the multilayer perceptron, are widely used for the identification of complex dynamic objects [40, 41]. However, such models face several disadvantages when modelling complex dynamic objects. First, they are limited in their ability to capture complex nonlinear relations between variables, which can lead to insufficient accuracy in prediction or modelling. In addition, the multilayer perceptron can suffer from the vanishing gradient problem when training deep models, which makes training difficult and can lead to low performance in practice. It can also be prone to overfitting when the amount of training data is limited, which makes it less effective for processing real dynamic systems. Therefore, it is expedient to use dynamic recurrent neural networks, for example, the recurrent multilayer perceptron (NARX). The justification for the transition from a multilayer perceptron to a recurrent multilayer perceptron (NARX) is based on several factors given in Table 1, and the scientific features of this transition are given in Table 2.

Table 1
Factors for the transition from a multilayer perceptron to a hybrid NARX network for the identification of complex dynamic objects (author's research)

Complexity of the system: Complex dynamic systems have nonlinear behaviour and relations between different parameters. A multilayer perceptron may not be flexible enough to model such a complex system.
Taking into account the dynamics: The NARX hybrid network provides an opportunity to take into account the dynamic properties of the system based on time dependencies between inputs and outputs. This is especially important for predicting the parameters of the work process of the research object, which may change over time.
Handling uncertainty: Complex dynamic objects are exposed to various external influences and changes in operating conditions. Hybrid NARX networks can better adapt to data uncertainty and provide more accurate predictions under different conditions.
Use of contextual data: NARX hybrid networks allow both current parameter values and historical data to be included in the model. This can be useful for analyzing previous system states and identifying patterns of parameter changes.

Table 2
Scientific substantiation of the peculiarities of the transition from a multilayer perceptron to a hybrid NARX network (author's research)

Provision no. 1. Innovative modelling of complex dependencies through a combination of nonlinear autoregressive models and radial basis functions.
Combining nonlinear autoregressive models and radial basis functions: Nonlinear autoregressive NARX models are capable of capturing complex dynamic relationships between input and output variables. Nonlinear functions bring flexibility to the model, allowing it to adapt to different forms of data, which is especially important when modelling input parameters of a nonlinear nature.
Effective modelling of complex dependencies: The workflow parameters of complex dynamic objects can exhibit complex, nonlinear dependencies that are best described by the hybrid NARX network. This makes it possible to more accurately predict changes in parameters under different operating conditions.

Provision no. 2. Feasibility of solving the problem of identification of complex dynamic objects using the NARX hybrid network and comparing its results with the use of a multilayer perceptron.
Modelling accuracy: Comparison of accuracy coefficients, such as root mean square deviation, on training and test samples.
Generalization to new data: It is estimated how well the model generalizes its knowledge to new data not used during training.
Stability: Analysis of the stability of models in various conditions, including changes in external parameters.

Provision no. 3. Simultaneous use of simulation results using the multilayer perceptron and the hybrid NARX network in operational conditions.
Using a multilayer perceptron: Used to quickly assess current input parameters and provide a response to rapidly changing conditions.
Using the hybrid NARX network: Used for deeper analysis of dynamic changes in parameters, taking into account time dependencies and predicting future values.

Conclusions: The combined use of a multilayer perceptron and a hybrid NARX network allows combining the advantages of both models, providing more accurate and flexible real-time identification of complex dynamic objects.
In [40], the use of a modified version of the recurrent multilayer perceptron (NARX) is justified. It is a dynamic network characterized by delayed output/input signals combined in a network input vector, with a radial basis nonlinear layer and a linear recurrent layer. It should be noted that [40] uses a Gaussian NARX framework with input data regressor selection using a modified gradient method from [41]. This modification is justified by the fact that classical NARX models rely on outdated machine training approaches [42]. The modified NARX structure proposed in [40] consists of two parts: nonlinear and linear blocks (Fig. 2, where σ_i is the i-th element radial function width; c_{i1}, c_{i2}, ..., c_{in} are the i-th element centre coordinates; u_1, u_2, ..., u_n are the input signals). Such a model in a neural network form with feedback makes it possible to take into account the nonlinear dynamic characteristics of an object and guarantee the structural and parametric adequacy of its analytical model. The vector u placed at the input has the form u(t) = [1, u(t), u(t − 1), ..., u(t − N_u), y(t − 1), ..., y(t − N_y)]^T, where N_u is the number of input signal delays and N_y is the number of output signal delays [40]. Depending on the complex dynamic object model, the vector u is formed according to the parameters specified in the technical specifications. According to [33, 34, 37], the network output vector has the following mapping [40]:

$$y(t + 1) = f\left(u(t), y(t - 1), \dots, y(t - N_y), u(t - 1), \dots, u(t - N_u)\right), \quad (2)$$

so the NARX hybrid network is characterized by a set of numbers (N_u, N_y, N_i), where N_i is the number of neurons in the i-th hidden layer. In the nonlinear block, each radial basis neuron computes $e^{-\sum_{j=1}^{n}(u_j - c_{ij})^2 / (2\sigma_i^2)}$; the weighted sum of these activations passes through the linear block with f(u) = u and output feedback delays y(t − 1), ..., y(t − N_y), and the network inputs include h, T_N, P_N, ρ, n_TC, n_FT, T_G.

Figure 2: Gaussian NARX architecture using modified neural network (author's research [40])
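To make the structure in Fig. 2 concrete, here is a minimal sketch (not the authors' implementation) of a NARX regressor vector and a Gaussian radial basis layer; the layer sizes, centres, and widths are illustrative assumptions.

```python
import numpy as np

def narx_regressor(u_hist, y_hist, n_u, n_y):
    """Build the NARX input vector [1, u(t), ..., u(t-Nu), y(t-1), ..., y(t-Ny)]."""
    return np.concatenate(([1.0], u_hist[-(n_u + 1):][::-1], y_hist[-n_y:][::-1]))

def rbf_layer(x, centers, widths):
    """Gaussian radial basis nonlinear layer: exp(-||x - c_i||^2 / (2*sigma_i^2))."""
    d2 = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * widths ** 2))

# Illustrative sizes: 7-element regressor, 20 nonlinear (RBF) neurons, 1 output
rng = np.random.default_rng(0)
centers = rng.normal(size=(20, 7))   # c_ij centre coordinates
widths = np.full(20, 0.5)            # sigma_i radial function widths
w_out = rng.normal(size=20)          # weights of the linear output block

x = narx_regressor(np.array([0.90, 0.95, 0.97]),
                   np.array([0.90, 0.93, 0.94]), n_u=2, n_y=3)
phi = rbf_layer(x, centers, widths)
y_next = w_out @ phi                 # linear block f(u) = u on the weighted sum
print(y_next)
```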
To the aforementioned, a comprehensive schematic representation of configuring the neural network model parameters for complex dynamic objects (using the example of helicopter TE [40]) is presented in Fig. 3, where Δw_ij is the increment of the neural network synaptic connections; Y = (y_1, y_2, ..., y_m)^T is the object output parameters vector; U = (u_1, u_2, ..., u_m)^T is the input influences vector; Y^{NN} = (y_1^{NN}, y_2^{NN}, ..., y_n^{NN})^T is the neural network outputs vector [40].

Figure 3: Complex dynamic objects neural network model scheme (author's research [40])

Modifications of neural network training are conducted to improve performance or adjust the model to the specific requirements of the task (Fig. 3). Modifying neural network training includes changing the network structure by adding or removing layers and neurons, changing activation functions, and adjusting hyperparameters such as the training rate or regularization parameters, as well as introducing additional training methods, such as data augmentation or transfer learning based on pre-trained models. Modification may also be necessary if the input data characteristics change, the performance requirements change, or the problem that the neural network needs to solve changes. The conversion of the control influences vector into the output parameters vector is elucidated by the operator F [40]:

$$\mathbf{Y} = \mathbf{F}(\mathbf{U}). \quad (3)$$

The helicopter TE identification task using a neural network can be formulated as follows: using the results of the proposed training process, the training set vectors (U_i; Y_i) obtained experimentally for a separate engine instance are formed, and the aim is to find an operator F^{NN} within the class of neural network architectures. The approximation of the operator F by the operator F^{NN} is deemed optimal if a specified functional of the difference (Y − Y^{NN}) does not surpass a given small value ε_add, defining the F operator approximation accuracy [40]:

$$E = \left\| \mathbf{Y} - \mathbf{Y}^{NN} \right\| = \sum_{i}^{n} \varepsilon_i^2 \leq \varepsilon_{add}. \quad (4)$$

The satisfaction of condition (4) is guaranteed by neural network training. It involves fine-tuning parameters using the training sample {(U, Y)} and is verified on a meticulously organized test sample [40]; a minimal sketch of this check is given after Table 3. A scientific concept of direct neural network model construction for complex dynamic objects is proposed, which is shown in Table 3.

Table 3
The scientific concept of the step-by-step creation of a neural network model of complex dynamic objects (author's research)

Step 1. Development of unique criteria and metrics for evaluating the effectiveness of identification of complex dynamic objects. Justification of the need to define goals to ensure the accuracy and objectivity of the assessment.
Step 2. The rationale for selecting a particular neural network architecture and identifying its integration point within the complex dynamic object identification system.
Step 3. Analysis and justification of the network training algorithm choice, considering the specifics of the task to achieve optimal adaptive training.
Step 4. Description of conducting experiments on a digital model, using the resulting data to create a training sample and taking into account new criteria and metrics to improve model accuracy.
Step 5. Description of the network training process using the formed training sample and training algorithm.
Step 6. Justification for simplifying and reducing the neural network to achieve optimal information storage and efficient operation with a minimum number of parameters.
Step 7. Justification of measures aimed at increasing the robustness of the functioning of the trained neural network model, taking into account possible challenges and unexpected situations.
Step 8. Description of modelling and testing algorithms for monitoring the operational status and operation of complex dynamic objects management, including an ACS based on a neural network.
Step 9. Justification of the choice of software or hardware development of a neural network for implementation in a real system of identification of complex dynamic objects.
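A minimal sketch of the accuracy criterion (4), assuming hypothetical target and network output vectors:

```python
import numpy as np

def identification_error(y_true, y_nn):
    """Criterion (4): sum of squared residuals between object and network outputs."""
    eps = y_true - y_nn
    return float(np.sum(eps ** 2))

# Hypothetical outputs and admissible error eps_add
y_true = np.array([0.943, 0.982, 0.962])
y_nn = np.array([0.940, 0.985, 0.958])
eps_add = 1e-3
E = identification_error(y_true, y_nn)
print(E, E <= eps_add)  # training is acceptable if E does not exceed eps_add
```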
The proposed scientific concept of the step-by-step creation of a neural network model of complex dynamic objects defines clear stages of intelligent systems development and implementation. The results of each stage (the development of evaluation criteria, the selection of a network structure, the analysis of training algorithms, and experimental research on a digital model) form a reliable basis for the creation of an effective intelligent monitoring system. In particular, the justified reduction of the neural network and the measures to increase the robustness of the model indicate a drive for optimal efficiency. The final stages (modelling and testing algorithms, as well as the choice of software or hardware implementation) emphasize the practical suitability of the concept for implementation in a real intelligent system for the identification of complex dynamic objects. For the practical implementation of the proposed scientific concept (Table 3), attention should be drawn to the indicator of model robustness (generalization ability), that is, the stability of modelling results to input data disturbances. Evaluation of the resilience or generalization capability of a neural network model is conducted using an algorithm grounded in incremental multi-criteria training [44]. Complex dynamic object model training using a neural network (Fig. 2) is performed by sequentially presenting pre-prepared delay vectors and corresponding output values while simultaneously adjusting the weights of hidden layers by a certain procedure [45, 46]. The neural network training process includes the application of the Levenberg-Marquardt method. The method combines the steepest descent method (i.e., minimizing the training error along a gradient) and Newton's method (i.e., using a quadratic model to accelerate the search for the minimum of the error function) [47]. The Levenberg-Marquardt method is intended for optimizing nonlinear regression models of the form $F(\mathbf{u}) = \sum_{i=1}^{N} (f_i(\mathbf{u}) - y_i)^2$. As an optimization criterion, it uses the model mean square error (MSE) on the training set, which it minimizes [40]. The Levenberg-Marquardt method combines the ideas of the Gauss-Newton method and gradient descent. At each iteration, this method updates the parameters u in this way:

$$\mathbf{u}_{k+1} = \mathbf{u}_k - \left( J^T J + \lambda_k I \right)^{-1} J^T \Delta \mathbf{y}, \quad (5)$$

where J is the Jacobian matrix $J_{ik} = \frac{\partial f_i}{\partial u_k}$, describing the partial derivatives of the model concerning the parameters; Δy is the vector representing the disparity between the present model values and the real data; I is the identity matrix; λ_k is the regularization parameter that controls the step size at each iteration.
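For illustration, a minimal NumPy sketch of the update (5), with a hypothetical residual function and a finite-difference Jacobian standing in for the analytical one:

```python
import numpy as np

def residuals(u):
    """Hypothetical residual vector f_i(u) - y_i of a small nonlinear model."""
    return np.array([u[0] * np.exp(-u[1]) - 0.5, u[0] + u[1] - 1.2])

def jacobian(u, h=1e-6):
    """Finite-difference Jacobian J_ik = df_i/du_k."""
    r0 = residuals(u)
    J = np.zeros((r0.size, u.size))
    for k in range(u.size):
        du = np.zeros_like(u); du[k] = h
        J[:, k] = (residuals(u + du) - r0) / h
    return J

u, lam = np.array([1.0, 1.0]), 1e-2
for _ in range(20):
    r, J = residuals(u), jacobian(u)
    # Levenberg-Marquardt step (5): solve (J^T J + lambda*I) d = J^T r
    step = np.linalg.solve(J.T @ J + lam * np.eye(u.size), J.T @ r)
    u = u - step
print(u, residuals(u))
```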
According to the research of Serhii Parkhomenko [47, 48], various variations of difference formulas for calculating derivatives, including the central difference derivative, are most often used to compute the Jacobian matrix [49]. According to [47, 48], for the hybrid NARX network the values y_i can be written as:

$$y(u, w) = f_1\left( \sum_{i=1}^{N_i} \tilde{w}_i \cdot f_2\left( \sum_{j=1}^{N_u} w_{ij} \cdot u_j + \sigma_i \right) \cdot f_3\left( \sum_{l=1}^{N_l} w_{il} \cdot u_l + \sigma_i \right) + \tilde{\sigma} \right), \quad (6)$$

where y(u, w) shows the dependence of an output value y_i on the input parameter vector values u and the corresponding weighting coefficients w; u_j is the value received by the j-th input neuron; w_ij is the weight coefficient connecting the j-th input neuron with the i-th nonlinear hidden layer neuron; σ_i is the bias coefficient for the i-th nonlinear hidden layer neuron; w̃_i is the weight coefficient connecting the i-th neuron of the nonlinear hidden layer with the output neuron; σ̃ is the bias coefficient for the output neuron; f_1(•), f_2(•), f_3(•) are the activation functions of the output neuron and the neurons of the hidden nonlinear and linear layers, respectively; N_u is the number of inputs; N_i and N_l are the numbers of neurons in the hidden nonlinear and linear layers, respectively. The radial basis function is chosen as the activation function for neurons of the output and nonlinear hidden layers according to [40], and a linear function for neurons of the linear hidden layer. According to [47, 48], when calculating the Hessian matrix, it is convenient to use the formula H = J^T J, which is derived from the premises:
• the function y(u, w) has a low order of nonlinearity: the second derivatives $\frac{\partial^2 y(u,w)}{\partial w_i \partial w_j}$ do not take very large values;
• the matrix H is considered in a small neighbourhood of the minimizing vector w, for which the y(u, w) values are close to the desired f_i(u), i.e. |f_i(u) − y(u, w)| ≈ 0.

With an efficient implementation of scalar matrix multiplication, the search for H is fast, while the time for calculating the vector of weight changes depends on the number of variables w_ij. Experiments show [47, 48] that it is rarely necessary to solve more than three systems per iteration, which does not greatly affect the execution time of the algorithm. Calculating the Jacobian matrix takes up the bulk of the work of the Levenberg-Marquardt algorithm, so reducing the cost of computing it speeds up neural network training. One of these methods is to abandon the calculation of the completely accurate matrix J in favour of its approximate version. For example, the Broyden method calculates J_{n+1} using the matrix J_n calculated at step n according to the formula [47, 48, 50]:

$$J_{n+1} = J_n + \frac{\left( y(u, w_{n+1}) - y(u, w_n) - J_n \cdot h \right) \cdot h^T}{h^T \cdot h}, \quad h = w_{n+1} - w_n. \quad (7)$$

From a theoretical point of view, using this approach at each step of the Levenberg-Marquardt algorithm makes sense. However, in practice, the approximation becomes coarser over time. This affects the gradient vector J^T E and requires re-calculation of the Jacobian matrix using more accurate methods after an unsuccessful selection of a vector of weight changes δ.
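A minimal sketch of the Broyden rank-one update (7), reusing the hypothetical residual function from the previous sketch:

```python
import numpy as np

def broyden_update(J_n, w_n, w_next, r_n, r_next):
    """Broyden rank-one update (7): approximate J_{n+1} from J_n without
    recomputing derivatives; r denotes the model output (residual) vector."""
    h = w_next - w_n                       # h = w_{n+1} - w_n
    correction = r_next - r_n - J_n @ h    # y(u, w_{n+1}) - y(u, w_n) - J_n*h
    return J_n + np.outer(correction, h) / (h @ h)

# Usage: refresh the Jacobian cheaply between full recalculations, e.g.
# J = broyden_update(J, w_old, w_new, residuals(w_old), residuals(w_new))
```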
Analytical calculation of partial derivatives improves the accuracy of calculations and allows the process to be shortened by reusing intermediate data and reducing the number of calls to complex functions. Similarly to [47, 48], the following substitution was made in this work:

$$s_i = \sum_{j=1}^{N_u} w_{ij} \cdot u_j + \sigma_i, \quad S = \sum_{i=1}^{N_i} \tilde{w}_i \cdot f_2(s_i) \cdot f_3(s_i) + \tilde{\sigma}; \quad 1 \leq j \leq N_u. \quad (8)$$

Thus, we get:

$$\frac{\partial y(u,w)}{\partial w_{ij}} = u_j \cdot [E \cdot C \cdot s_i], \quad \frac{\partial y(u,w)}{\partial \sigma_i} = [E \cdot C \cdot s_i], \quad \frac{\partial y(u,w)}{\partial \tilde{w}_i} = \frac{[E \cdot C'] \cdot [E \cdot s_i]}{1 + [E \cdot s_i]}, \quad \frac{\partial y(u,w)}{\partial \tilde{\sigma}} = [E \cdot C'], \quad (9)$$
$$C = 2 \cdot [E \cdot S], \quad \frac{\partial^2 C}{\partial S^2} = \alpha \cdot [E \cdot C'], \quad \frac{\partial C}{\partial s_i} = \frac{\tilde{w}_i \cdot [E \cdot C'^2] \cdot [E \cdot s_i]}{(1 + [E \cdot s_i])^2}.$$

The analytical method of calculating the Jacobian matrix is not applicable in all cases, since the formulas must be revised for each new neural network model. However, it requires less computation than using central difference derivatives while maintaining accuracy. Distributing the calculations of the rows of matrix J between threads allows it to be filled in parallel, since the elements are calculated independently. This corresponds to the ribbon (row-band) pattern. According to [48], τ is the maximum number of threads running simultaneously. For τ ≪ N, each thread on average processes about N/τ rows of matrix J. A row of J represents the minimum unit of processing within a parallel block, and Ξ_t is the set of numbers of the rows of J processed by thread t. Since H* = J^T J, then, according to [48], to save memory the J^T and J matrices need not be stored separately. For any n ∈ [1, N] and m ∈ [1, M], the equality $J^T_{mn} = J_{nm}$ is true, which allows elements of J^T and J to be obtained by swapping the indices. To calculate a Hessian matrix element according to [48]:

$$H^*_{pq} = \sum_{i=1}^{N_u} J^T_{qi} \cdot J_{ip} = \sum_{i=1}^{N_u} J_{iq} \cdot J_{ip}; \quad 1 \leq p \leq M; \quad 1 \leq q \leq M, \quad (10)$$

you need to know all the elements of the p-th and q-th columns of J. The simplest way is to wait for the entire matrix J to be calculated, but this causes a synchronization point where all processes must wait for the calculation to complete before continuing. Decomposing the matrix H* into a sum allows us to avoid this [48]:

$$H^* = \sum_{t=1}^{\tau} {}^{[t]}H = {}^{[1]}H + {}^{[2]}H + \dots + {}^{[\tau]}H = \left\{ {}^{[t]}H_{pq} \right\}_{M \times M}, \quad (11)$$

where $^{[t]}H$ is the matrix of the cumulative sum of thread t and $^{[t]}H_{pq}$ is an element of this matrix. For each calculated row of matrix J in [48], all its elements are multiplied by each other, which leads to obtaining a matrix term:

$${}^{[t]}H_k^+ \Big|_{ij} = J^T_{ik} \cdot J_{kj} = J_{ki} \cdot J_{kj}; \quad k \in \Xi_t, \quad 1 \leq i \leq M; \quad 1 \leq j \leq M, \quad (12)$$

where $^{[t]}H_k^+ |_{ij}$ is an element of the matrix $^{[t]}H_k^+$. By combining the matrices of one thread, a cumulative sum matrix is formed in the form:

$${}^{[t]}H = \sum_{k \in \Xi_t} {}^{[t]}H_k^+, \quad (13)$$

which, when added with the matrices from other threads outside the parallel region, results in the finite Hessian matrix H*.
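A minimal sketch of the row-wise decomposition (11)-(13), here emulated with Python threads over hypothetical row blocks (in the paper's setting each thread would also compute its rows of J analytically):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def partial_hessian(J_rows):
    """Cumulative sum matrix [t]H (13): sum of rank-one terms J_k^T * J_k (12)
    over the rows k assigned to one thread."""
    H_t = np.zeros((J_rows.shape[1], J_rows.shape[1]))
    for row in J_rows:
        H_t += np.outer(row, row)
    return H_t

rng = np.random.default_rng(1)
J = rng.normal(size=(256, 8))          # hypothetical Jacobian, N = 256 rows
blocks = np.array_split(J, 4)          # rows distributed over tau = 4 threads

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_hessian, blocks))

H_star = sum(partials)                 # H* assembled outside the parallel region (11)
assert np.allclose(H_star, J.T @ J)    # matches the direct product J^T * J
```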
Thus, combining the calculations of a row of J and the matrix H within one parallel block was achieved by distributing the calculations of the scalar matrix product over $^{[t]}H$. The calculation $g = J^T \tilde{\varepsilon}(w)$ also requires all elements of the p-th column of matrix J [48]:

$$g_p = \sum_{i=1}^{N_u} J^T_{pi} \tilde{\varepsilon}(w) = \sum_{i=1}^{N_u} J_{ip} \tilde{\varepsilon}(w), \quad 1 \leq p \leq M. \quad (14)$$

By decomposing the vector g into cumulative vectors $^{[t]}g$ for each thread, and then decomposing them into term vectors $^{[t]}g_k^+$ for all rows of J processed within a single thread, we obtain a calculation method similar to the one used to calculate H*:

$${}^{[t]}g_k^+ \Big|_i = J^T_{ik} \tilde{\varepsilon}(w) = J_{ki} \tilde{\varepsilon}(w), \quad {}^{[t]}g = \sum_{k \in \Xi_t} {}^{[t]}g_k^+, \quad g = \sum_{t=1}^{\tau} {}^{[t]}g, \quad k \in \Xi_t, \quad 1 \leq i \leq M. \quad (15)$$

After the processing of a row of J and all related calculations (10)-(15) is completed, it loses its relevance; therefore, storing it in RAM, as well as the entire matrix J as a whole, becomes redundant. It is enough to carry out a line-by-line pass in several threads to obtain an array of partial derivatives for the weighting coefficients, using pairs of input and expected values, which provides significant memory savings. In the dynamic approach, to select the regularization parameter λ_k at each iteration of the Levenberg-Marquardt method, it is proposed to use an algorithm that adaptively adjusts its value depending on the change in error at the current and previous iterations. Let ΔE_k be the change in error between the current and previous iterations, that is, ΔE_k = E_k − E_{k−1}, where E_{k−1} is the error function value at the previous iteration k − 1 and E_k is the error function value at the current iteration k. Then we can determine the new value of Λ_k as follows:

$$\Lambda_k = \begin{cases} \alpha \cdot \lambda_k, & \text{if } \Delta E_k > 0, \\ \beta \cdot \lambda_k, & \text{if } \Delta E_k < 0, \\ \lambda_k, & \text{if } \Delta E_k = 0, \end{cases} \quad (16)$$

where α and β are the coefficients that control the increase or decrease in the value of Λ_k in the event of a change in error, for example, α > 1, 0 < β < 1. This approach allows the regularization parameter Λ_k to be adaptively changed depending on the direction of the error change at each iteration (a minimal sketch of this rule is given after the list below). This can help speed up the convergence of the method and improve its efficiency. An efficient choice of the initial value of the regularization parameter λ_k in the Levenberg-Marquardt method can be essential to ensure fast convergence and avoid potential problems such as overfitting or underconvergence. Mathematically, it can be done in the following ways:
1. Analytical method: based on analytically estimating the initial value of the regularization parameter, taking into account the characteristics of the error function and gradient. For example, if the error function has large values, the initial value of λ_k can be chosen relatively large to ensure the stability of the algorithm. If the error function has small values, the initial value of λ_k can be chosen relatively small to allow the algorithm to converge quickly.
2. Heuristic method: heuristics or rules of thumb can be used to select the initial value λ_k. For example, the initial value of λ_k may be chosen to be a small fixed value based on an assumption about typical parameter scales or error function values.
3. Optimization method: optimization methods can also be used to select the optimal initial value of λ_k that minimizes the error function at the initial stage. For example, one can use grid search or optimization methods such as gradient descent to select the initial value of λ_k in such a way as to minimize the error function.
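A minimal sketch of the adaptive rule (16); the values of α and β are illustrative, consistent with the stated constraints α > 1, 0 < β < 1:

```python
def adapt_regularization(lam, delta_E, alpha=10.0, beta=0.1):
    """Rule (16): grow lambda when the error increased (more gradient-like step),
    shrink it when the error decreased (more Gauss-Newton-like step)."""
    if delta_E > 0:
        return alpha * lam
    if delta_E < 0:
        return beta * lam
    return lam

# Usage inside a training loop: lam = adapt_regularization(lam, E_k - E_prev)
```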
Choosing an effective initial value for the regularization parameter λ_k can significantly impact the Levenberg-Marquardt method's performance, so it is important to pay attention to this and apply appropriate mathematical techniques to ensure an optimal choice. Since an error function sometimes contains several local minima or has different scales of change, it is advisable to use the multiscale optimization method to adaptively adjust the step in different parameter directions. At each iteration of the optimization algorithm, a base step size η is selected and used to update the parameters based on the error function gradient. For each parameter u_i, its characteristic scale of change Δu_i is calculated, for example, as the standard deviation or scale of the change in the parameter at previous iterations. The step size η is adaptively adjusted for each parameter u_i by its characteristic scale of change, allowing scale differences between parameters to be taken into account and the step size to be adapted for each parameter:

$$\eta_i = \frac{\eta}{\Delta u_i}. \quad (17)$$

These parameters are updated using the adapted step size η_i:

$$u_i^{(k+1)} = u_i^{(k)} - \eta_i \frac{\partial E}{\partial u_i}. \quad (18)$$

This process allows the step size to be adaptively varied in different parameter directions, which can improve the convergence of the optimization algorithm and help avoid getting stuck in local minima or jumps in the error function due to large-scale parameter differences. Taking the above into account, (5) is rewritten as:

$$\mathbf{u}_{k+1} = \mathbf{u}_k - \left( {}^{[t]}H + \Lambda_k I \right)^{-1} J^T \Delta \mathbf{y} + \eta_i \frac{\partial E}{\partial u_i}. \quad (19)$$

Expression (19) is the modified Levenberg-Marquardt method. The Levenberg-Marquardt algorithm adjusts the neural network weights using a quadratic approximation of the error surface. This approximation ensures that the minimum is found quickly, but the risk of hitting a local extremum on the training surface increases.

4. Experiment
To conduct a computational experiment, the TV3-117 TE was selected as the research object. The helicopter TE model parameters include the atmospheric parameters (ρ is the air density, P_N is the pressure, T_N is the temperature, and h is the flight altitude). The helicopter onboard parameters (T_G is the gas temperature in front of the compressor turbine, n_FT is the free turbine rotor speed, n_TC is the gas generator rotor r.p.m.) are normalized to absolute values (Table 4) based on V. Avgustinovich's theory [51, 52].

Table 4
The training set part (author's research [51, 52])

Number | n_FT | n_TC | T_G
1 | 0.943 | 0.929 | 0.932
2 | 0.982 | 0.933 | 0.964
3 | 0.962 | 0.952 | 0.917
4 | 0.987 | 0.988 | 0.908
5 | 0.972 | 0.991 | 0.899
... | ... | ... | ...
256 | 0.981 | 0.973 | 0.953

In complex dynamic object identification problems, error surfaces with numerous plateaus and valleys are often encountered, which makes the local minima problem one of the main difficulties in achieving maximum efficiency. To overcome this problem, two heuristic approaches were developed and tested to prevent the search process from becoming trapped in local minima [53]. According to [53], the first heuristic requires the algorithm to take risky steps in a random direction along the error surface with increasing step length, in order to jump out of a local minimum.
After several unsuccessful attempts (in this case, 4), the minimum found is considered the smallest, and the algorithm completes its work according to the rules of the original algorithm. This heuristic was tested on 4 data sets, including 64 input numerical variables and 1 output variable. For each data set, the optimal neural network architecture was determined, including the number of neurons in the hidden layer, while minimizing the error. This architecture was identified by an exhaustive search of options, varying from 1 to 20 neurons in the nonlinear layer and from 1 to 10 neurons in the linear layer. After this, similarly to [53], 50 test runs were carried out. Each of these included 10 neural network trainings with different initialization of weights for each data set, without using heuristics. At the final stage, another 50 test runs were performed, within each of which 10 networks were trained with different initialization of weights for each data set, using the heuristics proposed in [53]. The results of testing the neural network are shown in Table 5.

Table 5
Neural network testing results (author's research, based on [53])

Data set number | Standard deviation of the traditional Levenberg-Marquardt method (Minimum / Maximum / Average) | Standard deviation of the modified Levenberg-Marquardt method (Minimum / Maximum / Average)
Data set 1 | 6.95 / 18.38 / 12.67 | 2.84 / 16.12 / 9.48
Data set 2 | 7.30 / 22.42 / 14.86 | 5.98 / 19.76 / 12.87
Data set 3 | 0.22 / 0.64 / 0.43 | 0.14 / 0.20 / 0.17
Data set 4 | 4.54 / 7.12 / 5.83 | 3.68 / 5.22 / 4.45

Thus, the use of heuristics increases the likelihood of finding successful solutions. However, it is worth noting that in some cases this can lead to the loss of a previously found optimal solution, followed by getting stuck in a local minimum discovered later. This disadvantage can be surmounted by augmenting the number of random steps, which, in turn, requires additional computing resources [53]. The recommended 4 attempts provide the optimal balance for our subject area. According to [53], the second heuristic is to change the neural network weights when a local minimum is reached, calculating the weighting coefficients using the formula:

$$w_{ij} = w_{ij} + \theta \cdot w_{ij}, \quad (20)$$

where θ_1 ≤ θ ≤ θ_2 is a random number. According to [53], the θ_1 and θ_2 values are chosen empirically as the outcome of an experiment. Various values were tested in the range [−0.5; 0.5] with steps of 0.05 [54]. As a result, the following optimal values were obtained: θ_1 = −0.035, θ_2 = 0.035. Thus, when a local minimum is reached, the weights of the neural network undergo random changes by an amount not exceeding θ in both directions. Large values of θ often reset the previous training phase, requiring the neural network to recover and start training again; in essence, the heuristic then degenerates into a series of tests with different initial weights. Small values of θ may prevent the network from exiting the local minimum, so it cycles back to the original minimum without benefiting from the heuristic. After shaking the weights, training continues according to the rules of the original algorithm. If a new minimum is not reached, a random change is made again. Through experimentation, it was revealed that the optimal number of random changes should not exceed three.
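A minimal sketch of the weight-shaking heuristic (20), using the empirically chosen bounds θ_1 = −0.035, θ_2 = 0.035:

```python
import numpy as np

def shake_weights(w, theta1=-0.035, theta2=0.035, rng=np.random.default_rng()):
    """Heuristic (20): perturb each weight by a random relative amount
    theta in [theta1, theta2] to escape a local minimum."""
    theta = rng.uniform(theta1, theta2, size=w.shape)
    return w + theta * w

# Usage: on detecting a stalled error, shake the weights and resume training
# (the paper limits this to at most three random changes), e.g. w = shake_weights(w)
```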
Similarly, 50 test runs were carried out: 10 trainings of neural networks with different initialization of weights for each data set, using the second heuristic. To evaluate the effectiveness of the second heuristic, the same data sets were used as for the first, together with the optimal neural network architectures found at the previous stage. The results of testing the neural network are shown in Table 6.

Table 6
Neural network testing results (author's research, based on [53])

Data set number | Standard deviation of the traditional Levenberg-Marquardt method (Minimum / Maximum / Average) | Standard deviation of the modified Levenberg-Marquardt method (Minimum / Maximum / Average)
Data set 1 | 6.95 / 18.38 / 12.67 | 2.84 / 16.12 / 9.48
Data set 2 | 7.30 / 22.42 / 14.86 | 5.98 / 19.76 / 12.87
Data set 3 | 0.22 / 0.64 / 0.43 | 0.14 / 0.20 / 0.17
Data set 4 | 4.54 / 7.12 / 5.83 | 3.68 / 5.22 / 4.45

Thus, the second heuristic, like the first, significantly increases the probability of detecting a global minimum. But it has the same drawback as the first: in some instances, it may result in the loss of a previously found optimal solution. In addition, its use requires an empirical selection of the boundaries for changing the parameter θ.

5. Results
Specialized software was created in LabVIEW that uses the modified Levenberg-Marquardt algorithm and automates the recording of experimental results (Fig. 4, 5). The program implements the following single-threaded and parallel options: the traditional Levenberg-Marquardt method with the calculation of J using the central difference derivative formula; the modified Levenberg-Marquardt method with the calculation of J by the Broyden method once every two epochs, with the two heuristic approaches described above; and the traditional Levenberg-Marquardt method with the calculation of J using analytically derived formulas.

Figure 4: Diagram of complex dynamic objects neural network model (author's research)

Figure 5: Developed software user interface (author's research)

For testing, we used a personal computer running the GNU/Linux operating system with an AMD Ryzen 5 5600 processor (6 cores, 12 threads, 3.3 GHz) and 32 GB of DDR4 RAM. The computational experiment aims to assess the execution time of each method, considering the proposed methods for calculating the Jacobian matrix. For training, a NARX hybrid network (Fig. 2) with 7 inputs, 5 neurons in the linear layer, 20 neurons in the nonlinear layer, and 1 neuron in the output layer is used [40]. The instantaneous functional u_{k+1} (19) is used to assess training quality according to [48]. The calculation results are presented in Table 7.

Table 7
Neural network training results (author's research, based on [42])

Data set number | Central difference (t, seconds / u_{k+1}) | Broyden method (t, seconds / u_{k+1}) | Analytically derived (t, seconds / u_{k+1})

1 data stream, N = 256 (total training sample size [45, 46]):
Data set 1 | 121.382 / 0.988 | 11.785 / 0.985 | 58.639 / 0.983
Data set 2 | 121.371 / 0.976 | 11.559 / 0.973 | 58.072 / 0.971
Data set 3 | 120.989 / 0.995 | 10.082 / 0.992 | 56.537 / 0.990
Data set 4 | 121.295 / 0.969 | 11.776 / 0.966 | 57.759 / 0.964

6 data streams, N = 256 (total training sample size [45, 46]):
Data set 1 | 40.193 / 0.988 | 3.902 / 0.985 | 19.417 / 0.983
Data set 2 | 40.323 / 0.976 | 3.840 / 0.973 | 19.250 / 0.971
Data set 3 | 39.669 / 0.995 | 3.306 / 0.992 | 18.281 / 0.990
Data set 4 | 40.298 / 0.969 | 3.912 / 0.966 | 19.190 / 0.964

12 data streams, N = 256 (total training sample size [45, 46]):
Data set 1 | 20.298 / 0.988 | 1.971 / 0.985 | 9.806 / 0.983
Data set 2 | 20.262 / 0.976 | 1.930 / 0.973 | 9.695 / 0.971
Data set 3 | 20.098 / 0.995 | 1.675 / 0.992 | 9.262 / 0.990
Data set 4 | 20.216 / 0.969 | 1.963 / 0.966 | 9.627 / 0.964

Based on the results of the computational experiment, we can state the following:
1. As the number of neurons in the hidden layer increases and the number of steps is limited, the value of E(w) increases, and more steps are required to correct the parameters.
2. Using the Broyden method, it was possible to reduce the computation time by approximately 10...12 times compared to the central difference derivative, but u_{k+1} increased.
3. Direct calculations made it possible to reduce the calculation time by approximately 2.07...2.14 times compared to the central difference derivative. With a small training sample size, direct calculations usually yield the minimum u_{k+1}.
4. Parallel versions of the methods work on average about 3...6 times faster than the sequential implementation.
5. u_{k+1} almost does not change when the number of threads changes, which indicates the correct implementation of parallel processing. The maximum acceleration is approximately 61.38...120.71 times.

The next stage of the computational experiment is devoted to obtaining and analyzing the error of the trained neural network in the created software product on the identified parameters. Using the training sample (Table 4), identification errors were obtained for the following parameters of the TV3-117 TE: the degree of pressure increase in the compressor (Fig. 6, top left), the compressor turbine shaft power (Fig. 6, top right), the compressor turbine operation (Fig. 6, bottom left), and the fuel consumption in the combustion chamber (Fig. 6, bottom right), where the yellow line is the error obtained by the NARX hybrid neural network with the classical Levenberg-Marquardt method [40], the red line is the error obtained by the NARX hybrid neural network with the modified Levenberg-Marquardt method, and the green line is the approximation line. Analysis of the results of processing the average error of training the hybrid neural network NARX using the modified Levenberg-Marquardt method showed a decrease in the average error value by 33 %, to a level of 0.025.

Figure 6: Neural network training error calculating results (author's research)

At the final stage of the computational experiment, the loss of the hybrid neural network NARX was calculated and analyzed (Fig. 7); it serves as an indicator of the variance between the model's predicted values and the target variable during training. Loss reflects the degree of error of the model of the research object and is used in the training process to adjust the neural network parameters to minimize this error. The expression for calculating the loss L in a neural network usually depends on the type of problem (such as regression or classification) and the loss function used: for regression problems, the squared error is often used; for classification tasks, cross-entropy or other loss functions. The general formula can be represented as the sum of losses over all examples of the training set. In this work, the following was used [54, 55]:

$$L = \frac{1}{n} \cdot \sum_{i=1}^{n} w_i \cdot (y_i - \tilde{y}_i)^2, \quad (21)$$

where n is the number of examples, y_i is the actual value of the target variable, ỹ_i is the value predicted by the model, and w_i is the weight assigned to each error, allowing their significance to be taken into account in the final loss function.
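A minimal sketch of the weighted loss (21), with uniform illustrative weights:

```python
import numpy as np

def weighted_mse(y_true, y_pred, weights):
    """Loss (21): weighted mean squared error over the training examples."""
    return float(np.mean(weights * (y_true - y_pred) ** 2))

y_true = np.array([0.932, 0.964, 0.917])
y_pred = np.array([0.930, 0.960, 0.920])
weights = np.ones_like(y_true)  # illustrative: all errors equally significant
print(weighted_mse(y_true, y_pred, weights))
```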
Figure 7: Loss rates during model training and testing on the input dataset (according to Table 4): the blue curve is training; the orange curve is validation (author's research)

6. Discussions
Fig. 7 shows that the loss function of the hybrid neural network NARX over 500 training epochs is generally stable and does not go beyond the limit of 0.025 (2.5 %), which indicates acceptable losses in problems of identifying complex dynamic objects [56, 57]. Table 8 contains comparative analysis results for the accuracy of identifying thermogas-dynamic parameters of the engine operating process using a neural network and classical methods for each parameter of the TV3-117 TE model [40].

Table 8
Absolute error calculation results (author's research, comparisons with [40])

Model | Absolute error, %: Fuel consumption in combustion chamber | Compressor turbine operation | Compressor turbine shaft power | Increase degree dependence on compressor pressure
Classical | 1.95 | 1.95 | 1.96 | 1.95
Neural network: three-layer perceptron [49, 50] | 0.65 | 0.68 | 0.64 | 0.66
Gaussian NARX-model [34] | 0.41 | 0.43 | 0.41 | 0.42
Gaussian NARX-model with modified Levenberg-Marquardt method | 0.26 | 0.28 | 0.26 | 0.27

We introduced supplementary noise into the dataset (Table 4) to assess the neural networks' resilience to variations in input information. This noise was incorporated into each parameter by adding white noise with a standard deviation of σ_i = 0.025 and a mean of zero; for each parameter, this corresponds to a maximum value of 2.5 %. Table 9 illustrates the outcomes of a comparative assessment of the precision of the technique for discerning thermogas-dynamic parameters in the TV3-117 TE operating process using both neural networks and traditional approaches.

Table 9
Absolute error (with white noise) calculation results (author's research, comparisons with [40])

Model | Absolute error, %: Increase degree dependence on compressor pressure | Compressor turbine shaft power | Compressor turbine operation | Fuel consumption in combustion chamber
Classical | 3.11 | 3.15 | 3.14 | 3.15
Neural network: three-layer perceptron [55, 56] | 1.13 | 1.09 | 1.17 | 1.11
Gaussian NARX-model [40] | 0.74 | 0.72 | 0.73 | 0.71
Gaussian NARX-model with modified Levenberg-Marquardt method | 0.43 | 0.41 | 0.42 | 0.40

An examination of Table 9 demonstrates that under the stated noise conditions, the identification error remains within specific limits: for the Gaussian NARX model with the modified Levenberg-Marquardt method it is 0.43 %, for the Gaussian NARX model it is 0.71 %, for the three-layer perceptron structured as 7-53-36 it is 1.09 % [40, 58, 59], and for the thermogas-dynamic TV3-117 TE model it is 3.15 %. In the presence of white noise, the maximum absolute error of the identification technique for the thermogas-dynamic parameters using the least squares method rose from 1.96 % to 3.15 %. This error increased from 0.64 % to 1.09 % for the three-layer perceptron structured as 7-53-36 [40, 58, 59], from 0.28 % to 0.43 % for the Gaussian NARX model with the modified Levenberg-Marquardt method, and from 0.43 % to 0.74 % for the Gaussian NARX model. To evaluate the dependability of the neural network approach in discerning the thermogas-dynamic parameters of the TV3-117 TE operating process [40, 58, 59], the following formulations can be employed [60, 61]:

$$K_{error} = \frac{T_{error}}{T_0} \cdot 100\,\%, \quad K_{quality} = \left(1 - \frac{T_{error}}{T_0}\right) \cdot 100\,\%, \quad (22)$$

where K_error and K_quality represent the coefficients of erroneous and quality identification, respectively [62]; T_error indicates the cumulative time of segments associated with misclassification, while T_0 denotes the test sample duration (in this context, T_0 = 5 s) [63].
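A minimal sketch of the coefficients (22), assuming a hypothetical list of misclassified time segments:

```python
def identification_coefficients(error_segments, t_total=5.0):
    """Coefficients (22): erroneous (K_error) and quality (K_quality)
    identification rates from misclassified time segments, in percent."""
    t_error = sum(end - start for start, end in error_segments)
    k_error = t_error / t_total * 100.0
    return k_error, 100.0 - k_error

# Hypothetical misclassified intervals within a 5 s test sample
k_err, k_qual = identification_coefficients([(1.20, 1.22), (3.40, 3.41)])
print(f"K_error = {k_err:.3f} %, K_quality = {k_qual:.3f} %")
```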
Table 10 presents the outcomes of computing the erroneous and quality identification coefficients [40, 60-64] for the parameters, including the increase degree dependence on compressor pressure, the compressor turbine operation, the compressor turbine shaft power, and the fuel consumption in the combustion chamber.

Table 10
Erroneous and qualitative coefficients calculating results (author's research, comparisons with [40])

Parameter | Gaussian NARX-model [40] (K_error / K_quality) | Gaussian NARX-model with modified Levenberg-Marquardt method (K_error / K_quality)
Compressor turbine operation | 0.528 / 99.873 | 0.393 / 99.923
Compressor turbine shaft power | 0.523 / 99.871 | 0.389 / 99.921
Increase degree dependence on compressor pressure | 0.521 / 99.872 | 0.386 / 99.925
Fuel consumption in combustion chamber | 0.526 / 99.872 | 0.390 / 99.922

As depicted in Table 10, the erroneous identification coefficients do not exceed 0.393 %, while the quality identification coefficients reach 99.925 %. The main area of practical application of the developed method is an on-board neural network expert system for helicopter TE monitoring and operation control [65]. The developed method can be included as a neural network module for helicopter TE parameter identification, which provides continuous and accurate monitoring of engine operation in real time and also increases the level of safety and flight efficiency.

7. Conclusions
For the first time, a concept has been created for constructing neural network models of complex dynamic objects in which, by increasing the robustness of the functioning of a trained neural network, it becomes possible to increase the reliability of solving problems of identifying complex dynamic objects at its output. The universal neural network model of complex dynamic objects was further developed in the form of a hybrid NARX neural network with a radial basis layer, in which, through the use of the modified Levenberg-Marquardt training method, a reduction of the maximum absolute identification error by almost 2 times is achieved: from 0.74 to 0.43 %. The transition from linear neural networks (the multilayer perceptron) to nonlinear ones (the hybrid NARX neural network with a radial basis layer) in identification tasks of complex dynamic objects is scientifically substantiated, providing more accurate and flexible identification of the parameters of complex dynamic objects in real time. The method of calculating the elements of the Hessian matrix, a component of the analytical expression of the Levenberg-Marquardt method, was further developed; by taking into account the weight connections of neurons of both the nonlinear and linear layers, it allowed the calculation time to be reduced by approximately 10...12 times compared to the central difference derivative. The use of direct calculations made it possible to reduce the calculation time by approximately 2.07...2.14 times compared to the central difference derivative. For the first time, an analytical description of the regularization parameter has been proposed in the mathematical expression of the Levenberg-Marquardt method, based on control coefficients that increase or decrease its value in the event of a change in error, which significantly affects its performance.
The proposed complex modification of the Levenberg-Marquardt method made it possible to experimentally select the optimal structure of the neural network, to reduce the average value of the training error of a NARX hybrid neural network with a radial basis layer by 33 %, to the level of 0.025, and also to ensure the stability of the loss function of the neural network throughout 500 training epochs, not exceeding 2.5 %.

References
[1] R. Voliansky, A. Pranolo, Parallel mathematical models of dynamic objects, International Journal of Advances in Intelligent Informatics 4:2 (2018) 120–131. doi: 10.26555/ijain.v4i2.229.
[2] V. Sherstjuk, M. Zharikova, I. Didmanidze, I. Dorovskaja, S. Vyshemyrska, Risk modeling during complex dynamic system evolution using abstract event network model, CEUR Workshop Proceedings 3101 (2022) 93–110.
[3] A. Sharma, E. Kosasih, J. Zhang, A. Brintrup, A. Calinescu, Digital Twins: State of the art theory and practice, challenges, and open research questions, Journal of Industrial Information Integration 30 (2022) 100383. doi: 10.1016/j.jii.2022.100383.
[4] M. Fore, M. O. Alver, J. A. Alfredsen, A. Rasheed, T. Hukkelas, H. V. Bjelland, B. Su, S. J. Ohrem, E. Kelasidi, T. Norton, N. Papandroulakis, Digital Twins in intensive aquaculture – Challenges, opportunities and future prospects, Computers and Electronics in Agriculture 218 (2024) 108676. doi: 10.1016/j.compag.2024.108676.
[5] A. Becue, E. Maia, L. Feeken, P. Borchers, I. Praca, A New Concept of Digital Twin Supporting Optimization and Resilience of Factories of the Future, Applied Sciences 10 (13) (2020) 4482. doi: 10.3390/app10134482.
[6] D. Galar, U. Kumar, Digital Twins: Definition, Implementation and Applications, Advances in Risk-Informed Technologies (2024) 79–106.
[7] A. Nikiforov, Automatic Control of the Structure of Dynamic Objects in High-Voltage Power Smart-Grid, Automation and Control (2020). doi: 10.5772/intechopen.91664. URL: https://www.intechopen.com/chapters/71513
[8] O. Maksymov, E. Malakhov, V. Mezhuyev, Model and method for representing complex dynamic information objects based on LMS-trees in NoSQL databases, Herald of Advanced Information Technology 4:3 (2021) 211–224. doi: 10.15276/hait.03.2021.1.
[9] D. Kahl, M. Kschischo, Searching for Errors in Models of Complex Dynamic Systems, Frontiers in Physiology 11 (2020). doi: 10.3389/fphys.2020.612590. URL: https://www.frontiersin.org/journals/physiology/articles/10.3389/fphys.2020.612590/full
[10] Y. Li, Application Analysis of Artificial Intelligent Neural Network Based on Intelligent Diagnosis, Procedia Computer Science 208 (2022) 31–35. doi: 10.1016/j.procs.2022.10.006.
[11] A. Kupina, D. Zubov, Y. Osadchuka, R. Ivchenkoa, V. Saiapin, Intelligent Neural Networks Models for the Technological Separation Processes, CEUR Workshop Proceedings 3373 (2023) 76–86.
[12] S. S. Talebi, A. Madadi, A. M. Tousi, M. Kiaee, Micro Gas Turbine fault detection and isolation with a combination of Artificial Neural Network and off-design performance analysis, Engineering Applications of Artificial Intelligence, vol. 113 (2022) 104900. doi: 10.1016/j.engappai.2022.104900.
[13] S. Kim, J. H. Im, M. Kim, J. Kim, Y. I. Kim, Diagnostics using a physics-based engine model in aero gas turbine engine verification tests, Aerospace Science and Technology, vol. 133 (2023) 108102. doi: 10.1016/j.ast.2022.108102.
[14] J. Zeng, Y. Cheng, An Ensemble Learning-Based Remaining Useful Life Prediction Method for Aircraft Turbine Engine, IFAC-PapersOnLine, vol.
[15] B. Li, Y.-P. Zhao, Y.-B. Chen, Unilateral alignment transfer neural network for fault diagnosis of aircraft engine, Aerospace Science and Technology 118 (2021) 107031. doi: 10.1016/j.ast.2021.107031.
[16] Y. Shen, K. Khorasani, Hybrid multi-mode machine learning-based fault diagnosis strategies with application to aircraft gas turbine engines, Neural Networks 130 (2020) 126–142. doi: 10.1016/j.neunet.2020.07.001.
[17] R. Chen, X. Jin, S. Laima, Y. Huang, H. Li, Intelligent modeling of nonlinear dynamical systems by machine learning, International Journal of Non-Linear Mechanics 142 (2022) 103984. doi: 10.1016/j.ijnonlinmec.2022.103984.
[18] M. Soleimani, F. Campean, D. Neagu, Diagnostics and prognostics for complex systems: A review of methods and challenges, Quality and Reliability Engineering 37:8 (2021) 3746–3778. doi: 10.1002/qre.2947.
[19] Z. Huang, Automatic Intelligent Control System Based on Intelligent Control Algorithm, Journal of Electrical and Computer Engineering 7 (2022) 1–10. doi: 10.1155/2022/3594256.
[20] S. Tian, J. Zhang, X. Shu, L. Chen, X. Niu, Y. Wang, A Novel Evaluation Strategy to Artificial Neural Network Model Based on Bionics, Journal of Bionic Engineering 19:1 (2022) 224–239. doi: 10.1007/s42235-021-00136-2.
[21] L. Qian, C. Liu, J. Yi, S. Liu, Application of hybrid algorithm of bionic heuristic and machine learning in nonlinear sequence, Journal of Physics: Conference Series 1682 (2020) 012009. doi: 10.1088/1742-6596/1682/1/012009.
[22] H. Lin, C. Wang, J. Sun, X. Zhang, Y. Sun, H. H. C. Iu, Memristor-coupled asymmetric neural networks: Bionic modeling, chaotic dynamics analysis and encryption application, Chaos, Solitons & Fractals 166 (2023) 112905. doi: 10.1016/j.chaos.2022.112905.
[23] J. Sun, S. Sathasivam, M. Khan, Analysis and Optimization of Network Properties for Bionic Topology Hopfield Neural Network Using Gaussian-Distributed Small-World Rewiring Method, IEEE Access 10 (2022) 95369–95389. doi: 10.1109/ACCESS.2022.3204821.
[24] S. Vladov, Y. Shmelov, R. Yakovliev, Optimization of Helicopters Aircraft Engine Working Process Using Neural Networks Technologies, CEUR Workshop Proceedings 3171 (2022) 1639–1656.
[25] R. Abdulkadirov, P. Lyakhov, N. Nagornov, Survey of Optimization Algorithms in Modern Neural Networks, Mathematics 11:11 (2023) 2466. doi: 10.3390/math11112466.
[26] J. Chen, Y. Liu, Neural optimization machine: a neural network approach for optimization and its application in additive manufacturing with physics-guided learning, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 381:2260 (2023). doi: 10.1098/rsta.2022.0405. URL: https://royalsocietypublishing.org/doi/10.1098/rsta.2022.0405
[27] F. Mehmood, S. Ahmad, T. K. Whangbo, An Efficient Optimization Technique for Training Deep Neural Networks, Mathematics 11:6 (2023) 1360. doi: 10.3390/math11061360.
[28] T. Asrav, E. Aydin, Physics-informed recurrent neural networks and hyper-parameter optimization for dynamic process systems, Computers & Chemical Engineering 173 (2023) 108195. doi: 10.1016/j.compchemeng.2023.108195.
[29] A. Merabet, S. Kanukollu, A. Al-Durra, E. F. El-Saadany, Adaptive recurrent neural network for uncertainties estimation in feedback control system, Journal of Automation and Intelligence 2:3 (2023) 119–129. doi: 10.1016/j.jai.2023.07.001.
[30] M. F. Ab Aziz, S. A. Mostafa, C. F. M. Foozy, M. A. Mohammed, M. Elhoseny, A. Z. Abualkishik, Integrating Elman recurrent neural network with particle swarm optimization algorithms for an improved hybrid training of multidisciplinary datasets, Expert Systems with Applications 183 (2021) 115441. doi: 10.1016/j.eswa.2021.115441.
[31] Q. Ali, M. N. Mahdi, M. Ali, M. N. Atta, A. Khan, S. A. Lashari, D. A. Ramli, Training Learning Weights of Elman Neural Network Using Salp Swarm Optimization Algorithm, Procedia Computer Science 225 (2023) 1974–1986. doi: 10.1016/j.procs.2023.10.188.
[32] Y. Li, Z. Wang, R. Han, S. Shi, J. Li, R. Shang, H. Zheng, G. Zhong, Y. Gu, Quantum recurrent neural networks for sequential learning, Neural Networks 166 (2023) 148–161. doi: 10.1016/j.neunet.2023.07.003.
[33] Y. Wang, W. Zhou, H. Feng, L. Li, H. Li, Progressive Recurrent Network for shadow removal, Computer Vision and Image Understanding 238 (2024) 103861. doi: 10.1016/j.cviu.2023.103861.
[34] H. Wang, W. Jiang, X. Deng, J. Geng, A new method for fault detection of aero-engine based on isolation forest, Measurement 185 (2021) 110064. doi: 10.1016/j.measurement.2021.110064.
[35] S. Zhernakov, A. Gilmanshin, New onboard gas turbine engine diagnostic algorithms based on neural-fuzzy networks, Aviation and rocket and space technology 19:2 (68) (2015) 63–68.
[36] S. Zhernakov, A. Gilmanshin, Realization of hybrid gas turbine engine control and diagnostics algorithms using modern on-board computing devices, in: Proceedings of the VII International Conference "Actual Problems of Mechanical Engineering", March 25–27, 2015, pp. 765–769.
[37] B. Mokin, V. Mokin, O. Mokin, O. Mamyrbayev, S. Smailova, The synthesis of mathematical models of nonlinear dynamic systems using Volterra integral equation, Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska 12:2 (2022) 15–19. doi: 10.35784/iapgos.2947.
[38] A. Slipchuk, P. Pukach, M. Vovk, O. Slyusarchuk, Study of the dynamic process in a nonlinear mathematical model of the transverse oscillations of a moving beam under perturbed boundary conditions, Mathematical Modeling and Computing 11:1 (2024) 37–49. doi: 10.23939/mmc2024.01.037.
[39] A. Abdulnagimov, G. Ageev, Neural network technologies in hardware-in-the-loop simulation: principles of gas turbine digital twin development, Informatics, Computer Science and Management 23:4 (86) (2019) 115–121.
[40] S. Vladov, R. Yakovliev, O. Hubachov, J. Rud, Y. Stushchanskyi, Neural Network Modeling of Helicopters Turboshaft Engines at Flight Modes Using an Approach Based on "Black Box" Models, CEUR Workshop Proceedings 3624 (2024) 116–135.
[41] S. Vladov, I. Dieriabina, O. Husarova, L. Pylypenko, A. Ponomarenko, Multi-mode model identification of helicopters aircraft engines in flight modes using a modified gradient algorithms for training radial-basic neural networks, Visnyk of Kherson National Technical University 4 (79) (2021) 52–63. doi: 10.35546/kntu2078-4481.2021.4.7.
[42] G. Alcan, M. Unel, V. Aran, M. Yilmaz, C. Gurel, K. Koprubasi, Diesel Engine NOx Emission Modeling Using a New Experiment Design and Reduced Set of Regressors, IFAC-PapersOnLine 51:15 (2018) 168–173. doi: 10.1016/j.ifacol.2018.09.114.
[43] S. Vladov, Y. Shmelov, R. Yakovliev, Modified Searchless Method for Identification of Helicopters Turboshaft Engines at Flight Modes Using Neural Networks, in: Proceedings of the 2022 IEEE 3rd KhPI Week on Advanced Technology, Kharkiv, Ukraine, October 03–07, 2022, pp. 257–262. doi: 10.1109/KhPIWeek57572.2022.9916422.
[44] S. Novikova, E. Kremleva, Increasing the robustness of the neural network model for monitoring gas turbine engines based on reduction, Aircraft, aircraft engines and methods of their operation 3 (2019) 17–26.
[45] J. Bill, B. A. Cox, L. Champagne, A comparison of quaternion neural network backpropagation algorithms, Expert Systems with Applications 232 (2023) 120448. doi: 10.1016/j.eswa.2023.120448.
[46] G. Xing, J. Gu, X. Xiao, Convergence analysis of a subsampled Levenberg-Marquardt algorithm, Operations Research Letters 51:4 (2023) 379–384. doi: 10.1016/j.orl.2023.05.005.
[47] S. Parkhomenko, A Levenberg-Marquardt algorithm execution time reducing in case of large amount of the data, International Research Journal 1 (20), part 1 (2014) 80–83.
[48] S. Parkhomenko, T. Ledeneva, Training neural networks using the Levenberg-Marquardt method in conditions of a large amount of data, System analysis and information technology 2 (2014) 98–106.
[49] N. Marumo, T. Okuno, A. Takeda, Majorization-minimization-based Levenberg–Marquardt method for constrained nonlinear least squares, Computational Optimization and Applications 84 (2023) 833–874. doi: 10.1007/s10589-022-00447-y.
[50] A. O. Umar, I. M. Sulaiman, M. Mamat, M. Y. Waziri, N. Zamri, On damping parameters of Levenberg-Marquardt algorithm for nonlinear least square problems, Journal of Physics: Conference Series 1734 (2021) 012018. doi: 10.1088/1742-6596/1734/1/012018.
[51] S. Vladov, Y. Shmelov, R. Yakovliev, Modified Helicopters Turboshaft Engines Neural Network On-board Automatic Control System Using the Adaptive Control Method, CEUR Workshop Proceedings 3309 (2022) 205–224.
[52] S. Vladov, Y. Shmelov, R. Yakovliev, M. Petchenko, Modified Neural Network Fault-Tolerant Closed Onboard Helicopters Turboshaft Engines Automatic Control System, CEUR Workshop Proceedings 3387 (2023) 160–179.
[53] K. Makhotilo, D. Voronenko, Modification of the Levenberg-Marquardt algorithm to improve the accuracy of predictive models of connected energy consumption in everyday life, Bulletin of the National Technical University "KhPI", Series "Information and Modeling" 56 (2005) 83–90.
[54] W. Zonghui, H. Jian, S. Xiaodan, Study on Robust Loss Function for Artificial Neural Networks Models in Reliability Analysis, Procedia Structural Integrity 52 (2024) 203–213. doi: 10.1016/j.prostr.2023.12.021.
[55] S. Zhang, L. Xie, Leader learning loss function in neural network classification, Neurocomputing 557 (2023) 126735. doi: 10.1016/j.neucom.2023.126735.
[56] S. Vladov, Y. Shmelov, R. Yakovliev, Helicopters Aircraft Engines Self-Organizing Neural Network Automatic Control System, CEUR Workshop Proceedings 3137 (2022) 28–47. doi: 10.32782/cmis/3137-3.
[57] Y. Wang, H. Li, Y. Zheng, J. Peng, A fractional-order visual neural network for collision sensing in noisy and dynamic scenes, Applied Soft Computing 148 (2023) 110897. doi: 10.1016/j.asoc.2023.110897.
[58] S. Vladov, Y. Shmelov, R. Yakovliev, M. Petchenko, S. Drozdova, Neural Network Method for Helicopters Turboshaft Engines Working Process Parameters Identification at Flight Modes, in: Proceedings of the 2022 IEEE 4th International Conference on Modern Electrical and Energy System (MEES), Kremenchuk, Ukraine, 2022, pp. 604–609. doi: 10.1109/MEES58014.2022.10005670.
[59] S. Vladov, Y. Shmelov, R. Yakovliev, M. Petchenko, S. Drozdova, Helicopters Turboshaft Engines Parameters Identification at Flight Modes Using Neural Networks, in: Proceedings of the IEEE 17th International Conference on Computer Science and Information Technologies (CSIT), Lviv, Ukraine, 2022, pp. 5–8. doi: 10.1109/CSIT56902.2022.10000444.
[60] S. Marton, S. Ludtke, C. Bartelt, Explanations for Neural Networks by Neural Networks, Applied Sciences 12:3 (2022) 980. doi: 10.3390/app12030980.
[61] C. Yuan, J. Y. Wang, C. E. Lee, K.-N. Chiang, Equation Informed Neural Networks with Bayesian Inference Improvement for the Coefficient Extraction of the Empirical Formulas, in: Proceedings of the 2023 24th International Conference on Thermal, Mechanical and Multi-Physics Simulation and Experiments in Microelectronics and Microsystems (EuroSimE), Graz, Austria, 2023. doi: 10.1109/EuroSimE56861.2023.10100752.
[62] F. Munoz, J. M. Valdovinos, J. S. Cervantes-Rojas, S. S. Cruz, A. M. Santana, Leader–follower consensus control for a class of nonlinear multi-agent systems using dynamical neural networks, Neurocomputing 561 (2023) 126888. doi: 10.1016/j.neucom.2023.126888.
[63] V. Makarov, The neural network to identify an object by a sequential training mode, Procedia Computer Science 190 (2021) 532–539. doi: 10.1016/j.procs.2021.06.062.
[64] H. Taherdoost, Deep Learning and Neural Networks: Decision-Making Implications, Symmetry 15:9 (2023) 1723. doi: 10.3390/sym15091723.
[65] Y. Shmelov, S. Vladov, Y. Klimova, M. Kirukhina, Expert system for identification of the technical state of the aircraft engine TV3-117 in flight modes, in: Proceedings of the IEEE First International Conference on System Analysis & Intelligent Computing (SAIC), Kyiv, Ukraine, 2018, pp. 77–82. doi: 10.1109/SAIC.2018.8516864.