<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>The Fourth International Workshop on Computer Modeling and Intelligent Systems, April</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Long-Term Forecasting Method in the Supply Chain Based on an Artificial Neural Network with Multi-Agent Metaheuristic Training</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eugene Fedorov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olga Nechyporenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cherkasy State Technological University</institution>
          ,
          <addr-line>Shevchenko blvd., 460, Cherkasy, 18006</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>27</volume>
      <issue>2021</issue>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>The problem of increasing the efficiency of long-term forecasting in the supply chain is examined. Neural network forecasting methods based on reservoir computing, which increase the forecast accuracy, are proposed. For these methods, metaheuristic-based methods for identifying the parameters of the forecast models are proposed. The methods were studied on data from the logistics company Ekol Ukraine and are intended for intelligent computer-based supply chain management systems.</p>
      </abstract>
      <kwd-group>
        <kwd>long-term forecast</kwd>
        <kwd>supply chain</kwd>
        <kwd>metaheuristics</kwd>
        <kwd>reservoir computing</kwd>
        <kwd>forecast neural network model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction </title>
      <p>factors is possible; a complete enumeration of all possible models is not required; analysis of systems
with heterogeneous factors is possible.</p>
      <p>However, neural network methods suffer from a lack of transparency, difficulty in defining the architecture, strict requirements for the training sample, difficulty in choosing the training algorithm, and a resource-intensive training process. Therefore, increasing the efficiency of neural network forecasting is an urgent task.</p>
      <p>The aim of the work is to develop a method for long-term forecasting in the supply chain. To achieve this aim, the following tasks were set and solved:
• analyze existing forecast methods;
• propose a neural network forecast model;
• choose a criterion for evaluating the effectiveness of the neural network forecast model;
• propose a method for determining the values of the neural network forecast model parameters based on multi-agent metaheuristics;
• perform numerical studies.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Problem statement </title>
      <p>The problem of increasing the efficiency of long-term forecasting in the supply chain is reduced to the problem of finding a parameter vector $W$ that satisfies the forecast model adequacy criterion
$$F = \frac{1}{P} \sum_{\mu=1}^{P} \left(f(x_\mu, W) - d_\mu\right)^2 \to \min_W,$$
i.e., delivers the minimum of the mean squared error (the difference between the model output and the desired output), where $P$ is the cardinality of the test set, $x_\mu$ is the $\mu$th input value, and $d_\mu$ is the $\mu$th desired output value.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Literature review </title>
      <sec id="sec-3-1">
        <title>The most commonly used forecast neural networks are:</title>
        <p>1. Long short-term memory (LSTM) [11, 12];</p>
        <p>This network is based on gates (FIR filters) and a multilayer perceptron. Instead of each hidden neuron, it uses a memory block that contains one or more cells and is connected to the input, output and forget gates. The gates determine how much information is passed through. If the input and output gates are close to 1 and the forget gate is close to 0, the network turns into an Elman network. If the input gate is close to 0, the short-term information from the input is ignored. If the forget gate is close to 0, the long-term information from the memory block is ignored. If the output gate is close to 0, the output information is ignored. The advantage of this network is a higher forecast accuracy than that of a conventional multilayer perceptron. The disadvantages are the complexity of implementing the memory blocks, insufficient forecast accuracy, the complexity of defining the architecture, and an insufficient learning rate.</p>
        <p>2. Gated recurrent unit (GRU) [13-15];</p>
        <p>This network is based on gates (FIR filters) and a multilayer perceptron. Instead of each hidden neuron, it uses a hidden block that is connected to the reset and update gates. The gates determine how much information is passed through. If the reset gate is close to 1 and the update gate is close to 0, the network turns into an Elman network. If both the reset and update gates are close to 0, the long-term information from the hidden block is ignored and the network becomes a multilayer perceptron. If the update gate is close to 1, the short-term information from the network input is ignored. The advantage of this network is a higher forecast accuracy than that of a conventional multilayer perceptron. The disadvantages are the complexity of implementing the hidden blocks, insufficient forecast accuracy, the complexity of defining the architecture, and an insufficient learning rate.</p>
        <p>3. Neural Turing machine (NTM) [16, 17];</p>
        <p>This network is based on a Turing machine and a multilayer perceptron or LSTM, and includes a controller and a memory matrix. At each time step, the controller receives input from the outside world and sends output to the outside world. The controller also reads from the memory matrix cells via read heads and writes to them via write heads. The advantage of this network is a higher forecast accuracy than that of a conventional multilayer perceptron. The disadvantages are the complexity of implementing the controller (in the case of LSTM), the complexity of defining the architecture, insufficient forecast accuracy, and an insufficient learning rate.</p>
        <p>4. Echo state network (ESN) [18, 19];</p>
        <p>This network is based on reservoir computing over sigmoid neurons and a multilayer perceptron. The hidden layer is called the reservoir. Each neuron in the reservoir may be unconnected or connected to other neurons in the reservoir. The network is trained with the pseudoinverse matrix method. The advantages of this network are the highest forecast accuracy (due to the pseudoinverse matrix method) and the ease of implementing sigmoid neurons in the reservoir. The disadvantages are the complexity of parallel training and the complexity of defining the architecture.</p>
        <p>5. Liquid state machine (LSM) [20-23].</p>
        <p>This network is based on reservoir computing over spiking leaky integrate-and-fire (LIF) neurons and a multilayer perceptron. Each neuron in the reservoir may be unconnected or connected to other neurons in the reservoir and is either excitatory or inhibitory. A gradient learning method is used to train the network. The advantages of this network are a higher forecast accuracy than that of a conventional multilayer perceptron and the possibility of parallel training for the part of the network corresponding to the multilayer perceptron. The disadvantages are the complexity of implementing spiking neurons, the complexity of defining the architecture, a lower forecast accuracy, and the complexity of parallel training for the part of the network corresponding to the reservoir.</p>
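<p>Because the ESN (item 4) is the network that the proposed FC-ESN modifies, a minimal sketch of its pseudoinverse training may help: the reservoir weights stay fixed and only the linear readout is solved in closed form. The state update h(n) = tanh(w_in x(n) + W_res h(n - 1) + b), the extra bias column and all function names are illustrative assumptions rather than details taken from the cited works; tanh plays the role of the sigmoid-type reservoir activation.</p>
<preformat preformat-type="code">
```python
import numpy as np


def esn_states(x, w_in, W_res, b):
    """Reservoir states h(n) = tanh(w_in * x(n) + W_res @ h(n - 1) + b), with h(-1) = 0."""
    h = np.zeros(W_res.shape[0])
    states = []
    for xn in x:
        h = np.tanh(w_in * xn + W_res @ h + b)   # fixed, untrained reservoir
        states.append(h)
    return np.array(states)


def with_bias(states):
    """Append a constant 1 column so that the readout also learns a bias."""
    return np.hstack([states, np.ones((states.shape[0], 1))])


def pinv_readout(states, desired):
    """Least-squares readout weights via the Moore-Penrose pseudoinverse."""
    return np.linalg.pinv(with_bias(states)) @ np.asarray(desired, dtype=float)


def readout(states, weights):
    """Linear readout y(n) = [h(n), 1] . weights."""
    return with_bias(states) @ weights
```
</preformat>
<p>For one-step-ahead forecasting, the states would be collected from x(0), ..., x(T - 1) and the desired outputs set to x(1), ..., x(T). The single pseudoinverse of the whole state matrix is what makes this training hard to parallelize, which is the disadvantage noted above.</p>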
        <p>In general, each of the methods listed above has a low forecast accuracy (due to getting stuck in a local extremum), a low learning rate (due to the high computational complexity of the hidden neuron or the difficulty of parallelizing training), a complex implementation (due to the complex architecture of the hidden neuron), or an architecture that is difficult to define, which reduces forecast efficiency.</p>
        <p>Therefore, creating a neural network whose training method and architecture eliminate these disadvantages is an urgent task.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Block diagram of a neural network model for a long‐term forecast </title>
      <p>Figure 1: Block diagram of a long-term forecast model based on a fully connected echo state network with a cascade of unit delays for an input layer neuron (FC-ESN type 1)</p>
      <p>5. Neural network models for long-term forecast</p>
      <p>5.1. Long-term forecast model FC-ESN type 1</p>
      <p>1. Initialization
$$y_i^{(1)}(n-1) = 0, \quad i = \overline{1, N^{(1)}}.$$</p>
      <p>2. Forecast</p>
      <p>2.1. Initialization of the outputs of the neurons of the input layer
$$y^{(0)}(n) = x(n).$$</p>
      <p>2.2. Calculation of the outputs of the neurons of the hidden layer
$$y_j^{(1)}(n) = f^{(1)}\left(s_j^{(1)}(n)\right), \quad j = \overline{1, N^{(1)}},$$
$$s_j^{(1)}(n) = b_j^{(1)} + \sum_{i=0}^{M^{(0)}} w_{ij}^{(1)}\, y^{(0)}(n-i) + \sum_{i=M^{(0)}+1}^{M^{(0)}+N^{(1)}} w_{ij}^{(1)}\, y_{i-M^{(0)}}^{(1)}(n-1).$$</p>
      <p>2.3. Calculation of the outputs of the neurons of the output layer
$$y^{(2)}(n) = f^{(2)}\left(s^{(2)}(n)\right),$$
$$s^{(2)}(n) = b^{(2)} + \sum_{i=0}^{M^{(0)}} w_i^{(2)}\, y^{(0)}(n-i) + \sum_{i=M^{(0)}+1}^{M^{(0)}+N^{(1)}} w_i^{(2)}\, y_{i-M^{(0)}}^{(1)}(n),$$
where $N^{(1)}$ is the number of neurons in the first (hidden) layer, $M^{(k)}$ is the number of unit delays for the $k$th layer, $w_{ij}^{(k)}$ is the connection weight from the $i$th neuron to the $j$th neuron of the $k$th layer, $b_j^{(k)}$ is the bias (threshold) of the $k$th layer, $y_j^{(k)}(n)$ is the output of the $j$th neuron of the $k$th layer at time $n$, and $f^{(k)}$ is the activation function of the neurons of the $k$th layer (usually $f^{(k)}(s) = \tanh(s)$).</p>
      <p>5.2. Long-term forecast model FC-ESN type 2</p>
      <p>The FC-ESN type 2 model is computed by the same steps, except that the hidden layer additionally receives the outputs of the output layer through a cascade of $M^{(2)}$ unit delays:
$$s_j^{(1)}(n) = b_j^{(1)} + \sum_{i=0}^{M^{(0)}} w_{ij}^{(1)}\, y^{(0)}(n-i) + \sum_{i=M^{(0)}+1}^{M^{(0)}+N^{(1)}} w_{ij}^{(1)}\, y_{i-M^{(0)}}^{(1)}(n-1) + \sum_{i=M^{(0)}+N^{(1)}+1}^{M^{(0)}+N^{(1)}+M^{(2)}} w_{ij}^{(1)}\, y^{(2)}\left(n - \left(i - (M^{(0)}+N^{(1)})\right)\right).$$</p>
      <p>6. Criterion for evaluating the effectiveness of a neural network model for long-term forecast</p>
      <p>In this work, the model adequacy criterion was chosen to determine the parameter values of the FC-ESN model; that is, the parameter values $W = \{w_{ij}^{(1)}, w_i^{(2)}\}$ are chosen so as to minimize the mean squared error (the difference between the model output and the desired output):
$$F = \frac{1}{P} \sum_{\mu=1}^{P} \left(y_\mu^{(2)} - d_\mu\right)^2 \to \min_W, \qquad (1)$$
where $P$ is the cardinality of the test set, $y_\mu^{(2)}$ is the model output for the $\mu$th input, and $d_\mu$ is the corresponding desired output.</p>
      <p>7. Method for determining the parameter values of the neural network model for long-term forecast</p>
      <p>The method for determining the parameter values of the neural network model for long-term forecasting reduces to calculating the weights of the hidden layer and of the output layer of the FC-ESN model.</p>
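<p>A minimal NumPy sketch of the FC-ESN type 1 forward pass (Section 5.1), together with the hidden-layer weight construction of Section 7.1 (random weights, recurrent block rescaled to spectral radius ρ), is given below. Several details are assumptions not stated in the text: weights and biases are drawn uniformly from [-1, 1], delayed inputs before the start of the series are zero, and the row layout of the array W1 (rows 0..M0 for the delayed inputs, rows M0+1..M0+N1 for the previous hidden outputs) mirrors the index ranges of w_ij^(1). All function names are hypothetical.</p>
<preformat preformat-type="code">
```python
import numpy as np


def make_hidden_weights(M0, N1, rho=0.9, seed=None):
    """Section 7.1 sketch: random hidden-layer biases and weights, with the
    recurrent block rescaled to spectral radius rho (rho between 0 and 1).

    Rows 0..M0 of W1 weight the delayed inputs y0(n - i); rows M0+1..M0+N1
    weight the previous hidden outputs y1(n - 1).
    """
    rng = np.random.default_rng(seed)
    b1 = rng.uniform(-1.0, 1.0, N1)                  # step 1 (assumed range)
    W1 = rng.uniform(-1.0, 1.0, (M0 + 1 + N1, N1))   # step 1 (assumed range)
    W = W1[M0 + 1:]                                  # step 2: recurrent block
    lam_max = np.max(np.abs(np.linalg.eigvals(W)))   # step 3: max |lambda_j|
    W1[M0 + 1:] = rho * W / lam_max                  # steps 3-4: rescale, assign
    return W1, b1


def fc_esn1_run(x, W1, b1, w2, b2, M0, f1=np.tanh, f2=np.tanh):
    """FC-ESN type 1 forward pass over a 1-D series x; returns y2(n) for each n."""
    N1 = W1.shape[1]
    y1 = np.zeros(N1)              # step 1: y1(n - 1) = 0
    y0 = np.zeros(M0 + 1)          # y0(n), y0(n - 1), ..., y0(n - M0); zeros before start
    out = np.empty(len(x))
    for n, xn in enumerate(x):
        y0 = np.roll(y0, 1)
        y0[0] = xn                 # step 2.1: y0(n) = x(n)
        s1 = b1 + y0 @ W1[:M0 + 1] + y1 @ W1[M0 + 1:]    # step 2.2
        y1 = f1(s1)
        s2 = b2 + y0 @ w2[:M0 + 1] + y1 @ w2[M0 + 1:]    # step 2.3
        out[n] = f2(s2)
    return out


W1, b1 = make_hidden_weights(M0=3, N1=16, rho=0.9, seed=1)
x = np.sin(np.linspace(0.0, 6.0, 50))
w2 = np.random.default_rng(2).uniform(-1.0, 1.0, 3 + 1 + 16)   # e.g. one candidate weight vector
y = fc_esn1_run(x, W1, b1, w2, 0.0, M0=3)
```
</preformat>
<p>In this scheme W1 and b1 stay fixed once built, and only the output weights w2 remain to be searched (by SAPSO, under criterion (1)). A type 2 variant would add a third sum over the delayed outputs y2(n - i) to s1.</p>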
    </sec>
    <sec id="sec-5">
      <title>7.1. Calculating the weights of the hidden layer </title>
      <p>The weights of the hidden layer are calculated as follows:
1. Randomly initialize the biases (thresholds) $b_j^{(1)}$ and the weights $w_{ij}^{(1)}$.</p>
      <sec id="sec-5-1">
        <p>2. From the weights $w_{ij}^{(1)}$, $i = \overline{M^{(0)}+1, M^{(0)}+N^{(1)}}$, $j = \overline{1, N^{(1)}}$, form the matrix $\mathbf{W} = [w_{ij}]$, $i, j = \overline{1, N^{(1)}}$.</p>
        <p>3. Determine the matrix $\tilde{\mathbf{W}}$ as
$$\tilde{\mathbf{W}} = \frac{\rho}{\max_{j = \overline{1, N^{(1)}}} |\lambda_j|}\, \mathbf{W},$$
where $\rho$ is the spectral radius of the matrix $\tilde{\mathbf{W}}$, $0 &lt; \rho &lt; 1$ (for large $\rho$, learning is faster, but the long short-term memory decreases), and $\lambda_j$ are the eigenvalues of the matrix $\mathbf{W}$.</p>
        <p>4. Assign to the weights $w_{ij}^{(1)}$, $i = \overline{M^{(0)}+1, M^{(0)}+N^{(1)}}$, $j = \overline{1, N^{(1)}}$, the values of the corresponding elements of the matrix $\tilde{\mathbf{W}}$.</p>
        <p>7.2. The output layer weights calculation based on the multi-agent metaheuristic SAPSO method</p>
        <p>The proposed SAPSO (simulated annealing and particle swarm optimization) method for numerical function optimization consists of the following blocks (Figure 3).</p>
        <p>Figure 3: The sequence of procedures of the optimization method based on the multi-agent metaheuristic SAPSO method (flowchart blocks: 1. Initialization; 2. Modification of the speed of each particle using simulated annealing; 3. Modification of the position of each particle; 4. Determination of the particle of the current population with the best position; 5. Determining the global best position; 6. Check n &lt; N: if yes, return to block 2, otherwise output x*)</p>
      </sec>
      <sec id="sec-5-2">
        <p>Block 1 – Initialization:
• setting the maximum number of iterations $N$;
• setting the swarm size $K$ (usually no more than 40);
• setting the dimension $M$ of the particle position (equal to the number of weights in the output layer);
• setting the current iteration number $n$ to one;
• initialization of the position $x_k$ (corresponding to a solution, i.e. the vector of output-layer weights)
$$x_k = (x_{k1}, \ldots, x_{kM}), \quad x_{kj} = (x_j^{\max} - x_j^{\min})\, U(0,1) + x_j^{\min}, \quad k = \overline{1, K},$$
where $U(0,1)$ is a function that returns a uniformly distributed random number in the range $[0, 1]$, and $x_j^{\min}$, $x_j^{\max}$ are the minimum and maximum values;
• initialization of the personal (local) best position $x_k^{best}$
$$x_k^{best} = x_k, \quad k = \overline{1, K};$$
• initialization of the speed $v_k$
$$v_k = (v_{k1}, \ldots, v_{kM}), \quad v_{kj} = 0, \quad k = \overline{1, K};$$
• creating the initial particle swarm
$$Q = \{(x_k, x_k^{best}, v_k)\};$$
• determination of the particle of the current population with the best position
$$k^* = \arg\min_{k = \overline{1, K}} F(x_k), \quad x^* = x_{k^*}.$$</p>
        <p>Block 2 – Modification of the speed of each particle using simulated annealing</p>
        <p>Block 2.1 – Calculating two vectors of random numbers for each particle
$$r1_k = (r1_{k1}, \ldots, r1_{kM}), \quad r1_{kj} \in \{U(0,1), C(0,1), N(0,1)\}, \quad k = \overline{1, K}, \ j = \overline{1, M},$$
$$r2_k = (r2_{k1}, \ldots, r2_{kM}), \quad r2_{kj} \in \{U(0,1), C(0,1), N(0,1)\}, \quad k = \overline{1, K}, \ j = \overline{1, M},$$
where $N(0,1)$ is a function that returns a random number from the standard normal distribution, and $C(0,1)$ is a function that returns a random number from the standard Cauchy distribution.</p>
        <p>Block 2.2 – Calculating the annealing temperature</p>
        <p>$$T(n) = \beta\, T(n-1), \quad T(0) = T_0, \quad \beta = N^{-1/(N-1)}, \quad T_0 = N^{N/(N-1)},$$
where $T(n)$ is the annealing temperature at iteration $n$, $T_0$ is the initial annealing temperature, and $\beta$ is the parameter controlling the annealing temperature.</p>
        <sec id="sec-5-2-1">
          <p>Block 2.3 – Calculating the parameters controlling the contribution of the components
$$\varphi_1(n) = \varphi_2(n) = \varphi(0) \exp\left(-1/T(n)\right), \quad w(n) = w(0) \exp\left(-1/T(n)\right),$$
$$\varphi(0) = \varphi_0 = 0.5 + \ln 2, \quad w(0) = w_0 = \frac{1}{2 \ln 2},$$
where $\varphi_1(n)$ is the parameter controlling the contribution of the component $(x_k^{best} - x_k) \odot r1_k$ to the particle velocity at iteration $n$, $\varphi_2(n)$ is the parameter controlling the contribution of the component $(x^* - x_k) \odot r2_k$ to the particle velocity at iteration $n$, $w(n)$ is the parameter controlling the contribution of the particle velocity at iteration $n-1$ to the particle velocity at iteration $n$, $\varphi_0$ is the initial value of the parameters $\varphi_1(n)$ and $\varphi_2(n)$, $w_0$ is the initial value of the parameter $w(n)$, and $\odot$ denotes component-wise multiplication.</p>
          <p>The simulated annealing introduced in this work establishes an inverse relationship between the parameters $\varphi_1(n)$, $\varphi_2(n)$, $w(n)$ and the iteration number: at the initial iterations, the entire search space is explored (in this case, the Cauchy distribution is used), and at the final iterations, the search becomes directional (in this case, the normal distribution is used). In addition, the parameters $T_0$ and $\beta$ are expressed directly through the maximum number of iterations $N$, which makes it possible to automate their selection.</p>
        </sec>
        <sec id="sec-5-2-2">
          <p>The choice of the initial values $\varphi_0 = 0.5 + \ln 2$ and $w_0 = \frac{1}{2 \ln 2}$ is standard and satisfies the conditions for particle swarm convergence $w_0 &lt; 1$ and $w_0 > \frac{1}{2}\left(\varphi_1(0) + \varphi_2(0)\right) - 1 = \varphi_0 - 1$.</p>
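<p>The parameter schedule of Blocks 2.2 and 2.3 can be sketched as follows. The initial values φ0 = 0.5 + ln 2 and w0 = 1/(2 ln 2) come from the text; the closed forms T0 = N^(N/(N-1)) and β = N^(-1/(N-1)) are an assumption about the original expressions, chosen so that T(1) = N and T(N) = 1. The function name is hypothetical.</p>
<preformat preformat-type="code">
```python
import math


def annealing_schedule(N, phi0=0.5 + math.log(2.0), w0=1.0 / (2.0 * math.log(2.0))):
    """Return lists T(n), phi(n) = phi1(n) = phi2(n) and w(n) for n = 1, ..., N.

    Assumes T(n) = beta * T(n - 1) with T(0) = T0 = N**(N / (N - 1)) and
    beta = N**(-1 / (N - 1)), which gives T(1) = N and T(N) = 1.
    """
    if 2 > N:
        raise ValueError("N must be at least 2")
    beta = N ** (-1.0 / (N - 1))
    T = N ** (N / (N - 1))                       # T(0) = T0
    temps, phis, ws = [], [], []
    for _ in range(N):
        T *= beta                                # Block 2.2
        temps.append(T)
        phis.append(phi0 * math.exp(-1.0 / T))   # Block 2.3
        ws.append(w0 * math.exp(-1.0 / T))
    return temps, phis, ws


T, phi, w = annealing_schedule(100)
print(round(T[0], 6), round(T[-1], 6))           # 100.0 1.0
print(round(phi[0], 2), round(phi[-1], 2))       # 1.18 0.44
print(round(w[0], 2), round(w[-1], 2))           # 0.71 0.27
```
</preformat>
<p>Under these assumptions the coefficients shrink monotonically while w(n) stays below 1 and above φ(n) - 1 at every iteration, so the convergence conditions stated for the initial values continue to hold throughout the run.</p>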
        </sec>
      </sec>
      <sec id="sec-5-3">
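        <title>Block 2.4 – Calculating the speed of each particle</title>
        <p>$$v_k = w(n)\, v_k + \varphi_1(n)\, (x_k^{best} - x_k) \odot r1_k + \varphi_2(n)\, (x^* - x_k) \odot r2_k, \quad k = \overline{1, K},$$
where $\odot$ denotes component-wise multiplication.</p>
      </sec>
      <sec id="sec-5-4">
        <title>Block 3 – Modification of the position of each particle</title>
        <p>Block 3.1 – Limiting the speed of each particle
$$v_{kj} = \begin{cases} v_{kj}, \quad v_{kj} \in (x_j^{\min}, x_j^{\max}) \\ 0, \quad v_{kj} \notin (x_j^{\min}, x_j^{\max}) \end{cases}, \quad k = \overline{1, K}, \ j = \overline{1, M}.$$</p>
        <p>Block 3.2 – Calculating the position of each particle
$$x_k = x_k + v_k, \quad k = \overline{1, K},$$
$$x_{kj} = \begin{cases} x_j^{\max}, \quad x_{kj} > x_j^{\max} \\ x_j^{\min}, \quad x_{kj} &lt; x_j^{\min} \\ x_{kj}, \quad x_{kj} \in [x_j^{\min}, x_j^{\max}] \end{cases}, \quad k = \overline{1, K}, \ j = \overline{1, M}.$$</p>
        <p>Block 4 – Determination of the personal (local) best position of each particle</p>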
        <p>If $F(x_k) &lt; F(x_k^{best})$, then $x_k^{best} = x_k$, $k = \overline{1, K}$.</p>
        <p>Block 5 – Determination of the particle of the current population with the best position
$$k^* = \arg\min_{k = \overline{1, K}} F(x_k).$$</p>
        <p>Block 6 – Determining the global best position</p>
        <p>If $F(x_{k^*}) &lt; F(x^*)$, then $x^* = x_{k^*}$.</p>
      </sec>
      <sec id="sec-5-5">
        <title>Block 7 - Stop condition</title>
        <p>If $n &lt; N$, then increase the iteration number $n$ by one and go to Block 2.</p>
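<p>Putting Blocks 1 to 7 together, a compact NumPy sketch of the SAPSO loop follows. It is not the authors' Matlab implementation, and it rests on stated assumptions: the velocity update uses component-wise products, φ1 = φ2, the annealing schedule uses T0 = N^(N/(N-1)) and β = N^(-1/(N-1)), and one distribution is used for r1 and r2 throughout the run (U(0,1) by default, the choice supported by Table 1). The function name sapso and its parameters are hypothetical.</p>
<preformat preformat-type="code">
```python
import numpy as np


def sapso(F, x_min, x_max, K=40, N=100, dist="uniform", seed=None):
    """SAPSO-style minimization of F over the box [x_min, x_max].

    dist picks the random vectors of Block 2.1: "uniform" for U(0,1),
    "cauchy" for C(0,1) or "normal" for N(0,1). Returns (x*, F(x*)).
    """
    if 2 > N:
        raise ValueError("N must be at least 2")
    rng = np.random.default_rng(seed)
    x_min = np.asarray(x_min, dtype=float)
    x_max = np.asarray(x_max, dtype=float)
    M = x_min.size
    samplers = {
        "uniform": lambda: rng.uniform(0.0, 1.0, (K, M)),
        "cauchy": lambda: rng.standard_cauchy((K, M)),
        "normal": lambda: rng.standard_normal((K, M)),
    }
    draw = samplers[dist]

    # Block 1: random positions, personal bests, zero velocities, global best
    x = (x_max - x_min) * rng.uniform(0.0, 1.0, (K, M)) + x_min
    v = np.zeros((K, M))
    x_best = x.copy()
    f_best = np.array([F(xk) for xk in x], dtype=float)
    k_star = int(np.argmin(f_best))
    g, f_g = x_best[k_star].copy(), f_best[k_star]

    # Assumed schedule: T(n) = beta * T(n - 1), so T(1) = N and T(N) = 1
    beta = N ** (-1.0 / (N - 1))
    T = N ** (N / (N - 1))
    phi0, w0 = 0.5 + np.log(2.0), 1.0 / (2.0 * np.log(2.0))

    for _ in range(N):                      # Block 7: iterations n = 1, ..., N
        r1, r2 = draw(), draw()             # Block 2.1
        T *= beta                           # Block 2.2
        phi = phi0 * np.exp(-1.0 / T)       # Block 2.3, phi1 = phi2 = phi
        w = w0 * np.exp(-1.0 / T)
        # Block 2.4: component-wise products with the random vectors
        v = w * v + phi * (x_best - x) * r1 + phi * (g - x) * r2
        # Block 3.1: zero velocity components outside (x_min, x_max)
        v = np.where(np.logical_and(v > x_min, x_max > v), v, 0.0)
        # Block 3.2: move, then clamp positions to the box
        x = np.clip(x + v, x_min, x_max)
        f = np.array([F(xk) for xk in x], dtype=float)
        better = f_best > f                 # Block 4: personal bests
        x_best[better] = x[better]
        f_best[better] = f[better]
        k_star = int(np.argmin(f))          # Block 5: best particle now
        if f_g > f[k_star]:                 # Block 6: global best
            g, f_g = x[k_star].copy(), f[k_star]
    return g, float(f_g)
```
</preformat>
<p>For example, sapso(lambda z: float(np.sum(z ** 2)), [-5.0] * 3, [5.0] * 3, seed=0) minimizes a three-dimensional sphere function. To train the FC-ESN output layer, F would instead evaluate criterion (1) for the output weights encoded by the particle position, with M equal to the number of those weights.</p>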
      </sec>
    </sec>
    <sec id="sec-6">
      <title>8. Experiments and results </title>
      <p>The determination of the neural network model parameter values was modeled in the Matlab package using the Parallel Computing Toolbox. Since the formation of each particle in Block 1 and the modification of the speed, position and local best position of each particle in Blocks 2-4 occur independently of the other particles, and the order in which particles are formed and modified is arbitrary, it is proposed to process the particles in parallel using a parfor loop. Parfor is part of the Parallel Computing Toolbox and replaces the sequential for loop; it resembles an OpenMP parallel loop but, unlike OpenMP, can be used not only on a local multicore machine but also on a cluster. The advantage of this approach over CUDA and MPI-style programming (the latter is represented in the Parallel Computing Toolbox by the spmd block) is the simplicity and clarity of the technical implementation. Because the number of particles is small, each particle can be formed and modified on its own physical core of the processors of the machines combined into a cluster.</p>
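<p>The per-particle independence that parfor exploits can also be illustrated outside Matlab. The sketch below uses Python's concurrent.futures, which is an assumption for illustration and not the authors' code: Executor.map evaluates the criterion for every particle concurrently and returns the results in particle order, so the best-particle search of Blocks 5 and 6 does not depend on scheduling. The functions evaluate_swarm, sphere and demo are hypothetical; ProcessPoolExecutor corresponds most closely to parfor workers, and threads are the default only so that the sketch runs in any environment.</p>
<preformat preformat-type="code">
```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor

import numpy as np


def evaluate_swarm(F, positions, executor: Executor):
    """Evaluate F for every particle concurrently; results keep particle order."""
    return np.fromiter(executor.map(F, list(positions)), dtype=float, count=len(positions))


def sphere(z):
    """Hypothetical stand-in for the criterion F."""
    return float(np.sum(np.asarray(z) ** 2))


def demo(K=40, M=10, use_processes=False, seed=0):
    """Evaluate a random swarm of K particles of dimension M."""
    swarm = np.random.default_rng(seed).uniform(-5.0, 5.0, (K, M))
    # Worker processes resemble parfor workers; threads keep the sketch portable.
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with pool_cls() as pool:
        values = evaluate_swarm(sphere, swarm, pool)
    return swarm, values
```
</preformat>
<p>Calling demo(use_processes=True) from a script guarded by if __name__ == "__main__" runs the same evaluation in separate worker processes, which is the closer analogue of distributing particles over physical cores.</p>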
      <p>The swarm size was set to $K = 40$.</p>
      <p>To choose the type of distribution used in the SAPSO method, a series of experiments was carried out; the results are presented in Table 1.</p>
      <p>Table 1: Comparative characteristics of distribution types (columns: distribution type; number of iterations. Rows include U(0,1))</p>
      <p>According to Table 1, the U(0,1) distribution requires the fewest iterations while maintaining the required forecast accuracy.</p>
      <p>To define the structure of the FC-ESN-based long-term forecast model, i.e. to determine the number of hidden neurons, a series of experiments was carried out; the results are presented in Figure 4.</p>
      <p>A sample based on data from the logistics company Ekol Ukraine was used as input data to determine the parameter values of the neural network model for the long-term forecast. The criterion for choosing the structure of the neural network model was the minimum mean squared forecast error. As can be seen from Figure 4, the error decreases as the number of hidden neurons increases. It is sufficient to use 16 neurons in the hidden layer for the forecast, since a further increase in the number of hidden neurons changes the error only insignificantly.</p>
      <p>The neural networks for long-term forecasting were compared by the minimum mean squared error (MSE) of the forecast and by computational complexity (Table 2), where $M^{(k)}$ is the number of unit delays for the $k$th layer, $S$ is the number of cells, $N^{(1)}$ is the number of neurons in the first layer, $P$ is the training set cardinality, $N$ is the number of iterations of the multi-agent metaheuristic method SAPSO, $N \ll P$, and $N^{(1)} \ll P$.</p>
      <p>According to Table 2, the FC-ESN type 2 network has the highest forecast accuracy, and the FC-ESN type 1 network has the lowest computational complexity.</p>
      <p>Based on the performed experiments, the following conclusions can be drawn.</p>
      <p>The LSTM network has an average learning rate and average forecast accuracy.</p>
      <p>Table 2: Comparative characteristics of neural networks for long-term forecast (columns: network; minimum MSE of the forecast; computational complexity. Rows include LSTM, with a minimum MSE of 0.12, and GRU)</p>
      <p>The GRU network is second only to the authors' networks in learning speed (it uses a gradient learning method and has lower computational complexity than LSTM and ESN). However, it has the lowest forecast accuracy (due to the gradient learning method and its simplified architecture compared with LSTM).</p>
      <p>The ESN network is inferior in forecast accuracy only to the authors' networks, since it is trained with the pseudoinverse matrix method. However, it has the lowest learning rate (it has the highest computational complexity, and the pseudoinverse matrix method does not provide for parallelism).</p>
      <p>The authors' FC-ESN networks are trained with the proposed metaheuristic, which increases the forecast accuracy (low probability of getting stuck in a local extremum) and the learning rate (it provides parallel training), and they are not complex to implement.</p>
    </sec>
    <sec id="sec-7">
      <title>9. Conclusions </title>
      <p>The article discusses the problem of improving the efficiency of long-term forecasting in the supply chain. To solve this problem, the existing forecasting methods were investigated. These studies have shown that the use of artificial neural networks is currently the most effective approach. To improve the quality of the long-term forecast, an ESN neural network was chosen and modified (by introducing full connectivity and cascades of unit delays in the input and output layers), and the structure of its model was determined in the course of a numerical study. The experiments have shown that beyond 16 hidden neurons the mean squared error does not change significantly, and the selected network gives forecast results with a minimum deviation. A method was proposed for determining the parameter values of the proposed neural network model for long-term forecasting, which ensured high forecast speed and accuracy. The proposed methods are intended for software implementation in the Matlab package using the Parallel Computing Toolbox, which speeds up the search for a solution. The software implementing the proposed methods was developed and studied on the database of the logistics company Ekol Ukraine. The experiments have confirmed the efficiency of the developed software, which allows it to be recommended for practical use in solving supply chain management problems. Prospects for further research lie in applying the proposed methods to a wider set of benchmarks.
</p>
      <p>10. References</p>
      <p>[6] P. Bidyuk, T. Prosyankina-Zharova, O. Terentiev, Modelling nonlinear nonstationary processes in macroeconomy and finances, in: Z. Hu, S. Petoukhov, I. Dychka, M. He (Eds.), Advances in Computer Science for Engineering and Education, Advances in Intelligent Systems and Computing, volume 754, Springer, Cham, 2019, pp. 735–745. doi: 10.1007/978-3-319-91008-6_72.</p>
      <p>[7] L. Lyubchyk, E. Bodyansky, A. Rivtis, Adaptive harmonic components detection and forecasting in wave non-periodic time series using neural networks, in: Proceedings of the ISCDMCI'2002, Evpatoria, 2002, pp. 433–435.</p>
      <p>[8] K.-L. Du, M. N. S. Swamy, Neural networks and statistical learning, Springer-Verlag, London, 2014.</p>
      <p>[9] S. Haykin, Neural networks, Pearson Education, New York, NY, 1999.</p>
      <p>[10] S. N. Sivanandam, S. Sumathi, S. N. Deepa, Introduction to neural networks using Matlab 6.0, The McGraw-Hill Comp., Inc., New Delhi, 2006.</p>
      <p>[11] S. Hochreiter, J. Schmidhuber, Long short-term memory, in: Neural Computation, volume 9, 1997, pp. 1735–1780. doi: 10.1162/neco.1997.9.8.1735.</p>
      <p>[12] F. Gers, Long Short-Term Memory in Recurrent Neural Networks, PhD thesis, Ecole Polytechnique Federale de Lausanne.</p>
      <p>[13] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using RNN encoder-decoder for statistical machine translation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 2014, pp. 1724–1734. doi: 10.3115/v1/D14-1179.</p>
      <p>[14] R. Dey, F. M. Salem, Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks, arXiv:1701.05923, 2017. URL: https://arxiv.org/ftp/arxiv/papers/1701/1701.05923.pdf.</p>
      <p>[15] E. Fedorov, T. Utkina, O. Nechyporenko, Forecast method for natural language constructions based on a modified gated recursive block, in: CEUR Workshop Proceedings, volume 2604, 2020, pp. 199–214.</p>
      <p>[16] A. Graves, G. Wayne, M. Reynolds et al., Hybrid computing using a neural network with dynamic external memory, Nature 538 (2016) 471–476. doi: 10.1038/nature20101.</p>
      <p>[17] R. B. Greve, E. J. Jacobsen, S. Risi, Evolving neural Turing machines for reward-based learning, in: Proceedings of the 2016 Genetic and Evolutionary Computation Conference, GECCO'16, ACM, 2016, pp. 117–124. doi: 10.1145/2908812.2908930.</p>
      <p>[18] H. Jaeger, Tutorial on Training Recurrent Neural Networks, Covering BPPT, RTRL, EKF and the Echo State Network Approach, GMD Report 159, German National Research Center for Information Technology, 2002.</p>
      <p>[19] H. Jaeger, M. Lukosevicius, D. Popovici, U. Siewert, Optimization and applications of echo state networks with leaky-integrator neurons, in: Neural Networks, volume 20, 2007, pp. 335–352. doi: 10.1016/j.neunet.2007.04.016.</p>
      <p>[20] T. Natschläger, W. Maass, H. Markram, The liquid computer: A novel strategy for real-time computing on time series, in: Special Issue on Foundations of Information Processing of Telematik, 2002, pp. 39–43.</p>
      <p>[21] Q. Wang, P. Li, D-LSM: Deep liquid state machine with unsupervised recurrent reservoir tuning, in: 23rd International Conference on Pattern Recognition (ICPR), IEEE, Cancun, 2016, pp. 2653–2658. doi: 10.1109/ICPR.2016.7900035.</p>
      <p>[22] W. Maass, Liquid state machines: motivation, theory, and applications, in: Computability in context: computation and logic in the real world, 2011, pp. 275–296. doi: 10.1142/9781848162778_0008.</p>
      <p>[23] T. Neskorodieva, E. Fedorov, I. Izonin, Forecast method for audit data analysis by modified liquid state machine, in: CEUR Workshop Proceedings, volume 2631, 2020, pp. 145–158.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Cox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Schleier</surname>
          </string-name>
          ,
          <source>Theory of constraints handbook</source>
          , New York, NY,
          <publisher-name>McGraw-Hill</publisher-name>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Goldratt</surname>
          </string-name>
          ,
          <article-title>My saga to improve production</article-title>
          , in:
          <source>Selected Readings in Constraints Management</source>
          , Falls Church, VA: APICS
          ,
          <year>1996</year>
          , pp.
          <fpage>43</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Goldratt</surname>
          </string-name>
          ,
          <article-title>Production: The TOC way, including CD-ROM simulator and workbook, Revised edition</article-title>
          , Great Barrington, MA: North River Press,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Smerichevska</surname>
          </string-name>
          et al,
          <article-title>Cluster Policy of Innovative Development of the National Economy: Integration and Infrastructure Aspects: monograph</article-title>
          , S. Smerichevska (Eds.),
          <source>Wydawnictwo naukowe WSPIA</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R. T.</given-names>
            <surname>Baillie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kapetanios</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Papailias</surname>
          </string-name>
          ,
          <article-title>Modified information criteria and selection of long memory time series models</article-title>
          ,
          in:
          <source>Computational Statistics and Data Analysis</source>
          , volume
          <volume>76</volume>
          ,
          <year>2014</year>
          , pp.
          <fpage>116</fpage>
          -
          <lpage>131</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1016/j.csda.2013.04.012</pub-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>