Long‐Term Forecasting Method in the Supply Chain Based on an
Artificial Neural Network with Multi‐Agent Metaheuristic
Training
Eugene Fedorov and Olga Nechyporenko
Cherkasy State Technological University, Shevchenko blvd., 460, Cherkasy, 18006, Ukraine


                  Abstract
                  The problem of increasing the efficiency of long-term forecasting in the supply chain is
                  examined. Neural network forecasting methods based on reservoir computing, which
                  increases the forecast accuracy, are proposed. For these methods, metaheuristic-based
                  methods for identifying the parameters of the forecast models are proposed. The methods
                  were evaluated on data from the logistics company Ekol Ukraine and are intended for
                  intelligent computer-based supply chain management systems.

                  Keywords
                  long-term forecast, supply chain, metaheuristics, reservoir computing, forecast neural
                  network model

1. Introduction
   These days, domestic and foreign companies are striving to improve and optimize their business
processes by implementing Lean Production technology and principles, whose uniqueness lies in the
fact that they are effective for enterprises of various industries at any stage of the supply chain of
products or services to the end consumer [1-3]. The Lean Production concept is dominant in the
formation of "perfect" supply chains, which, in the context of the globalization and digitalization of an
economy based on information and communication technologies, are the most important factor in
competitiveness [4]. One of the most important problems in the field of supply chain management is
the insufficiently high accuracy of forecasts; as a result, supply chain management can be ineffective.
Therefore, the development of forecasting methods in the supply chain is an urgent task.
   To date, many approaches to long-term forecasting are known, among which are:
    autoregressive forecasting methods [5];
    forecasting methods based on exponential smoothing [6];
    neural network forecasting methods [7-10].
   Autoregressive methods suffer from the complexity of determining the type of functional
dependence, the labor-intensive determination of model parameters, low adaptability, and the inability
to model nonlinear processes.
   Neural network forecasting methods provide tangible advantages: the relationships between
factors are investigated on ready-made models; no assumptions about the distribution of factors are
required; a priori information about factors may be missing; the original data may be highly
correlated, incomplete or noisy; analysis of systems with a high degree of nonlinearity is possible;
model development is rapid; adaptability is high;
analysis of systems with a large number of factors is possible; a complete enumeration of all possible
models is not required; and analysis of systems with heterogeneous factors is possible.
   However, neural network methods suffer from a lack of transparency, the complexity of defining
the architecture, strict requirements for the training sample, the complexity of choosing the training
algorithm, and the resource-intensiveness of the training process. Therefore, the task of increasing the
efficiency of neural network forecasting is urgent.
   The aim of the work is to develop a method for long-term forecasting in the supply chain. To
achieve the goal, the following tasks were set and solved:
    analyze existing forecast methods;
    propose a neural network forecast model;
    choose a criterion for evaluating the effectiveness of a neural network forecast model;
    propose a method for determining the values of the neural network forecast model parameters
   based on multi-agent metaheuristics;
    perform numerical studies.

2. Problem statement
       The problem of increasing the efficiency of long-term forecasting in the supply chain is reduced
to the problem of finding a vector of parameters $W$ that satisfies the forecast model adequacy
criterion
$$F = \frac{1}{P} \sum_{\mu=1}^{P} \left( f(x^{\mu}, W) - d^{\mu} \right)^2 \to \min_W,$$
i.e. delivers the minimum of the mean squared error (the difference between the model output and the
desired output), where $P$ – the test set cardinality, $x^{\mu}$ – the $\mu$th test input value,
$d^{\mu}$ – the $\mu$th desired output value.
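   As a minimal illustration of the criterion $F$, the mean squared error over the test set can be
computed as in the Python sketch below; the names mse_criterion and model are illustrative
assumptions, with model standing for the forecast model $f$.

def mse_criterion(model, W, X_test, d_test):
    """Mean squared error F between model outputs f(x_mu, W) and desired
    outputs d_mu over a test set of cardinality P (illustrative sketch)."""
    P = len(X_test)
    squared_errors = [(model(x_mu, W) - d_mu) ** 2
                      for x_mu, d_mu in zip(X_test, d_test)]
    return sum(squared_errors) / P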


3. Literature review
    The most commonly used forecast neural networks are:
    1. Long short-term memory (LSTM) [11, 12];
    This network is based on gates (FIR filters) and a multilayer perceptron. Instead of each hidden
neuron, it uses a memory block that contains one or more cells and is connected with input, output
and forget gates. The gates determine how much information passes through. If the input and output
gates are close to 1 and the forget gate is close to 0, then the network turns into an Elman network. If
the input gate is close to 0, then short-term information from the input is ignored. If the forget gate is
close to 0, then long-term information from the memory block is ignored. If the output gate is close to
0, then the output information is ignored. The advantage of this network is a higher forecast accuracy
than that of a conventional multilayer perceptron. The disadvantages are the complexity of
implementing the memory blocks, insufficient forecast accuracy, the complexity of defining the
architecture, and the low training speed.
    2. Gated recurrent unit (GRU) [13-15];
    This network is based on gates (FIR filters) and a multilayer perceptron. Instead of each hidden
neuron, it uses a hidden block that is connected with reset and update gates. The gates determine how
much information passes through. If the reset gate is close to 1 and the update gate is close to 0, then
the network turns into an Elman network. If the reset gate and update gate are close to 0, then the
long-term information from the hidden block is ignored and the network becomes a multilayer
perceptron. If the update gate is close to 1, then short-term information from the network input is
ignored. The advantage of this network is a higher forecast accuracy than that of a conventional
multilayer perceptron. The disadvantages are the complexity of implementing the hidden blocks,
insufficient forecast accuracy, the complexity of defining the architecture, and the low training speed.
    3. Neural Turing machine (NTM) [16, 17];
    This network is based on a Turing machine and a multilayer perceptron or LSTM and includes a
controller and a memory matrix. At any given time, the controller receives input from the outside
world and sends output to the outside world. The controller also reads from the memory matrix cells
via read heads and writes to the memory matrix cells via write heads. The advantage of this network
is a higher forecast accuracy than that of a conventional multilayer perceptron. The disadvantages are
the complexity of implementing the controller (in the case of LSTM), the complexity of defining the
architecture, insufficient forecast accuracy, and the low training speed.
    4. Echo state network (ESN) [18, 19];
    This network is based on reservoir computing over sigmoid neurons and a multilayer perceptron.
The hidden layer is called the reservoir. Each neuron in the reservoir may be unconnected or
connected to other neurons in the reservoir. The pseudoinverse matrix method is used to train the
network. The advantages of this network are the highest forecast accuracy (due to the pseudoinverse
matrix method) and the ease of implementing sigmoid neurons in the reservoir. The disadvantages are
the complexity of parallelizing training and the complexity of defining the architecture.
    5. Liquid state machine (LSM) [20-23].
    This network is based on reservoir computing over «Leaky Integrate and Fire» (LIF) impulse
neurons and a multilayer perceptron. Each neuron in the reservoir may be unconnected or connected
to other neurons in the reservoir and is either excitatory or inhibitory. A gradient learning method is
used to train the network. The advantages of this network are a higher forecast accuracy than that of a
conventional multilayer perceptron and the possibility of parallel training for the part of the network
corresponding to the multilayer perceptron. The disadvantages are the complexity of implementing
impulse neurons, the complexity of defining the architecture, insufficient forecast accuracy, and the
complexity of parallel training for the part of the network corresponding to the reservoir.
    Usually, the methods listed above have either a low forecast accuracy (due to falling into a local
extremum), a low training speed (due to the high computational complexity of the hidden neuron or
the difficulty of parallelizing training), a complex implementation (due to the complexity of the hidden
neuron architecture), or a complex architecture definition, which leads to a decrease in forecast
efficiency.
    Due to this, the creation of a neural network with a training method and architecture that eliminate
the indicated disadvantages is an urgent task.

4. Block diagram of a neural network model for a long‐term forecast
    Figures 1 and 2 show block diagrams of a long-term forecast model based on a fully connected
echo state network (FC-ESN), which is a recurrent two-layer neural network. Unlike a traditional
ESN, this network is fully connected and uses cascades of unit delays. FC-ESN type 1 has a cascade
of unit delays in the input layer, while FC-ESN type 2 has cascades of unit delays in both the input
and output layers. The number of input and output neurons is 1.



Figure 1: Block diagram of a long‐term forecast model based on a fully connected echo state
network with a cascade of unit delays for an input layer neuron (FC‐ESN type 1). The diagram shows
input neurons, hidden (reservoir) neurons and output neurons, with unit delays at times t-1 and t.
Figure 2: Block diagram of a long‐term forecast model based on a fully connected echo state
network with a cascade of unit delays for a neuron of the input and output layers (FC‐ESN type 2).
The diagram shows input neurons, hidden (reservoir) neurons and output neurons, with unit delays at
times t-1 and t.

5. Neural network models for long‐term forecast

5.1. Long‐term forecast model FC‐ESN type 1
   1. Initialization
$$n = 1, \qquad y_i^{(1)}(n-1) = 0, \quad i = \overline{1, N^{(1)}}.$$
   2. Forecast
   2.1. Initialization of the outputs of the neurons of the input layer
$$y_i^{(0)}(n) = x_i,$$
   2.2. Calculation of the outputs of the neurons of the hidden layer
$$y_j^{(1)}(n) = f^{(1)}(s_j^{(1)}(n)), \quad j = \overline{1, N^{(1)}},$$
$$s_j^{(1)}(n) = b_j^{(1)}(n) + \sum_{i=0}^{M^{(0)}} w_{ij}^{(1)} y^{(0)}(n-i)
 + \sum_{i=M^{(0)}+1}^{M^{(0)}+N^{(1)}} w_{ij}^{(1)} y_{i-M^{(0)}}^{(1)}(n-1)
 + w_{M^{(0)}+N^{(1)}+1,\,j}^{(1)} y^{(2)}(n-1),$$
   2.3. Calculation of the outputs of the neurons of the output layer
$$y^{(2)}(n) = f^{(2)}(s^{(2)}(n)),$$
$$s^{(2)}(n) = b^{(2)}(n) + \sum_{i=0}^{M^{(0)}} w_i^{(2)} y^{(0)}(n-i)
 + \sum_{i=M^{(0)}+1}^{M^{(0)}+N^{(1)}} w_i^{(2)} y_{i-M^{(0)}}^{(1)}(n),$$
where $N^{(1)}$ – the number of neurons in the first (hidden) layer,
$M^{(k)}$ – the number of unit delays for the $k$th layer,
$w_{ij}^{(k)}$ – the connection weight from the $i$th neuron to the $j$th neuron of the $k$th layer,
$b_j^{(k)}$ – the bias (threshold) of the $j$th neuron of the $k$th layer,
$y_j^{(k)}(n)$ – the output of the $j$th neuron of the $k$th layer at time $n$,
$f^{(k)}$ – the activation function of the neurons of the $k$th layer (usually $f^{(k)}(s) = \tanh(s)$).
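   To make the indexing above concrete, the following is a minimal NumPy sketch of one forward
step of FC-ESN type 1; the function and argument names are illustrative assumptions and not the
authors' implementation.

import numpy as np

def fc_esn_type1_step(x_hist, y1_prev, y2_prev, b1, W_in, W_res, w_fb,
                      b2, w_out_in, w_out_res):
    """One forward step of FC-ESN type 1 (illustrative sketch).

    x_hist    : (M0 + 1,)    current and delayed inputs y^(0)(n), ..., y^(0)(n - M0)
    y1_prev   : (N1,)        hidden-layer outputs y^(1)(n - 1)
    y2_prev   : scalar       network output y^(2)(n - 1)
    b1, b2    : (N1,), scalar  biases of the hidden and output layers
    W_in      : (M0 + 1, N1)   weights from the input delay cascade to the hidden layer
    W_res     : (N1, N1)       fully connected hidden-to-hidden (reservoir) weights
    w_fb      : (N1,)          feedback weights from the output neuron to the hidden layer
    w_out_in  : (M0 + 1,)      weights from the input delay cascade to the output neuron
    w_out_res : (N1,)          weights from the hidden layer to the output neuron
    """
    # Hidden layer (step 2.2): input cascade + reservoir state + output feedback
    s1 = b1 + W_in.T @ x_hist + W_res.T @ y1_prev + w_fb * y2_prev
    y1 = np.tanh(s1)
    # Output layer (step 2.3): input cascade + current hidden-layer outputs
    s2 = b2 + w_out_in @ x_hist + w_out_res @ y1
    y2 = np.tanh(s2)
    return y1, y2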

5.2. Long‐term forecast model FC‐ESN type 2
   1. Initialization
$$n = 1, \qquad y_i^{(1)}(n-1) = 0, \quad i = \overline{1, N^{(1)}}.$$
   2. Forecast
   2.1. Initialization of the outputs of the neurons of the input layer
$$y_i^{(0)}(n) = x_i,$$
   2.2. Calculation of the outputs of the neurons of the hidden layer
$$y_j^{(1)}(n) = f^{(1)}(s_j^{(1)}(n)), \quad j = \overline{1, N^{(1)}},$$
$$s_j^{(1)}(n) = b_j^{(1)}(n) + \sum_{i=0}^{M^{(0)}} w_{ij}^{(1)} y^{(0)}(n-i)
 + \sum_{i=M^{(0)}+1}^{M^{(0)}+N^{(1)}} w_{ij}^{(1)} y_{i-M^{(0)}}^{(1)}(n-1)
 + \sum_{i=M^{(0)}+N^{(1)}+1}^{M^{(0)}+N^{(1)}+M^{(2)}} w_{ij}^{(1)} y^{(2)}\big(n-(i-(M^{(0)}+N^{(1)}))\big),$$
   2.3. Calculation of the outputs of the neurons of the output layer
$$y^{(2)}(n) = f^{(2)}(s^{(2)}(n)),$$
$$s^{(2)}(n) = b^{(2)}(n) + \sum_{i=0}^{M^{(0)}} w_i^{(2)} y^{(0)}(n-i)
 + \sum_{i=M^{(0)}+1}^{M^{(0)}+N^{(1)}} w_i^{(2)} y_{i-M^{(0)}}^{(1)}(n).$$
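   The only difference from type 1 is the last term of the hidden-layer sum: the single output
feedback $y^{(2)}(n-1)$ is replaced by a cascade of $M^{(2)}$ delayed outputs. A minimal sketch of
the corresponding hidden-layer pre-activation, reusing the assumed notation of the type 1 sketch
above and an additional buffer y2_hist of the last $M^{(2)}$ outputs:

import numpy as np

def fc_esn_type2_hidden_preactivation(x_hist, y1_prev, y2_hist, b1, W_in, W_res, W_fb):
    """Hidden-layer pre-activation s^(1)(n) of FC-ESN type 2 (illustrative sketch).

    y2_hist : (M2,)     past outputs y^(2)(n - 1), ..., y^(2)(n - M2)
    W_fb    : (M2, N1)  weights from the output delay cascade to the hidden layer
    The other arguments have the same meaning as in the type 1 sketch.
    """
    # Input cascade + reservoir state + cascade of delayed outputs
    return b1 + W_in.T @ x_hist + W_res.T @ y1_prev + W_fb.T @ y2_hist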



6. Criterion for evaluating the effectiveness of a neural network model for
   long‐term forecast
   In this work, to determine the values of the parameters of the FC-ESN model, the model adequacy
criterion was chosen, which means the choice of such parameter values $W = \{w_{ij}^{(1)}, w_i^{(2)}\}$
that deliver the minimum of the mean squared error (the difference between the model output and the
desired output):
$$F = \frac{1}{P} \sum_{\mu=1}^{P} \left( y^{(2)\mu} - d^{\mu} \right)^2 \to \min_W, \qquad (1)$$
where $P$ – the test set cardinality.

7. Method for determining the parameters values of the neural network
   model for long‐term forecast
   The method for determining the parameter values of the neural network model for long-term
forecasting is reduced to calculating the weights of the hidden layer and the output layer of the
FC-ESN model.

7.1. Calculating the weights of the hidden layer
   The weights of the hidden layer are calculated as follows:
   1. Initialize randomly the biases (thresholds) $b_j^{(1)}$ and the weights $w_{ij}^{(1)}$.
   2. Form from the weights $w_{ij}^{(1)}$, $i = \overline{M^{(0)}+1, M^{(0)}+N^{(1)}}$,
$j = \overline{1, N^{(1)}}$, the matrix $W = [w_{ij}]$, $i, j = \overline{1, N^{(1)}}$.
   3. Determine the matrix $\widetilde{W}$ as
$$\widetilde{W} = \alpha \frac{W}{\max_{j \in \overline{1, N^{(1)}}} |\lambda_j|},$$
where $\alpha$ – the spectral radius of the matrix $\widetilde{W}$ (for large $\alpha$ learning is
faster, but long short-term memory decreases), $0 < \alpha < 1$,
$\lambda_j$ – the eigenvalues of the matrix $W$.
   4. Assign to the weights $w_{ij}^{(1)}$, $i = \overline{M^{(0)}+1, M^{(0)}+N^{(1)}}$,
$j = \overline{1, N^{(1)}}$, the values of the corresponding elements of the matrix $\widetilde{W}$.
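   A minimal NumPy sketch of steps 1-4 for the reservoir part of the hidden-layer weights is given
below; the function name scale_reservoir_weights, the uniform initialization range and the chosen
value of alpha are illustrative assumptions.

import numpy as np

def scale_reservoir_weights(W, alpha):
    """Rescale the reservoir matrix W so that its spectral radius equals alpha,
    0 < alpha < 1 (step 3 above); illustrative sketch."""
    eigenvalues = np.linalg.eigvals(W)             # lambda_j, eigenvalues of W
    spectral_radius = np.max(np.abs(eigenvalues))  # max_j |lambda_j|
    return alpha * W / spectral_radius

# Usage: random initialization (step 1), scaling (steps 2-3), assignment (step 4)
rng = np.random.default_rng(0)
N1 = 100                                           # number of hidden (reservoir) neurons
W_res = rng.uniform(-0.5, 0.5, size=(N1, N1))      # reservoir part of w_ij^(1)
W_res = scale_reservoir_weights(W_res, alpha=0.9)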

7.2. The output layer weights calculation based on the multi‐agent
metaheuristic SAPSO method
   The proposed SAPSO (simulated annealing and particle swarm optimization) method for the
optimization of numerical functions consists of the following blocks (Figure 3):
   1. Initialization;
   2. Modification of the speed of each particle using simulated annealing;
   3. Modification of the position of each particle;
   4. Determination of the particle of the current population with the best position;
   5. Determination of the global best position;
   6. n…
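   Since the SAPSO update rules are shown here only as a block diagram, the following is a generic
sketch of a particle swarm optimization loop in which velocity updates are accepted by a simulated
annealing rule; the constants, the acceptance rule and all names are illustrative assumptions and do
not reproduce the authors' exact method. Applied to FC-ESN training, the objective f would be
criterion (1) evaluated as a function of the output-layer weights.

import numpy as np

def sapso_sketch(f, dim, n_particles=30, n_iter=200, inertia=0.7, c1=1.5, c2=1.5,
                 temp0=1.0, cooling=0.99, seed=0):
    """Generic PSO loop with simulated-annealing acceptance of velocity updates
    (illustrative sketch, not the exact SAPSO update rules of this paper)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, (n_particles, dim))       # particle positions
    v = np.zeros((n_particles, dim))                      # particle velocities
    p_best = x.copy()                                     # personal best positions
    p_best_f = np.array([f(xi) for xi in x])
    g_best = p_best[np.argmin(p_best_f)].copy()           # global best position
    g_best_f = p_best_f.min()
    temp = temp0                                          # annealing temperature
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v_new = inertia * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
        for k in range(n_particles):
            # Simulated annealing: accept a worsening velocity update with
            # probability exp(-delta / temp), otherwise keep the old velocity.
            delta = f(x[k] + v_new[k]) - f(x[k] + v[k])
            if delta <= 0 or rng.random() < np.exp(-delta / temp):
                v[k] = v_new[k]
        x = x + v                                         # position update
        fx = np.array([f(xi) for xi in x])
        improved = fx < p_best_f                          # update personal bests
        p_best[improved] = x[improved]
        p_best_f[improved] = fx[improved]
        if p_best_f.min() < g_best_f:                     # update global best
            g_best_f = p_best_f.min()
            g_best = p_best[np.argmin(p_best_f)].copy()
        temp *= cooling                                   # cool the temperature
    return g_best, g_best_f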