Long-Term Forecasting Method in the Supply Chain Based on an Artificial Neural Network with Multi-Agent Metaheuristic Training

Eugene Fedorov and Olga Nechyporenko

Cherkasy State Technological University, Shevchenko blvd., 460, Cherkasy, 18006, Ukraine

Abstract
The problem of increasing the efficiency of long-term forecasting in the supply chain is examined. Neural network forecasting methods based on reservoir computing are proposed, which increase the forecast accuracy. For these methods, parameter identification techniques based on metaheuristics are proposed. The methods were investigated using data from the logistics company Ekol Ukraine and are intended for intelligent computer-based supply chain management systems.

Keywords
long-term forecast, supply chain, metaheuristics, reservoir computing, forecast neural network model

CMIS-2021: The Fourth International Workshop on Computer Modeling and Intelligent Systems, April 27, 2021, Zaporizhzhia, Ukraine
EMAIL: fedorovee@ukr.net (E. Fedorov); olne@ukr.net (O. Nechyporenko)
ORCID: 0000-0003-3841-7373 (E. Fedorov); 0000-0002-3954-3796 (O. Nechyporenko)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

These days, domestic and foreign companies strive to improve and optimize their business processes by implementing the Lean Production technology and principles, whose uniqueness lies in the fact that they are effective for enterprises of various industries at any stage of the supply chain of products or services to the end consumer [1-3]. The Lean Production concept is dominant in the formation of "perfect" supply chains, which, in the context of the globalization and digitalization of the economy based on information and communication technologies, is the most important factor of competitiveness [4].

One of the most important problems in the field of supply chain management is the insufficiently high accuracy of forecasts, which can make supply chain management ineffective. Therefore, the development of forecasting methods for the supply chain is an urgent task.

To date, many approaches are known as long-term forecasting tools, among which are:
 autoregressive forecasting methods [5];
 forecasting methods based on exponential smoothing [6];
 neural network forecasting methods [7-10].

Autoregressive methods suffer from the complexity of determining the type of the functional dependencies, the labor-intensive determination of the model parameters, low adaptability, and the inability to model nonlinear processes.

Neural network forecasting methods provide tangible advantages: the relationships between factors are investigated on ready-made models; no assumptions about the distribution of factors are required; a priori information about factors may be missing; the original data may be highly correlated, incomplete, or noisy; analysis of systems with a high degree of nonlinearity is possible; rapid model development; high adaptability; analysis of systems with a large number of factors is possible; a complete enumeration of all possible models is not required; analysis of systems with heterogeneous factors is possible.
However, neural network methods suffer from a lack of transparency, the complexity of defining the architecture, strict requirements for the training sample, the complexity of choosing a training algorithm, and the resource intensity of the training process. Therefore, the task of increasing the efficiency of neural network forecasting is urgent.

The aim of the work is to develop a method for long-term forecasting in the supply chain. To achieve this aim, the following tasks were set and solved:
 analyze the existing forecast methods;
 propose a neural network forecast model;
 choose a criterion for evaluating the effectiveness of the neural network forecast model;
 propose a method for determining the values of the neural network forecast model parameters based on multi-agent metaheuristics;
 perform numerical studies.

2. Problem statement

The problem of increasing the efficiency of long-term forecasting in the supply chain is reduced to the problem of finding a parameter vector $W$ that satisfies the forecast model adequacy criterion

$$F = \frac{1}{P}\sum_{\mu=1}^{P}\left(f(x_\mu, W) - d_\mu\right)^2 \to \min_W,$$

i.e., that delivers the minimum of the mean squared error (the difference between the model output and the desired output), where $P$ is the test set cardinality, $x_\mu$ is the $\mu$-th input value of the test set, and $d_\mu$ is the $\mu$-th desired output value.
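As a simple illustration of this criterion, the following Python sketch computes $F$ for an arbitrary forecast model; the names `forecast`, `x_test`, and `d_test` are hypothetical placeholders rather than notation from the paper.

```python
import numpy as np

def adequacy_criterion(forecast, W, x_test, d_test):
    """Mean squared error F between model outputs f(x_mu, W) and desired
    outputs d_mu over a test set of cardinality P = len(x_test)."""
    errors = np.array([forecast(x, W) - d for x, d in zip(x_test, d_test)])
    return float(np.mean(errors ** 2))  # (1/P) * sum of squared errors
```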
3. Literature review

The most commonly used forecast neural networks are the following.

1. Long short-term memory (LSTM) [11, 12]. This network is based on gates (FIR filters) and a multilayer perceptron. Instead of each hidden neuron, it uses a memory block that contains one or more cells and is connected with input, output, and forget gates. The gates determine how much information to pass through. If the input and output gates are close to 1 and the forget gate is close to 0, the network turns into an Elman network. If the input gate is close to 0, the short-term information from the input is ignored. If the forget gate is close to 0, the long-term information from the memory block is ignored. If the output gate is close to 0, the output information is ignored. The advantage of this network is a higher forecast accuracy than that of a conventional multilayer perceptron. The disadvantages are the complexity of implementing the memory blocks, insufficient forecast accuracy, the complexity of defining the architecture, and an insufficient learning rate.

2. Gated recurrent unit (GRU) [13-15]. This network is based on gates (FIR filters) and a multilayer perceptron. Instead of each hidden neuron, it uses a hidden block that is connected with reset and update gates. The gates determine how much information to pass through. If the reset gate is close to 1 and the update gate is close to 0, the network turns into an Elman network. If the reset and update gates are close to 0, the long-term information from the hidden block is ignored and the network becomes a multilayer perceptron. If the update gate is close to 1, the short-term information from the network input is ignored. The advantage of this network is a higher forecast accuracy than that of a conventional multilayer perceptron. The disadvantages are the complexity of implementing the hidden blocks, insufficient forecast accuracy, the complexity of defining the architecture, and an insufficient learning rate.

3. Neural Turing machine (NTM) [16, 17]. This network is based on a Turing machine and a multilayer perceptron or LSTM and includes a controller and a memory matrix. At any given time, the controller receives input from the outside world and sends output to the outside world. The controller also reads from the memory matrix cells via read heads and writes to the memory matrix cells via write heads. The advantage of this network is a higher forecast accuracy than that of a conventional multilayer perceptron. The disadvantages are the complexity of implementing the controller (in the case of LSTM), the complexity of defining the architecture, insufficient forecast accuracy, and an insufficient learning rate.

4. Echo state network (ESN) [18, 19]. This network is based on reservoir computing over sigmoid neurons and a multilayer perceptron. The hidden layer is called the reservoir. Each neuron in the reservoir may be unconnected or connected to other neurons in the reservoir. The pseudoinverse matrix method is used to train the network. The advantages of this network are the highest forecast accuracy (due to the pseudoinverse matrix method) and the ease of implementing sigmoid neurons in the reservoir. The disadvantages are the complexity of parallel learning and the complexity of defining the architecture.

5. Liquid state machine (LSM) [20-23]. This network is based on reservoir computing over «Leaky Integrate and Fire» (LIF) spiking neurons and a multilayer perceptron. Each neuron in the reservoir may be unconnected or connected to other neurons in the reservoir and is either excitatory or inhibitory. A gradient method is used to train the network. The advantages of this network are a higher forecast accuracy than that of a conventional multilayer perceptron and the possibility of parallel training for the part of the network corresponding to the multilayer perceptron. The disadvantages are the complexity of implementing spiking neurons, the complexity of defining the architecture, lower forecast accuracy, and the complexity of parallel training for the part of the network corresponding to the reservoir.

Usually, the methods listed above have either low forecast accuracy (due to falling into a local extremum), a low learning rate (due to the high computational complexity of the hidden neuron or the complexity of parallelizing training), high complexity of implementation (due to the complexity of the hidden neuron architecture), or high complexity of defining the architecture, which leads to a decrease in forecast efficiency. Therefore, the creation of a neural network with a training method and an architecture that eliminate the indicated disadvantages is an urgent task.

4. Block diagram of a neural network model for a long-term forecast

Figures 1 and 2 show the block diagrams of a long-term forecast model based on a fully connected echo state network (FC-ESN), which is a recurrent two-layer neural network. Unlike the traditional ESN, this network is fully connected and uses cascades of unit delays. FC-ESN type 1 has a cascade of unit delays in the input layer. FC-ESN type 2 has cascades of unit delays in the input and output layers. The number of input and output neurons is 1.

Figure 1: Block diagram of a long-term forecast model based on a fully connected echo state network with a cascade of unit delays for an input layer neuron (FC-ESN type 1)

Figure 2: Block diagram of a long-term forecast model based on a fully connected echo state network with a cascade of unit delays for a neuron of the input and output layers (FC-ESN type 2)

5. Neural network models for long-term forecast

5.1. Long-term forecast model FC-ESN type 1

1. Initialization: $n = 1$, $y_i^{(1)}(n-1) = 0$, $i = \overline{1, N^{(1)}}$.

2. Forecast.

2.1. Initialization of the outputs of the neurons of the input layer: $y^{(0)}(n) = x_n$.

2.2. Calculation of the outputs of the neurons of the hidden layer:

$$y_j^{(1)}(n) = f^{(1)}(s_j^{(1)}(n)), \quad j = \overline{1, N^{(1)}},$$

$$s_j^{(1)}(n) = b_j^{(1)} + \sum_{i=0}^{M^{(0)}} w_{ij}^{(1)} y^{(0)}(n-i) + \sum_{i=M^{(0)}+1}^{M^{(0)}+N^{(1)}} w_{ij}^{(1)} y_{i-M^{(0)}}^{(1)}(n-1) + w_{M^{(0)}+N^{(1)}+1,\,j}^{(1)} y^{(2)}(n-1).$$

2.3. Calculation of the output of the neuron of the output layer:

$$y^{(2)}(n) = f^{(2)}(s^{(2)}(n)),$$

$$s^{(2)}(n) = b^{(2)} + \sum_{i=0}^{M^{(0)}} w_i^{(2)} y^{(0)}(n-i) + \sum_{i=M^{(0)}+1}^{M^{(0)}+N^{(1)}} w_i^{(2)} y_{i-M^{(0)}}^{(1)}(n),$$

where $N^{(1)}$ is the number of neurons in the hidden (first) layer, $M^{(k)}$ is the number of unit delays for the $k$th layer, $w_{ij}^{(k)}$ is the connection weight from the $i$th neuron to the $j$th neuron of the $k$th layer, $b_j^{(k)}$ is the bias (threshold) of the $j$th neuron of the $k$th layer, $y_j^{(k)}(n)$ is the output of the $j$th neuron of the $k$th layer at time $n$, and $f^{(k)}$ is the activation function of the neurons of the $k$th layer (usually $f^{(k)}(s) = \tanh(s)$).
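To make the recurrence above concrete, here is a minimal NumPy sketch of one forward step of FC-ESN type 1, assuming tanh activations in both layers; the weight layout inside `params` and all identifier names are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def fc_esn_type1_step(x_window, y_hidden_prev, y_out_prev, params):
    """One forward step of FC-ESN type 1 (Section 5.1).

    x_window      -- array of inputs x(n), x(n-1), ..., x(n-M0), length M0 + 1
    y_hidden_prev -- hidden outputs y^(1)(n-1), length N1
    y_out_prev    -- previous network output y^(2)(n-1), scalar
    params        -- dict with hypothetical keys:
                     'W1': (M0 + N1 + 2, N1) hidden weights w_ij^(1)
                     'b1': (N1,) hidden biases b_j^(1)
                     'W2': (M0 + N1 + 1,) output weights w_i^(2)
                     'b2': scalar output bias b^(2)
    """
    W1, b1, W2, b2 = params['W1'], params['b1'], params['W2'], params['b2']
    m = len(x_window)                       # m = M0 + 1 input delay taps
    n1 = len(y_hidden_prev)
    # s_j^(1)(n): input taps + recurrent hidden feedback + output feedback
    s1 = (b1
          + x_window @ W1[:m, :]
          + y_hidden_prev @ W1[m:m + n1, :]
          + y_out_prev * W1[m + n1, :])
    y_hidden = np.tanh(s1)                  # f^(1)(s) = tanh(s)
    # s^(2)(n): input taps + current hidden outputs
    s2 = b2 + x_window @ W2[:m] + y_hidden @ W2[m:]
    y_out = np.tanh(s2)                     # f^(2)(s) = tanh(s)
    return y_hidden, float(y_out)

# Example with hypothetical dimensions: M0 = 4 delays, N1 = 20 hidden neurons
rng = np.random.default_rng(0)
M0, N1 = 4, 20
params = {'W1': 0.1 * rng.normal(size=(M0 + 1 + N1 + 1, N1)),
          'b1': np.zeros(N1),
          'W2': 0.1 * rng.normal(size=M0 + 1 + N1),
          'b2': 0.0}
y_h, y = fc_esn_type1_step(rng.normal(size=M0 + 1), np.zeros(N1), 0.0, params)
```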
5.2. Long-term forecast model FC-ESN type 2

1. Initialization: $n = 1$, $y_i^{(1)}(n-1) = 0$, $i = \overline{1, N^{(1)}}$.

2. Forecast.

2.1. Initialization of the outputs of the neurons of the input layer: $y^{(0)}(n) = x_n$.

2.2. Calculation of the outputs of the neurons of the hidden layer:

$$y_j^{(1)}(n) = f^{(1)}(s_j^{(1)}(n)), \quad j = \overline{1, N^{(1)}},$$

$$s_j^{(1)}(n) = b_j^{(1)} + \sum_{i=0}^{M^{(0)}} w_{ij}^{(1)} y^{(0)}(n-i) + \sum_{i=M^{(0)}+1}^{M^{(0)}+N^{(1)}} w_{ij}^{(1)} y_{i-M^{(0)}}^{(1)}(n-1) + \sum_{i=M^{(0)}+N^{(1)}+1}^{M^{(0)}+N^{(1)}+M^{(2)}} w_{ij}^{(1)} y^{(2)}\bigl(n - (i - (M^{(0)} + N^{(1)}))\bigr).$$

2.3. Calculation of the output of the neuron of the output layer:

$$y^{(2)}(n) = f^{(2)}(s^{(2)}(n)),$$

$$s^{(2)}(n) = b^{(2)} + \sum_{i=0}^{M^{(0)}} w_i^{(2)} y^{(0)}(n-i) + \sum_{i=M^{(0)}+1}^{M^{(0)}+N^{(1)}} w_i^{(2)} y_{i-M^{(0)}}^{(1)}(n).$$

6. Criterion for evaluating the effectiveness of a neural network model for long-term forecast

In this work, to determine the values of the parameters of the FC-ESN model, the model adequacy criterion was chosen, which means the choice of such parameter values $W = \{w_{ij}^{(1)}, w_i^{(2)}\}$ that deliver the minimum of the mean squared error (the difference between the model output and the desired output):

$$F = \frac{1}{P}\sum_{\mu=1}^{P}\left(y_\mu^{(2)} - d_\mu\right)^2 \to \min_W, \qquad (1)$$

where $P$ is the test set cardinality.

7. Method for determining the parameters values of the neural network model for long-term forecast

The method for determining the values of the parameters of the neural network model for long-term forecasting is reduced to calculating the weights of the hidden layer and the output layer of the FC-ESN model.

7.1. Calculating the weights of the hidden layer

The weights of the hidden layer are calculated as follows:

1. Randomly initialize the biases (thresholds) $b_j^{(1)}$ and the weights $w_{ij}^{(1)}$.

2. From the weights $w_{ij}^{(1)}$, $i = \overline{M^{(0)}+1, M^{(0)}+N^{(1)}}$, $j = \overline{1, N^{(1)}}$, form the matrix $W = [w_{ij}]$, $i, j = \overline{1, N^{(1)}}$.

3. Determine the matrix $\widetilde{W}$ as

$$\widetilde{W} = \frac{\lambda W}{\max_{j=\overline{1, N^{(1)}}} |\chi_j|},$$

where $\lambda$ is the spectral radius of the matrix $\widetilde{W}$ (for large $\lambda$, learning is faster, but long short-term memory decreases), $0 < \lambda < 1$, and $\chi_j$ are the eigenvalues of the matrix $W$.

4. Assign to the weights $w_{ij}^{(1)}$, $i = \overline{M^{(0)}+1, M^{(0)}+N^{(1)}}$, $j = \overline{1, N^{(1)}}$, the values of the corresponding elements of the matrix $\widetilde{W}$.
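The rescaling in steps 2-4 is the standard spectral-radius normalization used in echo state networks; the following NumPy sketch shows one possible implementation, with the reservoir size and initialization range chosen arbitrarily for illustration.

```python
import numpy as np

def scale_reservoir_weights(W, lam=0.9):
    """Steps 2-4 of Section 7.1: rescale the recurrent weight matrix W
    (N1 x N1) so that its spectral radius equals lam,
    i.e. W_tilde = lam * W / max_j |chi_j|."""
    chi = np.linalg.eigvals(W)              # eigenvalues chi_j of W
    return lam * W / np.max(np.abs(chi))

# Example with an arbitrary reservoir size (step 1: random initialization)
rng = np.random.default_rng(0)
N1 = 50
W = rng.uniform(-0.5, 0.5, size=(N1, N1))
W_tilde = scale_reservoir_weights(W, lam=0.9)
print(np.max(np.abs(np.linalg.eigvals(W_tilde))))  # prints ~0.9
```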
7.2. The output layer weights calculation based on the multi-agent metaheuristic SAPSO method

The proposed SAPSO (simulated annealing and particle swarm optimization) method for the optimization of numerical functions consists of the following blocks (Figure 3):

1. Initialization.
2. Modification of the velocity of each particle using simulated annealing.
3. Modification of the position of each particle.
4. Determination of the particle of the current population with the best position.
5. Determination of the global best position.
6. n
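Since the description of SAPSO is cut off at this point, the following Python sketch only illustrates one plausible reading of blocks 1-5: a standard particle swarm loop in which each candidate velocity update is accepted or rejected by a simulated annealing test. The acceptance rule, parameter values, and all names are assumptions, not the authors' algorithm.

```python
import numpy as np

def sapso(f, dim, n_particles=30, n_iter=200, inertia=0.7, c1=1.5, c2=1.5,
          t0=1.0, cooling=0.95, seed=0):
    """Hypothetical SAPSO sketch: particle swarm optimization whose velocity
    updates are filtered by a simulated annealing acceptance test.
    Minimizes f over the box [-1, 1]^dim."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, (n_particles, dim))   # block 1: initialization
    v = np.zeros((n_particles, dim))
    fx = np.array([f(xi) for xi in x])
    p_best, p_val = x.copy(), fx.copy()              # personal best positions
    g_best = p_best[np.argmin(p_val)].copy()         # block 5: global best
    t = t0
    for _ in range(n_iter):
        for i in range(n_particles):
            r1, r2 = rng.random(dim), rng.random(dim)
            v_new = (inertia * v[i] + c1 * r1 * (p_best[i] - x[i])
                     + c2 * r2 * (g_best - x[i]))    # block 2: candidate velocity
            x_new = x[i] + v_new                     # block 3: candidate position
            f_new = f(x_new)
            # simulated annealing test: always accept improvements, accept
            # worse moves with probability exp(-(f_new - fx[i]) / t)
            if f_new < fx[i] or rng.random() < np.exp(-(f_new - fx[i]) / t):
                v[i], x[i], fx[i] = v_new, x_new, f_new
            if fx[i] < p_val[i]:                     # block 4: personal best
                p_val[i], p_best[i] = fx[i], x[i].copy()
        g_best = p_best[np.argmin(p_val)].copy()     # block 5: global best
        t *= cooling                                 # annealing schedule
    return g_best

# Example: minimize the sphere function in three dimensions
print(sapso(lambda z: float(np.sum(z ** 2)), dim=3))
```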