<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Neuroagent Game Model of Collective Decision Making in Conditions of Uncertainty</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Petro O. Kravets</string-name>
          <email>Petro.O.Kravets@lpnu.ua</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>V. Pasichnyk</string-name>
          <email>vpasichnyk@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonii V. Rzheuskyi</string-name>
          <email>antonii.v.rzheuskyi@lpnu.ua</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Systems and Networks Department, Lviv Polytechnic National University</institution>
          ,
          <addr-line>Lviv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Computer Systems and Technologies Department, PHEI “Bukovinian University”</institution>
          ,
          <addr-line>Chernivtsi</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A neuroagent game model of collective decision making under stochastic uncertainty is proposed. Neuroagents are based on artificial neural networks with feedback and unsupervised learning. The current collective decision is obtained after all players independently choose their own decisions. Each player generates a current decision variant according to the values of its neural network outputs. Decisions are chosen by the neuroagents at random, independently in time and of the other agents. The random choice involves computing the probabilities of the decision variants by projecting the neural network outputs onto a unit simplex. After the collective decision is chosen, the reaction of the decision making environment is determined as a set of current neuroagent gains. The current gain of each neuroagent is fed to the inputs of the corresponding two-layer neural network. The neuroagents are then trained by changing the weights of the neuron connections with one of the unsupervised learning algorithms. The learning process is repeated until the connection weights stabilize with a given accuracy. Training aims to maximize the average gains of the neuroagents. The solution of the game is reached at one of the points of collective optimality or equilibrium, depending on the parameter values of the chosen neuroagent training method.</p>
      </abstract>
      <kwd-group>
        <kwd>Collective Decision Making</kwd>
        <kwd>Uncertainty Conditions</kwd>
        <kwd>Neuroagent Stochastic Game</kwd>
        <kwd>Adaptive Learning Methods</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction to the problem of collective decision making in conditions of uncertainty</title>
      <p>
        To solve the tasks of distributed management and decision making in technical,
economic, informational and social systems, there is a need for a collective choice of
decisions that satisfy one of the conditions of multicriteria optimality, such as Nash,
Pareto, etc. [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5 ref6 ref7">1-7</xref>
        ]. These conditions in one way or another determine the degree of
benefit and fairness of the collectively achieved decision.
      </p>
      <p>The formal model of a collective decision making system is determined by a controlled environment and the decision making (controlling) agents. The environment model is determined by the structure and functions of the managed system.</p>
      <p>
        The environment perceives agents’ managing decisions and generates output
signals which are interpreted by agents as estimates of efficiency of realized
decisions. Agents get estimations from the environment, choose and implement
decisions to maximize their own win or minimize losses. Such an approach to
building a decision making system is called optimization [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In contrast to the
situational approach, when the decision is obtained immediately on the basis of input
data, in an optimization approach, the decision is refined through the feedback link.
      </p>
      <p>An agent is an intelligent decision making system, software or technical
implementation of the model of decision maker. Generally, the agent consists of the
following main subsystems: receptor, intellectual and effector. The receptor
subsystem receives signals from the decision making environment, the intellectual
subsystem makes decisions based on current information, self-learning and forecasting,
and the effector subsystem implements the decisions made, with an appropriate impact
on the environment.</p>
      <p>Collective decision making in a distributed environment is conducted by a set of
interacting intelligent agents, which is called the multiagent system (MAS) [8-12].
Each agent makes a decision based on available local information and interaction
within the MAS. As an active element of MAS, the agent is able to perceive and
analyze data of the information network, negotiate and exchange current information
with other agents, make stand-alone decisions, change the states of the network,
inform the system and the user about the results of their actions.</p>
      <p>The intelligent agents have the following basic properties:
1) autonomy – in pursuit of the goal, the agent can independently make decisions
based on his own knowledge, without requiring external management;
2) reactivity – the agent can perceive external information, produce and implement
adequate actions;</p>
      <p>3) intelligence – the agent's ability to learn, adapt to environment changes, to
process data obtained by artificial intelligence methods, to make optimal decisions;
4) specialization – as a rule, the agents perform highly specialized functions;
5) mobility – to achieve the goal, the software agent can move within the
information network;</p>
      <p>6) coordination – distribution of roles and coordination of actions of agents in
solving common tasks;</p>
      <p>7) interactivity – the agent interacts with other agents, with the information
resources of the network and with the user;</p>
      <p>8) communication – the ability of agents to communicate in technical language and
understand each other;</p>
      <p>9) personality – the unique qualities of the agent, modeling his psychological
features, current emotions, etc.;</p>
      <p>10) decentralization – each of the agents does not have a complete picture of
the whole system and, therefore, there are no agents who manage the whole system.
When solving a task, phenomena of cooperation or competition arise between agents.
As is known, these types of interaction between agents under conditions of
uncertainty are studied by the theory of stochastic games [13–15]. Therefore, to study
the collective behavior of agents in the decision making process, it is advisable to use
the mathematical apparatus of the theory of stochastic games.</p>
      <p>The typical features of the game decision making in the MAS are:
1) distribution or multiparameter of the decision making environment;
2) internal stochasticity of the environment;
3) complete or partial absence of a priori information (uncertainty) about the
decision making environment;</p>
      <p>4) manageability of the environment and the possibility of distributed
implementation of management options;
5) multicriteria of management or decision making;
6) discreteness and finiteness of the set of decision making options;
7) stochastic independence of choice of decisions in space and time;
8) the possibility of multiple repetition of variants of decisions implementations in
time;</p>
      <p>9) distributed locally-dependent character of the formation and collection of
information for statistical identification of the decision making environment;
10) the possibility of using a distributed game algorithm, which ensures the
achievement of compromise decisions area;
11) implementation of the game algorithm in real time;
12) determination of the moments of stopping the game algorithm for the
possibility of its practical application.</p>
      <p>The functioning of the MAS decision making is conducted in conditions of a priori
uncertainty [16]. Uncertainty can be caused by internal or external MAS factors. We
distinguish the following types of uncertainties:</p>
      <p>1) structural – unknown composition of the system and connections between its
elements;
2) algorithmic – unknown algorithm of the system functioning;
3) informational – fuzziness or lack of the complete information necessary for decision
making;
      <p>4) linguistic – ambiguity of statements in the exchange of messages between
agents;
5) target – an unknown global purpose of the system;
6) social – due to the collective interaction of agents, when the actions of one of
the agents influence the choice of decisions by other agents;</p>
      <p>7) stochastic – the influence on the system of uncontrolled external factors.
In scientific research, the uncertainty of decision making is most often modeled with
the use of mechanism of random variables, which is the basis of stochastic
uncertainty. Partial compensation of uncertainty is provided by the ability of agents to
self-learning and adaptive decision making strategies.</p>
      <p>In the cybernetic literature, game methods of self-learning based on the adaptive
formation of probability distributions of discrete variants of decisions (pure strategies
of players) are well studied [14]. After all players have chosen and implemented their
pure strategies, each of them receives a current win from the environment, which is
used to update the dynamic mixed strategies (vectors of conditional probabilities of
choosing decisions) that drive the mechanism of generating random pure strategies.
This update means that the probability of choosing a pure strategy increases
proportionally to the value of the current win. The method of recomputing mixed
strategies over time, built on the basis of stochastic approximation, provides
maximization of the average gain functions on unit simplexes. The intellectual
capabilities of the agents of such a game are limited, because they model only the
reflex behavior of biological systems.</p>
      <p>Artificial neural networks (ANN) can be used to construct intellectual decision
making systems as models of processes of the nervous activity of biological systems
with the ability to memorize, analyze and predict behavior [17–20]. Neural networks
implement "soft" calculations based on the processes occurring in the human brain
and are used as models of objects with unknown characteristics. An ANN consists of
many neurons and the connections among them. Training an ANN consists in
correcting the synaptic connections among neurons based on the information that
enters the neural network from the environment. To obtain the necessary structure of
connections, some connections among neurons are amplified while others are weakened.</p>
      <p>The decision making system, based on ANN, is called a neuroagent. Neuroagent
game models of decision making in conditions of uncertainty are insufficiently
studied in modern professional scientific literature.</p>
      <p>The application of neuroagent game models is a promising direction for increasing
the efficiency of collective development and decision making processes in conditions
of uncertainty due to the following features:</p>
      <p>1) nonlinearity – neural networks make it possible to obtain a nonlinear
dependence of the output signal on the input;</p>
      <p>2) adaptability – neural networks have the ability to adapt their synaptic weights to
environmental changes;</p>
      <p>3) plasticity and resistance to failures – neural networks store information in
distributed form over all connections of the network. The failure of one or more
neurons does not lead to the failure of the system as a whole;</p>
      <p>4) universality – neural networks do not require special programming, because they
make it possible to solve various information processing tasks with the same
neuron training algorithms.</p>
      <p>The aim of this work is to develop a model of stochastic game of neuroagents for
collective decision making in conditions of uncertainty. To achieve this aim it is
necessary: to set the task of game decision making in conditions of uncertainty; to set
the environment for collective decision making; develop the structure of neuroagents;
to choose the method of teaching neuroagents to solve the formulated task; to develop
algorithm and software tools for simulation of stochastic game of neuroagents; to
analyze the obtained results and make recommendations for their practical use.
</p>
      <sec id="sec-1-1">
        <title>Setting up of the Game Task of Decision Making in Conditions of Uncertainty</title>
        <p>The matrix stochastic game $\Gamma = (I, \{U_i\}, \{v_i\})$ of decision making is given by:</p>
        <p>1) a set of agents $I = \{i \mid i = 1..L\}$, where $L = |I|$ is the cardinality of the set, that is, the number of players;</p>
        <p>2) sets of pure strategies of the agents $U_i = \{u_i(1), u_i(2), \ldots, u_i(N_i)\}$, where $N_i$ is the number of pure strategies of the player with number $i$;</p>
        <p>3) matrices of the average gains of the agents $v_i(u)$, $i \in I$, $u \in U$, where $U = \prod_{i \in I} U_i$ is the set of collective strategies of the players.</p>
        <p>The stochastic game takes place at discrete moments of time $t = 1, 2, \ldots$. After the implementation of the collective strategy $u^t \in U$, each agent receives a current random gain $\xi_{i,t}(u) \in R^1$ with unknown mathematical expectation $M\{\xi_{i,t}(u)\} = v_i(u)$ and bounded variance $d_i(u) < \infty$. Here and below the index $i$ is the number of the player, and the index $t$ is the current time.</p>
        <p>The obtained current agent’s gains are averaged over time to evaluate the
effectiveness of the decision making process by each agent:</p>
        <p>$\Xi_{i,t}(\{u^t\}) = \frac{1}{t} \sum_{\tau=1}^{t} \xi_{i,\tau}$, $i \in I$. (1)</p>
        <p>The aim of the agents is to maximize their average payoff functions:</p>
        <p>$\lim_{t \to \infty} \Xi_{i,t} \to \max_{u_i}$, $\forall i \in I$. (2)</p>
        <p>Thus, based on observations of the current gains $\xi_{i,t}$, each agent $i \in I$ must choose current decisions $u_{i,t} = u_i \in U_i$ so as to ensure, as time passes $t = 1, 2, \ldots$, the maximization of the system of target functions (1).</p>
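<p>The time average (1) can be maintained incrementally rather than by summing the whole history; a minimal sketch (function and variable names are illustrative):</p>

```python
def update_average(avg_prev: float, gain: float, t: int) -> float:
    """Incremental form of the time-averaged gain (1):
    Xi_t = Xi_{t-1} + (xi_t - Xi_{t-1}) / t, for t = 1, 2, ..."""
    return avg_prev + (gain - avg_prev) / t

avg = 0.0
for t, gain in enumerate([1.0, 0.0, 1.0, 1.0], start=1):
    avg = update_average(avg, gain, t)
print(round(avg, 6))  # 0.75, the mean of the four observed gains
```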
        <p>The solution of the multicriteria problem (2) should be sought in the set of points of collective equilibrium (for example, Nash) or optimality (for example, Pareto), depending on how the agents choose their sequences of decision variants.</p>
      </sec>
      <sec id="sec-1-3">
        <title>Game Neuroagent Method of Task Solution</title>
        <p>Adaptive methods of generating the sequences $\{u_{i,t}\}$, $i \in I$, $t = 1, 2, \ldots$, based on dynamic distributions of discrete random variables, which use the players' mixed strategies, are known [14]. In contrast, let us consider a neuroagent method for solving a matrix stochastic game, the scheme of which is shown in Fig. 1.</p>
        <p>The model of the decision making environment is given by the matrices of the mathematical expectations of random gains $v_i$, $i \in I$. The values of the decision variants $u_i \in U_i$ are presented to the input of the environment. The output of the environment is the corresponding values of the current gains $\xi_i(u)$.</p>
        <p>Each neuroagent is assigned an artificial neural network with $n = 2$ layers of neurons. The number of elements of each layer is the same and equal to the number of decisions $N_i = |U_i|$. The inputs $x_i^{(n-1)}$ of the neuroagents receive vectors of parameters calculated on the basis of the outputs of the environment $\xi_i(u)$, $i \in I$. The outputs of the neuroagents are vectors of parameters $y_i^{(n)}$, on the basis of which the decision variants $u_i \in U_i$, $i \in I$ are determined. The weights $w_i^{(n-1)}$ indicate the strength of the synaptic connections among the neurons of the $i$-th agent. Positive values of the connection weights correspond to exciting synapses and negative values to inhibitory ones. A zero weight means the absence of a connection between the neurons.</p>
        <p>The functioning of a neuroagent is carried out by one of the adaptive algorithms of
unsupervised learning, for example, Hebb, Kohonen or others [19]. Unsupervised
learning or self-learning is by nature closest to the brain as its biological prototype.
Self-learning is not oriented to the presence of correct outputs of neural network. The
self-learning algorithm independently detects the internal structure of input data,
rebuilding the weights of synaptic connections so that close (by some metric) sets of
input signals cause sufficiently close sets of output signals. In fact, the process of
neural network self-learning solves the task of data clustering, identifying the
statistical properties of the training sets and grouping similar input sets into clusters.
By feeding a vector from a given class to the input of the trained neural network, we
obtain the characteristic output vector for this class. The output vector is not known
in advance. Its formation is due to the structure of the training sample, the random
distribution of the initial connection weights among neurons, and the combination of
excited neurons of the output layer of the neural network.</p>
        <p>The neuroagents carry out a random choice of decisions $u_{i,t} \in U_i$ independently of each other, $i \in I$, and in time $t = 1, 2, \ldots$. To do this, each neuroagent builds a vector $p_{i,t}$ of conditional probabilities of choosing the decisions $u_{i,t}$ by projecting the output vector $y_{i,t}^{(n)}$ onto the unit $N_i$-dimensional $\varepsilon$-simplex:</p>
        <p>$p_{i,t}(u_{i,t} \mid u_{i,\tau}, \xi_{i,\tau}, \tau = 1, 2, \ldots, t-1) = \pi_{N_i}^{\varepsilon_t}\{y_{i,t}^{(n)}\}$, $i \in I$, (3)</p>
        <p>where $\pi_{N_i}^{\varepsilon_t}$ is a projector onto the unit $\varepsilon$-simplex $S_{N_i}^{\varepsilon_t} \subseteq S_{N_i}$ [14]. The parameter $\varepsilon_t$ adjusts the speed of expansion of the $\varepsilon$-simplex $S_{N_i}^{\varepsilon_t}$ to the unit simplex $S_{N_i}$ and can be used as an additional factor in controlling the convergence of the game neuroagent decision making method. The obtained probability vector is used to construct an empirical distribution of discrete random variables, on the basis of which the choice of decisions is made:</p>
        <p>$u_{i,t} = u_i[k]$, $k = \arg\min_{k} \left\{ \sum_{j=1}^{k} p_{i,t}(u_i[j]) > \omega \right\}$, $k = 1..N_i$, $i \in I$, (4)</p>
        <p>where $\omega \in [0, 1]$ is a real random number with uniform distribution.</p>
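<p>As an illustration of (3) – (4), the projection onto the $\varepsilon$-simplex can be realized by shrinking the normalized output vector toward the simplex center, followed by an inverse-CDF (roulette-wheel) choice of the pure strategy. The shrinking map below is an assumed stand-in for the projector of [14], not its exact form:</p>

```python
import random

def project_eps_simplex(y, eps):
    """Map a nonnegative output vector onto the eps-simplex: components sum
    to 1 and each is at least eps (requires eps <= 1/len(y)).
    Assumed form of the projector of [14]: shrink toward the simplex center."""
    n = len(y)
    s = sum(y)
    p = [v / s for v in y] if s > 0 else [1.0 / n] * n
    # Convex combination with the uniform point: each component >= eps, sum stays 1
    return [(1 - n * eps) * v + eps for v in p]

def choose_decision(p):
    """Inverse-CDF (roulette-wheel) choice of a pure strategy index, as in (4)."""
    omega, acc = random.random(), 0.0
    for k, pk in enumerate(p):
        acc += pk
        if acc > omega:
            return k
    return len(p) - 1

p = project_eps_simplex([0.9, 0.05, 0.05], eps=0.1)
print(p)                                # each component >= 0.1, sum is 1
print(choose_decision(p) in range(3))   # True
```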
        <p>The response of the decision making environment to the chosen variant is the value of a random variable with an unknown distribution $Z$, which is interpreted as the agent's current gain:</p>
        <p>$\xi_i(u^t) \sim Z(v_i(u^t), d_i(u^t))$, (5)</p>
        <p>where $v_i(u^t)$ is the mathematical expectation and $d_i(u^t)$ is the variance.</p>
        <p>The obtained current gains $\xi_{i,t}(u^t)$ are fed to the inputs of the neuroagents:</p>
        <p>$x_{i,t}^{(n-1)} = e\,\xi_i(u^t)$, $i \in I$,</p>
        <p>where $e = (1 \mid u_i \in U_i)$ is a vector all of whose elements are equal to one. If necessary, the elements of the vector $x_{i,t}^{(n-1)}$ are normalized, for example as:</p>
        <p>$x_{i,t}^{(n-1)} = e\,\xi_i(u^t) / |\xi_{\max}|$, $i \in I$,</p>
        <p>where $\xi_{\max}$ is the maximum value of the current gains. Normalization can reduce the number of steps required to train the neuroagent.</p>
        <p>The total inputs $x_{i,t}^{(n)}$ of the neurons of the $n$-th layer are calculated from the outputs $y_{i,t}^{(n-1)}$ of the neurons of the $(n-1)$-th layer:</p>
        <p>$x_{i,t}^{(n)}[k] = \sum_{j=1}^{N_i} w_{i,t}^{(n-1)}[j, k]\, y_{i,t}^{(n-1)}[j]$, $i \in I$, $k = 1..N_i$, (6)</p>
        <p>where $w_{i,t}^{(n-1)}[N_i, N_i]$ is the matrix of the weights of connections among the nodes of the neural network, calculated at the moment of time $t$. Here $w_{i,t}^{(n-1)}[j, k]$ is the weight of the connection between the $j$-th node of the $(n-1)$-th layer and the $k$-th node of the $n$-th layer. To calculate the outputs $y_{i,t}^{(n)}$ of the neuroagent, a transfer function $\varphi(\cdot)$ of the neuron is used:</p>
        <p>$y_{i,t}^{(n)}[k] = \varphi(x_{i,t}^{(n)}[k])$, $k = 1..N_i$. (7)</p>
        <p>Depending on the task being solved and the type of neural network, the transfer function $\varphi(\cdot)$ can be threshold, linear with saturation, sigmoidal, sinusoidal, radially symmetric, and so on. Most often, for modeling an artificial neural network, the linear transfer function with saturation</p>
        <p>$y_{i,t}^{(n)}[k] = \begin{cases} 0, &amp; \text{if } x_{i,t}^{(n)}[k] \le \theta, \\ \alpha\,(x_{i,t}^{(n)}[k] - \theta), &amp; \text{if } x_{i,t}^{(n)}[k] > \theta \end{cases}$ (8)</p>
        <p>or the bipolar sigmoidal transfer function</p>
        <p>$y_{i,t}^{(n)}[k] = -0.5 + 1/(1 + \exp(-\alpha\,(x_{i,t}^{(n)}[k] - \theta)))$ (9)</p>
        <p>is used. The parameter $\alpha > 0$ defines the tangent of the slope angle for the linear transfer function and the steepness for the sigmoidal transfer function. The parameter $\theta \ge 0$ defines the threshold of neuron activation.</p>
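<p>A minimal sketch of the two transfer functions (8) – (9); the exact saturation and bipolar-offset conventions are reconstructed from the fragments above and should be treated as assumptions:</p>

```python
import math

def linear_saturated(x, alpha=1.0, theta=0.0):
    """Linear transfer function with lower saturation, as in (8):
    0 below the activation threshold theta, slope alpha above it."""
    return 0.0 if x <= theta else alpha * (x - theta)

def bipolar_sigmoid(x, alpha=1.0, theta=0.0):
    """Bipolar sigmoidal transfer function, as in (9): output in (-0.5, 0.5)."""
    return -0.5 + 1.0 / (1.0 + math.exp(-alpha * (x - theta)))

print(round(linear_saturated(0.3, alpha=2.0, theta=0.1), 6))  # 0.4
print(bipolar_sigmoid(0.0))                                   # 0.0
```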
        <p>Training of the neuroagent is carried out by changing the weights $w_{i,t}^{(n-1)}$ of the synaptic connections among neurons. Recalculation of the connection weights is performed according to Hebb's signal method, Kohonen's method or another method of unsupervised learning.</p>
        <p>Learning by Hebb's method leads to an increase of the connections among excited neurons:</p>
        <p>$w_{i,t}^{(n-1)}[j, k] = w_{i,t-1}^{(n-1)}[j, k] + \gamma_t\, y_{i,t-1}^{(n-1)}[j]\, y_{i,t-1}^{(n)}[k]$, $j = 1..N_i$, $k = 1..N_i$, (10)</p>
        <p>where $\gamma_t$ is the parameter of the neuroagent training step.</p>
        <p>Excited neurons are those for which the value of the total input $x_i^{(n)}$ exceeds the activation threshold $\theta$.</p>
        <p>Training by Hebb's differential method leads to an increase of the connections among those neurons whose outputs have changed the most:</p>
        <p>$w_{i,t}^{(n-1)}[j, k] = w_{i,t-1}^{(n-1)}[j, k] + \gamma_t\,(y_{i,t}^{(n-1)}[j] - y_{i,t-1}^{(n-1)}[j])\,(y_{i,t}^{(n)}[k] - y_{i,t-1}^{(n)}[k])$, $j = 1..N_i$, $k = 1..N_i$. (11)</p>
        <p>Training by Kohonen's method is based on the mechanism of competition, the essence of which is to minimize the difference between the input signals of the neuron-winner, coming from the outputs of the neurons of the previous layer, and the weight coefficients of its synapses:</p>
        <p>$w_{i,t}^{(n-1)}[j, k_i^*] = w_{i,t-1}^{(n-1)}[j, k_i^*] + \gamma_t\,(y_{i,t-1}^{(n-1)}[j] - w_{i,t-1}^{(n-1)}[j, k_i^*])$, $j = 1..N_i$, (12)</p>
        <p>where $k_i^*$ is the index of the neuron-winner of the $i$-th agent.</p>
        <p>In contrast to Hebb's method, in which several neurons of the same layer can be excited simultaneously, in Kohonen's method the neurons of the same layer compete with each other for the right of activation. This rule is known in the machine learning literature as "winner takes all".</p>
        <p>According to Kohonen’s method the restructuring of weights of connections is
carried out only for neuron-winner. The winner is the neuron whose synapse values
are as similar as possible to the input image.</p>
        <p>The neuron-winner is determined by calculating the distance between the vectors $y_{i,t-1}^{(n-1)}$ and $w_{i,t-1}^{(n-1)}$:</p>
        <p>$D_{i,t-1}[k] = \sum_{j=1}^{N_i} \left( y_{i,t-1}^{(n-1)}[j] - w_{i,t-1}^{(n-1)}[j, k] \right)^2$, $k = 1..N_i$.</p>
        <p>The winner is the neuron with the smallest distance:</p>
        <p>$k_i^* = \arg\min_{k = 1..N_i} D_{i,t-1}[k]$. (13)</p>
        <p>Another way to determine the neuron-winner is to maximize the outputs $y_{i,t-1}^{(n)}$ of the neurons of the $n$-th layer:</p>
        <p>$u_{i,t-1} = u_i[k]$, $k = \arg\max_{j = 1..N_i} y_{i,t-1}^{(n)}[j]$.</p>
        <p>In this case, the index of the neuron-winner is the serial number of the chosen decision variant $u_{i,t-1}$:</p>
        <p>$k_i^* = \operatorname{index}(u_i[k] \mid \chi(u_i[k] = u_{i,t-1}), k = 1..N_i)$, $i \in I$,</p>
        <p>where $\chi(\cdot) \in \{0, 1\}$ is the indicator function of an event.</p>
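<p>One competitive-learning step combining the winner determination (13) with the Kohonen update (12) can be sketched as follows (a simplified illustration, not the full neuroagent):</p>

```python
def kohonen_step(w, y_in, gamma):
    """One Kohonen competitive-learning step, following (12)-(13): pick the
    winner column whose weight vector is closest (squared Euclidean distance)
    to the input vector, then pull only that column toward the input."""
    n = len(y_in)
    # (13): distances D[k] between the input and each neuron's weight vector
    dist = [sum((y_in[j] - w[j][k]) ** 2 for j in range(n)) for k in range(n)]
    k_star = dist.index(min(dist))
    # (12): move the winner's weights toward the input signals
    for j in range(n):
        w[j][k_star] += gamma * (y_in[j] - w[j][k_star])
    return k_star

w = [[0.9, 0.1], [0.1, 0.9]]        # columns are neuron weight vectors
k = kohonen_step(w, y_in=[1.0, 0.0], gamma=0.5)
print(k)                            # 0 -- the column closest to the input wins
print(w[0][0], w[1][0])             # winner column moved halfway toward [1, 0]
```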
        <p>A training radius $R$ can be set around the neuron-winner in the space of the neuron weight vectors:</p>
        <p>$r_{i,t}[k] = \| w_{i,t}^{(n-1)}[k_i^*] - w_{i,t}^{(n-1)}[k] \|$, $k = 1..N_i$,</p>
        <p>where $w_{i,t}^{(n-1)}[k_i^*]$ is the weight vector of the neuron-winner and $\|\cdot\|$ is the Euclidean vector norm.</p>
        <p>Each neuron whose weight vector lies at a distance from the weight vector of the neuron-winner smaller than the training radius ($r_{i,t}[k] < R$) takes part in the recalculation of the synapse weights. The weights of neurons outside the training radius do not change. The training radius decreases in time so that at the end of the training process the correction of connection weights is carried out only for the single neuron-winner.</p>
        <p>The parameters $\gamma_t$ in (10) – (12) and $\varepsilon_t$ in (3) determine the learning rate of the neuroagents. To ensure the convergence of the neuroagent training process, these parameters are set as positive monotonically decreasing values:</p>
        <p>$\gamma_t = \gamma_0 / t^{\kappa}$, $\varepsilon_t = \varepsilon_0 / t^{\lambda}$, (14)</p>
        <p>where $\gamma_0 > 0$, $\kappa > 0$; $\varepsilon_0 > 0$, $\lambda > 0$.</p>
        <p>The choice of decision variants continues up to a specified number of steps $t \le t_{\max}$, or until the training accuracy condition is fulfilled:</p>
        <p>$\delta_t = |I|^{-1} \sum_{i \in I} \| w_{i,t}^{(n-1)} - w_{i,t-1}^{(n-1)} \| \le \delta$, (15)</p>
        <p>where $\delta$ is the neuroagent training accuracy, determined via the Euclidean norm of the change of the connection weights among neurons for two consecutive moments of time.</p>
      </sec>
      <sec id="sec-1-4">
        <title>Indicators of Stochastic Game Effectiveness</title>
        <p>
          For practical applications, it is necessary to determine indicators of the game
effectiveness, which can be used to evaluate the convergence of the game method. In
the absence of direct information exchange between players, such indicators are
formed on the basis of the collective equilibrium condition according to Nash [
          <xref ref-type="bibr" rid="ref1">1, 14, 21</xref>
          ]. To do this, let us define the functions of the average players' gains:
        </p>
        <p>$V_i(p) = \sum_{u \in U} v_i(u) \prod_{j \in I} p_j(u_j)$, (16)</p>
        <p>where $p \in S^I$, $S^I = \prod_{j \in I} S_{N_j}$, $v_i(u) = M\{\xi_{i,t}(u)\}$, and the values $p_j \in S_{N_j}$ are determined according to (3).</p>
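<p>For two players, the polylinear function (16) reduces to a bilinear form; a small sketch with an illustrative payoff matrix (not the paper's Table 1):</p>

```python
def average_gain(v, p1, p2):
    """Polylinear average gain (16) for two players:
    V_i(p) = sum over joint strategies u of v_i(u) * p1(u1) * p2(u2)."""
    return sum(
        v[a][b] * p1[a] * p2[b]
        for a in range(len(p1))
        for b in range(len(p2))
    )

# Illustrative 2x2 mean-gain matrix of player 1 (not the paper's Table 1)
v1 = [[1.0, 0.0],
      [0.0, 1.0]]
print(average_gain(v1, [0.5, 0.5], [0.5, 0.5]))  # 0.5
```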
        <p>The Nash equilibrium determines those strategies of the game solution for which the following condition holds:</p>
        <p>$\forall i \in I: \ V_i(p^*) - V_i(p^*_{I \setminus i}, p_i) \ge 0$, (17)</p>
        <p>where $p^* \in S^I$ is the optimal collective mixed strategy of the players; $V_i(p^*_{I \setminus i}, p_i)$ is the function of average gains defined on the simplex $S^I$ at an arbitrary deviation of the mixed strategy of the $i$-th player from the Nash equilibrium point within the unit simplex. The Nash-optimal mixed strategies can be obtained from the condition of complementary slackness [21]:</p>
        <p>$\nabla_{p_i} V_i = e_{N_i} V_i$, $i \in I$, (18)</p>
        <p>where $p_i \in S_{N_i}$ is the mixed strategy of the $i$-th player; $\nabla_{p_i} V_i$ is the gradient of the polylinear function of average gains (16); $e_{N_i} = (1_j \mid j = 1..N_i)$ is a vector all of whose elements are equal to 1.</p>
        <p>To account for solutions on the boundary of the unit simplex, let us weigh the vector condition of complementary slackness with the elements of the mixed strategy vectors:</p>
        <p>$\operatorname{diag}(p_i)\,(\nabla_{p_i} V_i(p) - e_{N_i} V_i(p)) = 0$, $i \in I$, (19)</p>
        <p>where $\operatorname{diag}(p_i)$ is the square diagonal matrix of order $N_i$ constructed from the elements of the vector $p_i$; $p \in S^I$ are the combined mixed strategies of the players, set on the convex simplexes of $S^I$.</p>
        <p>Let us define the Lyapunov function as the total current players' error during the search for a Nash equilibrium point:</p>
        <p>$\Delta_t = |I|^{-1} \sum_{i \in I} \Delta_{i,t}$, (20)</p>
        <p>where $\Delta_{i,t} = \| p_{i,t} - \tilde{p}_{i,t} \|^2$, $p_{i,t}, \tilde{p}_{i,t} \in S_{N_i}$. The values $\Delta_t \ge 0$, $t = 1, 2, \ldots$ turn to zero at the Nash equilibrium points, which can be reached both inside and on the vertices of the unit simplexes. The vectors $p_{i,t}$ are determined according to (3), and $\tilde{p}_{i,t}$ is calculated as:</p>
        <p>$\tilde{p}_{i,t}(j) = \tilde{V}_{i,t}(j) / \tilde{V}_{i,t}$, $j = 1..N_i$, (21)</p>
        <p>where $\tilde{V}_{i,t}(j) = p_{i,t}(j) V_{i,t}(j)$ and $\tilde{V}_{i,t} = \sum_{j=1}^{N_i} \tilde{V}_{i,t}(j)$.</p>
        <p>Direction t to zero at t  1, 2,... will indicate the convergence of the game method
to one of the equilibrium points according to Nash in mixed strategies or taking into
account (19) the achievement of the game solution in pure strategies.</p>
        <p>The order  and value  of the convergence rate of the game method can be
evaluated using the asymptotic method of Jung's moments [22]:
That is the parameter  can be defined as an angle tangent  of slope of the linear
approximation of the function M t  in a logarithmic coordinate system. The
course of a stochastic non-antagonistic game can be traced also by changing the
function of average player’s gains:
t | I |1  i,t .</p>
        <p>iI
(23)
5</p>
      </sec>
      <sec id="sec-1-5">
        <title>Kohonen’s Algorithm of Neuroagent Functioning</title>
        <p>1. Set the initial parameter values:
$t = 0$ is the initial time point;
$I$ is the set of players;
$N_i$, $i \in I$ is the number of decision variants;
$U_i = \{u_i[1], u_i[2], \ldots, u_i[N_i]\}$, $i \in I$ is the set of decision variants;
$[v_i]$, $i \in I$ are the matrices of mathematical expectations of gains;
$[d_i]$, $i \in I$ are the matrices of variances of gains;
$w_{i,0}^{(n-1)}[N_i, N_i]$, $i \in I$ is the matrix of initial weights of connections among the nodes of the neural network;
$\alpha$, $\theta$ are the parameters of the transfer function of the neurons;
$\gamma_0$ is the parameter of the training step;
$\kappa \in (0, 1]$ is the order of the training step;
$\varepsilon_0$ is the parameter of the $\varepsilon$-simplex;
$\lambda$ is the order of the speed of expansion of the $\varepsilon$-simplex;
$t_{\max}$ is the maximum number of steps of the method;
$\delta$ is the accuracy of training.
2. Perform the choice of decision variants $u_{i,t}$, $i \in I$ according to (3) – (4).
3. Obtain the values of the current gains of the neuroagents as random variables with normal distribution $\xi_i(u^t) \sim \mathrm{Normal}(v_i(u^t), d_i(u^t))$, $i \in I$.
4. Calculate the inputs $x_{i,t}^{(n-1)}$, $i \in I$ of the neuroagents and the corresponding outputs $y_{i,t}^{(n-1)}$, $i \in I$ of the neurons of the $(n-1)$-th layer according to (7).
5. Calculate the total inputs $x_{i,t}^{(n)}$, $i \in I$ of the neurons (6) and the corresponding outputs $y_{i,t}^{(n)}$, $i \in I$ (7) for the neurons of the $n$-th layer.
6. Calculate the value of the parameter $\gamma_t$ according to (14).
7. Determine the indexes $k_i^*$ of the winning neurons, $i \in I$ according to (13).
8. Recalculate the weights of connections to the neuron-winners $w_{i,t}^{(n-1)}[j, k_i^*]$, $i \in I$, $j = 1..N_i$ according to (12).
9. Calculate the characteristics of the quality of collective decision making: $\Xi_t$ (23), $\Delta_t$ (20), $\nu$ (22).
10. Set the next moment of time $t := t + 1$.
11. If the condition (15) of the end of the game is not fulfilled, go to step 2; otherwise, stop.</p>
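<p>The algorithm above can be condensed into a self-contained simulation sketch for two neuroagents with two strategies. The payoff matrices, parameter values, and the single-layer gain-weighted update (with the chosen decision index taken as the winner, following the alternative rule after (13)) are illustrative simplifications of the method, not the authors' exact implementation:</p>

```python
import math
import random

random.seed(1)

N = 2                                   # strategies per agent
L = 2                                   # number of agents
# Illustrative mean-gain matrices (not the paper's Table 1): v[i][u1][u2]
v = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
d = 0.1                                 # gain dispersion
w = [[[random.random() for _ in range(N)] for _ in range(N)] for _ in range(L)]

def outputs(wi, x):
    """Total inputs (6) passed through a linear transfer function (7)-(8)."""
    return [max(0.0, sum(wi[j][k] * x[j] for j in range(N))) for k in range(N)]

def probabilities(y, eps):
    """Projection (3) of the outputs onto the eps-simplex (assumed clipping form)."""
    s = sum(y)
    p = [q / s for q in y] if s > 0 else [1.0 / N] * N
    return [(1 - N * eps) * q + eps for q in p]

def choose(p):
    """Inverse-CDF choice (4) of a pure strategy."""
    omega, acc = random.random(), 0.0
    for k, pk in enumerate(p):
        acc += pk
        if acc > omega:
            return k
    return N - 1

avg = [0.0, 0.0]
for t in range(1, 2001):
    gamma, eps = 0.5 / t, 0.4 / t ** 0.99      # decreasing steps, as in (14)
    x = [[1.0] * N for _ in range(L)]          # unit input vector e (normalization omitted)
    u = []
    for i in range(L):
        y = outputs(w[i], x[i])
        u.append(choose(probabilities(y, min(eps, 1.0 / N))))
    for i in range(L):
        # Current random gain with mean v[i][u] (normal environment response)
        xi = random.gauss(v[i][u[0]][u[1]], math.sqrt(d))
        avg[i] += (xi - avg[i]) / t            # time-averaged gain (1)
        # Simplified gain-weighted Kohonen-style update (12):
        # reinforce the column of the chosen decision
        k_star = u[i]
        for j in range(N):
            w[i][j][k_star] += gamma * (xi * x[i][j] - w[i][j][k_star])

print([round(a, 2) for a in avg])              # average gains of both agents
```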
      </sec>
      <sec id="sec-1-6">
        <title>Results of Computer Modeling</title>
        <p>Let us solve the stochastic game of two neuroagents, $|I| = 2$, with two pure strategies each, $N_i = 2$, $i = 1..2$. The matrices of average gains $[v_i]_{2 \times 2}$, $i = 1..2$ of such a game are given in Table 1.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Strategies</title>
      <p>
        The view of slices of the functions of average gains of the neuroagents (16), with values on the unit simplex corresponding to Table 1, is shown in Fig. 2.
      </p>
      <p>
        Fig. 2. Functions of average gains of neuroagents.
        Analysis of the average gain functions shows that the game has one Nash solution in mixed strategies $(p_1[1], p_2[1]) = (0.5, 0.2)$ and two solutions in pure strategies $(p_1[1], p_2[1]) = (0, 1)$ and $(p_1[1], p_2[1]) = (1, 0)$.
      </p>
      <p>
Under conditions of uncertainty, the elements of the matrices of average gains v_i(u), u ∈ U, are unknown and are available for observation only in the form of random current values xi_i(u). To simulate the random gains we choose the normal distribution law: xi_i(u) ~ Normal(v_i(u), d_i(u)), where v_i(u) is the mathematical expectation and d_i(u) is the dispersion. The normally distributed random values are calculated using the sum of twelve uniformly distributed random real numbers omega ∈ [0, 1]: xi_i(u_t) = v_i(u_t) + sqrt(d_i(u_t)) * (sum_{j=1}^{12} omega_{i,t}[j] - 6).
      </p>
      <p>
The initial values of the link weights w_0^{(n+1)} between neurons are random variables uniformly distributed in the interval [0, 1]. The linear dependence of the output on the total inputs is chosen as the transfer function of the neurons (8).
      </p>
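      <p>
The noisy-gain generator used in the simulation (the classical sum-of-twelve-uniforms construction: the sum has mean 6 and variance 1) can be written directly:

```python
import math
import random

def noisy_gain(v, d, rng=None):
    """Draw a current gain xi ~ Normal(v, d) as the sum of 12
    uniform [0, 1] numbers, shifted by -6 (the sum's mean) and
    scaled by sqrt(d) so that the variance equals the dispersion d."""
    rng = rng or random
    s = sum(rng.random() for _ in range(12))
    return v + math.sqrt(d) * (s - 6.0)
```

Averaging many such draws recovers the mathematical expectation v and dispersion d, which is exactly what the players must estimate implicitly during training.
      </p>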
      <p>The convergence of the gaming neuroagent method is determined by the ratio of its parameters, which in the general case must satisfy the basic conditions of stochastic optimization [22–28]. The parameters of the neuroagent training method take the following values: 1; 0.1; N_i = N = 2; 0.999/N; 1.</p>
      <p>The graphs of the functions of average gains of the neuroagents and of the error of reaching an optimal collective decision at one of the Nash equilibrium points are shown on a logarithmic scale in Fig. 3. The growth of the average gains and the reduction of the error in time indicate the convergence of the neuroagent method of decision making in the sense of fulfilling the condition of complementary slackness (19).</p>
      <p>
The trajectories of change of the conditional probabilities of the collective choice of decision variants within the unit simplex are shown in Fig. 2. From the obtained data it can be seen that, for the given parameter values, the game neuroagent method (12) provides the solution of the stochastic game at a vertex of the unit simplex, (p1[1], p2[1]) = (0, 1), and has an order of convergence speed close to 1, estimated as the tangent of the slope angle of the trajectory on the logarithmic scale.
      </p>
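      <p>
The order of convergence read off the log-scale plot (the tangent of the slope angle) amounts to fitting a straight line to log(error) versus log(t). A small sketch of that estimate, assuming a recorded error trajectory:

```python
import math

def convergence_order(errors, t0=1):
    """Estimate the order of convergence as the negated least-squares
    slope of log(error) against log(t): for errors ~ C * t**(-k)
    this returns k, the tangent of the slope angle on the log scale."""
    xs = [math.log(t0 + i) for i in range(len(errors))]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return -num / den
```

For an error sequence decaying like 1/t this estimator returns an order close to 1, matching the behavior reported for the game method.
      </p>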
      <p>For the convergence of the stochastic game of neuroagents to Nash equilibrium points with probability 1 or in the mean-square sense, it is necessary that the ratio of the parameters of the recurrent game method satisfies the fundamental conditions of stochastic approximation [29, 30]. In conditions of uncertainty, the theoretical study of convergence is based on Robbins–Monro stochastic approximation and on the results of the Robbins–Siegmund lemma, and depends both on the parameters of the environment and on the values of the parameters of the game method [31].</p>
      <p>Recurrent game algorithms provide a power-law order of convergence rate [31] and are easy to program. The low rate of convergence is due to the lack of a priori information about the players' payoff matrices and the lack of direct exchange of information between players. In addition, each player has no information about the structure of the game (the number of players, the number of strategies, the dependence of its own payoffs on the strategies of other players). During the multi-step game, each player learns to choose pure strategies so as to optimize its own function of average payoffs.</p>
      <p>The learning process is a reorganization of one's own mixed strategies over time –
by increasing the probability of choosing pure strategies that, on average, produce the
best results.</p>
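      <p>
This reorganization of mixed strategies can be sketched as a reward-proportional update: the probability of the chosen pure strategy grows in proportion to the received gain, while the remaining probabilities shrink, keeping the vector on the unit simplex. This linear reward-inaction style rule is an illustrative stand-in, not the paper's exact update (12):

```python
def reinforce(p, k, gain, step):
    """One mixed-strategy update: shift probability mass toward the
    chosen pure strategy k in proportion to step * gain, then
    renormalize (the update already preserves the simplex when
    0 <= step * gain <= 1; renormalization guards rounding)."""
    q = list(p)
    q[k] += step * gain * (1.0 - q[k])
    for j in range(len(q)):
        if j != k:
            q[j] *= 1.0 - step * gain
    s = sum(q)
    return [x / s for x in q]
```

Repeatedly rewarding the same pure strategy drives its probability toward 1, i.e. toward a vertex of the unit simplex, which is the behavior observed in the simulation.
      </p>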
      <p>Theoretical evaluations of the convergence conditions of game algorithms under uncertainty are complex and difficult to analyze in a rigorous analytical way. In addition, such convergence conditions are upper bounds, so the question of their accuracy arises. For this reason the theoretical results will not always be satisfactory and require experimental clarification. It is experimentally established that scaling the system up in the number of players and the number of pure strategies increases the entropy of the stochastic game and, as a result, slows down its rate of convergence.</p>
      <p>Let us study the dependence of the learning time of the stochastic game of neuroagents on the basic parameters of the algorithm. We define the training time as the minimum number of steps required to train the neuroagents with a given accuracy epsilon &gt; 0: t_out = min{ t : epsilon_t &lt;= epsilon }, where the current training accuracy epsilon_t is calculated according to (15).</p>
      <p>Due to the random choice of decisions, the training time of the neuroagents has to be averaged over different sequences of random variables: the mean training time is the sum of t_out over the experiments divided by k_exp, where k_exp is the number of experiments.</p>
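      <p>
The two definitions above translate directly into code; here epsilon_t is assumed to be available as a recorded per-step accuracy sequence for each experiment:

```python
def training_time(eps_seq, eps):
    """t_out = min{ t : eps_t <= eps }: the first step at which the
    current training accuracy (15) reaches the prescribed eps."""
    for t, e in enumerate(eps_seq, start=1):
        if e <= eps:
            return t
    return len(eps_seq)

def mean_training_time(runs, eps):
    """Average t_out over k_exp = len(runs) independent experiments,
    each run being one sequence of current accuracies eps_t."""
    return sum(training_time(r, eps) for r in runs) / len(runs)
```

Averaging over independent runs smooths out the randomness of the decision choices, giving the mean number of training steps studied below.
      </p>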
      <p>The average number of training steps t depends on the parameters of the training
algorithm of neuroagents and the parameters of the decision-making environment.</p>
      <p>The graph of the dependence of the average training time of the neuroagents on the order parameter of the training step is shown in Fig. 4. The results are obtained for the dispersion value of current gains d = 0.01. This parameter, taking values in (0, 1], determines the order of the monotonic decrease of the training step gamma_t &gt; 0 (14), which regulates the speed of training of the neuroagents; as its value increases, the average training time decreases. The training accuracy of the game is epsilon = 10^-3. The data are averaged over k_exp = 100 experiments. In all experiments, the choice of pure decision strategies for which condition (19) is met is ensured with high probability.</p>
      <p>As can be seen from the simulation results, increasing this parameter leads to a decrease in the average number of steps required to train the neuroagents with accuracy epsilon.</p>
      <p>The dependence of the average number of training steps of the neuroagents on the dispersion d of the stochastic environment (the dispersion of the evaluations of decision variants) is presented in Fig. 5. The results are obtained for the value 0.5 of the order of the neuroagent training step.</p>
      <p>As the dispersion increases, the average number of steps required to train the neuroagents increases. A significant increase in dispersion may lead to an incorrect determination of the optimal decision. Modeling the stochastic game with other matrices of average gains of the neuroagents can show a different result while preserving the obtained dependencies among the parameters required to optimize the payoff functions of the players.</p>
      <sec id="sec-2-1">
        <title>Conclusions</title>
        <p>The neuroagent game model of collective decision making in conditions of stochastic
uncertainty is proposed in this article. Neuroagents are based on artificial neural
networks with feedback and learning without a teacher. The current collective decision is
obtained after the independent choice of their own decision by all players. Each
player generates a current version of decision according to values of the neural network
outputs. The choice of decisions is carried out by neuroagents in a random way,
regardless of time and other agents. Random choice involves calculation of the
probabilities of choosing decisions by optimal design of neural network outputs for a single
simplex. After choosing a collective decision, the reaction of decision making
environment is determined as a set of values of current neuroagents’ gains. The current
gain of each neuroagent is transmitted to the inputs of the appropriate two-layer
neural network. Then, the training of neuroagents is carried out by changing the weights
of neuronal relationships by one of the learning algorithms without a teacher. The
learning process is repeated until the weights of the neurons relationships are
stabilized with a given accuracy. The course of training is aimed at maximizing the
average gains of neuroagents. The decision of the game is achieved in one of the points of
the collective optimum or equilibrium, depending on the values of the parameters of
the chosen method of neuroagents training.</p>
        <p>The developed software model confirms the convergence of the game neuroagent
method (12) of decision making. The efficiency of the method is estimated by means
of characteristic functions of average gains and errors of collective choice of the
optimal decision making variant. The convergence of the neuroagent game method
depends on the number of players, decisions, and relationships among the method
parameters and decision making environment parameters.</p>
        <p>The reliability of the obtained results has been confirmed by the repetition of the
values of the calculated characteristics of the game neuroagent decision making
method for different sequences of random variables.</p>
        <p>The results of this work can be used to construct distributed control and decision
making systems in conditions of uncertainty.</p>
        <p>The conducted research can be continued in the direction of applying other configurations of neuroagents and other methods of their training, information exchange between the agents of the stochastic game, growth of the number of players and of the number of their pure strategies, and the definition of theoretical conditions of convergence of the game neuroagent method.</p>
        <p>8. Consensus of Fractional-Order Multiagent System via Sampled-Data Event-Triggered Control, https://www.sciencedirect.com/science/article/abs/pii/S0016003218301595, last accessed 2020/04/12.</p>
        <p>9. Tauberian Theorems for General Iterations of Operators: Applications to Zero-Sum Stochastic Games, https://www.sciencedirect.com/science/article/abs/pii/S0899825618300204?via%3Dihub, last accessed 2020/04/12.</p>
        <p>10. Evolution of Cooperation in Stochastic Games, https://www.semanticscholar.org/paper/Evolution-of-cooperation-in-stochastic-games-Hilbe-%C5%A0imsa/ff49474432459bf295a7d2a775b1719c4fa09755, last accessed 2020/04/12.</p>
        <p>11. Bu, Z., Gao, G., Li, H.-J., Cao, J.: CAMAS: A cluster-aware multiagent system for attributed graph clustering. Information Fusion 37, 10–21 (2017).</p>
        <p>12. Mu, C., Zhao, Q., Gao, Z., Sun, C.: Q-learning solution for optimal consensus control of discrete-time multiagent systems using reinforcement learning. Journal of the Franklin Institute 356 (13), 6946–6967 (2019).</p>
        <p>13. Commutative Stochastic Games, https://pubsonline.informs.org/doi/10.1287/moor.2014.0676, last accessed 2020/04/12.</p>
        <p>14. Semi-algebraic Tools for Stochastic Games (2015), https://www.semanticscholar.org/paper/Semi-algebraic-Tools-for-Stochastic-Games-Frederiksen/97d4bc8b3800fd14215f03aaddb6f4e3570255cd, last accessed 2020/04/12.</p>
        <p>15. A Stochastic Game Framework for Analyzing Computational Investment Strategies in Distributed Computing with Application to Blockchain Mining, https://www.semanticscholar.org/paper/A-Stochastic-Game-Framework-for-Analyzing-in-with-Dhamal-Chahed/0702f46e00dc3171ed0b38fdc827b21b120423ee, last accessed 2020/04/12.</p>
        <p>16. CSI Neural Network: Using Side-channels to Recover Your Artificial Neural Network Information, https://www.semanticscholar.org/paper/CSI-Neural-Network%3A-Using-Side-channels-to-Recover-Batina-Bhasin/905ad646e5745afe6a3b02617cd8452655232c0d, last accessed 2020/04/12.</p>
        <p>17. Many-Body Physics: Solving the Quantum Many-Body Problem with Artificial Neural Networks, https://www.semanticscholar.org/paper/MANY%E2%80%90BODY-PHYSICS%3A-Solving-the-quantum-many%E2%80%90body-Carleo-Troyer/e4a85af3f5dc41e13dc2cae9ee851953709b764e, last accessed 2020/04/12.</p>
        <p>18. Integration of New Evolutionary Approach with Artificial Neural Network for Solving Short Term Load Forecast Problem, https://www.semanticscholar.org/paper/Integration-of-new-evolutionary-approach-with-for-Singh-Dwivedi/0a5e2c346f61d9f68f323bc9946dee6988f72687, last accessed 2020/04/12.</p>
        <p>19. State-of-the-art in Artificial Neural Network Applications: A Survey (2018), https://www.semanticscholar.org/paper/State-of-the-art-in-artificial-neural-network-A-Abiodun-Jantan/efdb2aa8d8dadc182b139623911157fa158648ad, last accessed 2020/04/12.</p>
        <p>20. An Artificial Neural Network as a Troubled-Cell Indicator, https://www.semanticscholar.org/paper/An-artificial-neural-network-as-a-troubled-cell-Ray-Hesthaven/2f8af28a213d614ec3b7f7d3f407095d838c944c, last accessed 2020/04/12.</p>
        <p>21. Particle Filtering Methods for Stochastic Optimization with Application to Large-Scale Empirical Risk Minimization, https://www.sciencedirect.com/science/article/abs/pii/S0950705120300083, last accessed 2020/04/12.</p>
        <p>22. DSCTool: A Web-Service-Based Framework for Statistical Comparison of Stochastic Optimization Algorithms, https://www.sciencedirect.com/science/article/pii/S1568494619307586, last accessed 2020/04/12.</p>
        <p>23. Identifying Practical Significance through Statistical Comparison of Meta-Heuristic Stochastic Optimization Algorithms, https://www.sciencedirect.com/science/article/pii/S156849461930643X, last accessed 2020/04/12.</p>
        <p>24. Random Gradient Extrapolation for Distributed and Stochastic Optimization, https://www.semanticscholar.org/paper/Random-gradient-extrapolation-for-distributed-and-Lan-Zhou/263edefd27860664c6a596563f5e30dfd01c48e2, last accessed 2020/04/12.</p>
        <p>25. Tomashevskyi, V., Yatsyshyn, A., Pasichnyk, V., Kunanets, N., Rzheuskyi, A.: Data warehouses of hybrid type: features of construction. Advances in Intelligent Systems and Computing II (AISC) 938, 325–334 (2019).</p>
        <p>26. Kazarian, A., Kunanets, N., Pasichnyk, V., Veretennikova, N., Rzheuskyi, A., Leheza, A., Kunanets, O.: Complex information e-science system architecture based on cloud computing model. CEUR Workshop Proceedings 2362, 366–377 (2019).</p>
        <p>27. Methods of Statistical Research for Information Managers, https://ieeexplore.ieee.org/document/8526588/authors#authors, last accessed 2020/04/12.</p>
        <p>28. Kaminskyi, R., Kunanets, N., Pasichnyk, V., Rzheuskyi, A., Khudyi, A.: Recovery gaps in experimental data. CEUR Workshop Proceedings 2136, 108–118 (2018).</p>
        <p>29. Stochastic Approximation and Recursive Algorithms and Applications, https://www.springer.com/gp/book/9780387008943.</p>
        <p>30. Adaptive Algorithms and Stochastic Approximations, https://www.springer.com/gp/book/9783642758966.</p>
        <p>31. Nazin, A., Poznyak, A.: Adaptive Choice of Variants: Recurrence Algorithms. Nauka, Moscow (1986).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <article-title>1. Uncertainty Index for Evaluating and Comparing Solutions for Stochastic Multiple Objective Problems</article-title>
          , https://www.sciencedirect.com/science/article/abs/pii/S0377221720300047, last accessed
          <year>2020</year>
          /04/12.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>2. A multi-objective model for Pareto optimality in data envelopment analysis crossefficiency evaluation</article-title>
          , https://www.sciencedirect.com/science/article/abs/pii/S0377221719310446, last accessed
          <year>2020</year>
          /04/12.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <article-title>Generalized Pareto copulas: A key to multivariate extremes</article-title>
          , https://www.sciencedirect.com/science/article/pii/S0047259X19300296, last accessed
          <year>2020</year>
          /04/12.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <article-title>A Heuristic Algorithm Combining Pareto Optimization and Niche Technology for Multi-Objective Unequal Area Facility Layout Problem</article-title>
          , https://www.sciencedirect.com/science/article/abs/pii/S0952197619303458, last accessed
          <year>2020</year>
          /04/12.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>Jiang, X., Tian, S., Zhang, T., Zhang, W.</string-name>
          :
          <article-title>Pareto optimal strategy for linear stochastic systems with H∞ constraint in finite horizon</article-title>
          .
          <source>Information Sciences</source>
          <volume>512</volume>
          ,
          <fpage>1103</fpage>
          -
          <lpage>1117</lpage>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <article-title>6. An equilibrium in group decision and its association with the Nash equilibrium in game theory</article-title>
          , https://www.sciencedirect.com/science/article/pii/S0360835219306072, last accessed
          <year>2020</year>
          /04/12.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <article-title>7. An extremum seeking-based approach for Nash equilibrium seeking in N-cluster noncooperative games</article-title>
          , https://www.sciencedirect.com/science/article/pii/S0005109820300133, last accessed
          <year>2020</year>
          /04/12.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>