<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Neuroagent Game Model of Collective Decision Making in Conditions of Uncertainty</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Petro O. Kravets</string-name>
          <email>Petro.O.Kravets@lpnu.ua</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>V. Pasichnyk</string-name>
          <email>vpasichnyk@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonii V. Rzheuskyi</string-name>
          <email>antonii.v.rzheuskyi@lpnu.ua</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Systems and Networks Department, Lviv Polytechnic National University</institution>
          ,
          <addr-line>Lviv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Computer Systems and Technologies Department, PHEI “Bukovinian University”</institution>
          ,
          <addr-line>Chernivtsi</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A neuroagent game model of collective decision making under stochastic uncertainty is proposed. Neuroagents are based on artificial neural networks with feedback and unsupervised learning. The current collective decision is obtained after all players independently choose their own decisions. Each player generates a current decision variant according to the values of its neural network outputs. Decisions are chosen by the neuroagents at random, independently in time and of the other agents. The random choice involves computing the probabilities of the decision variants by projecting the neural network outputs onto a unit simplex. After the collective decision is chosen, the reaction of the decision making environment is determined as a set of current neuroagent gains. The current gain of each neuroagent is fed to the inputs of the corresponding two-layer neural network. The neuroagents are then trained by changing the weights of the neuron connections with one of the unsupervised learning algorithms. The learning process is repeated until the connection weights stabilize with a given accuracy. Training aims to maximize the average gains of the neuroagents. The solution of the game is reached at one of the points of collective optimality or equilibrium, depending on the parameter values of the chosen neuroagent training method.</p>
      </abstract>
      <kwd-group>
        <kwd>Collective Decision Making</kwd>
        <kwd>Uncertainty Conditions</kwd>
        <kwd>Neuroagent Stochastic Game</kwd>
        <kwd>Adaptive Learning Methods</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction to the problem of collective decision making in conditions of uncertainty</title>
      <p>
        To solve the tasks of distributed management and decision making in technical,
economic, informational and social systems, there is a need for a collective choice of
decisions that satisfy one of the conditions of multicriteria optimality, such as Nash,
Pareto, etc. [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5 ref6 ref7">1-7</xref>
        ]. These conditions in one way or another determine the degree of
benefit and fairness of the collectively achieved decision.
      </p>
      <p>The formal model of a collective decision making system is determined by a controlled environment and the decision making (controlling) agents. The environment model is determined by the structure and functions of the managed system.</p>
      <p>
        The environment perceives agents’ managing decisions and generates output
signals which are interpreted by agents as estimates of efficiency of realized
decisions. Agents get estimations from the environment, choose and implement
decisions to maximize their own win or minimize losses. Such an approach to
building a decision making system is called optimization [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In contrast to the
situational approach, when the decision is obtained immediately on the basis of input
data, in an optimization approach, the decision is refined through the feedback link.
      </p>
      <p>An agent is an intelligent decision making system, software or technical
implementation of the model of decision maker. Generally, the agent consists of the
following main subsystems: receptor, intellectual and effector. The receptor
subsystem receives signals from the decision making environment, the intellectual
subsystem makes decisions based on current information, self-learning and forecasting,
and the effector subsystem implements the decisions made, with an appropriate impact
on the environment.</p>
      <p>Collective decision making in a distributed environment is conducted by a set of
interacting intelligent agents, which is called the multiagent system (MAS) [8-12].
Each agent makes a decision based on available local information and interaction
within the MAS. As an active element of MAS, the agent is able to perceive and
analyze data of the information network, negotiate and exchange current information
with other agents, make stand-alone decisions, change the states of the network,
inform the system and the user about the results of their actions.</p>
      <p>The intelligent agents have the following basic properties:
1) autonomy – in pursuit of the goal, the agent can independently make decisions
based on his own knowledge, without requiring external management;
2) reactivity – the agent can perceive external information, produce and implement
adequate actions;</p>
      <p>3) intelligence – the agent's ability to learn, adapt to environment changes, to
process data obtained by artificial intelligence methods, to make optimal decisions;
4) specialization – as a rule, the agents perform highly specialized functions;
5) mobility – to achieve the goal, the software agent can move within the
information network;</p>
      <p>6) coordination – distribution of roles and coordination of actions of agents in
solving common tasks;</p>
      <p>7) interactivity – the agent interacts with other agents, with the information
resources of the network and with the user;</p>
      <p>8) communication – the ability of agents to communicate in technical language and
understand each other;</p>
      <p>9) personality – the unique qualities of the agent, modeling his psychological
features, current emotions, etc.;</p>
      <p>10) decentralization – each of the agents does not have a complete picture of
the whole system and, therefore, there are no agents who manage the whole system.
When solving a task, phenomena of cooperation or competition arise between agents.
As is known, these types of interaction between agents under conditions of
uncertainty are studied by the theory of stochastic games [13–15]. Therefore, to study
the collective behavior of agents in the decision making process, it is advisable to use
the mathematical apparatus of the theory of stochastic games.</p>
      <p>The typical features of the game decision making in the MAS are:
1) distribution or multiparameter of the decision making environment;
2) internal stochasticity of the environment;
3) complete or partial absence of a priori information (uncertainty) about the
decision making environment;</p>
      <p>4) manageability of the environment and the possibility of distributed
implementation of management options;
5) multicriteria of management or decision making;
6) discreteness and finiteness of the set of decision making options;
7) stochastic independence of choice of decisions in space and time;
8) the possibility of multiple repetition of variants of decisions implementations in
time;</p>
      <p>9) distributed locally-dependent character of the formation and collection of
information for statistical identification of the decision making environment;
10) the possibility of using a distributed game algorithm, which ensures the
achievement of compromise decisions area;
11) implementation of the game algorithm in real time;
12) determination of the moments of stopping the game algorithm for the
possibility of its practical application.</p>
      <p>The functioning of the MAS decision making is conducted in conditions of a priori
uncertainty [16]. Uncertainty can be caused by internal or external MAS factors. We
distinguish the following types of uncertainties:</p>
      <p>1) structural – unknown composition of the system and connections between its
elements;
2) algorithmic – unknown algorithm of the system functioning;
3) informational – fuzziness or lack of the complete information necessary for decision
making;
      <p>4) linguistic – ambiguity of statements in the exchange of messages between
agents;
5) target – an unknown global purpose of the system;
6) social – due to the collective interaction of agents, when the actions of one of
the agents influence the choice of decisions by other agents;</p>
      <p>7) stochastic – the influence on the system of uncontrolled external factors.
In scientific research, the uncertainty of decision making is most often modeled with
the use of mechanism of random variables, which is the basis of stochastic
uncertainty. Partial compensation of uncertainty is provided by the ability of agents to
self-learning and adaptive decision making strategies.</p>
      <p>In the cybernetic literature, game methods of self-learning based on the adaptive
formation of probability distributions of discrete variants of decisions (pure strategies
of players) are well studied [14]. After all players have chosen and implemented their
pure strategies, each of them receives a current win from the environment, which is
used to update the dynamic mixed strategies (vectors of conditional probabilities of
choosing decisions) that drive the mechanism of generating random pure strategies.
This update means that the probability of choosing a pure strategy increases
proportionally to the value of the current win. The method of recomputing mixed
strategies over time, built on the basis of stochastic approximation, provides
maximization of the average gain functions on unit simplexes. The intellectual
capabilities of the agents of such a game are limited, because they model only the
reflex behavior of biological systems.</p>
      <p>Artificial neural networks (ANN) can be used to construct intellectual decision
making systems as models of processes of the nervous activity of biological systems
with the ability to memorize, analyze and predict behavior [17–20]. Neural networks
implement "soft" calculations based on the processes occurring in the human brain
and are used as models of objects with unknown characteristics. An ANN consists of
many neurons and the connections among them. Training an ANN consists in
correcting the synaptic connections among neurons based on the information that
enters the neural network from the environment. To obtain the necessary structure of
connections, some connections among neurons are amplified while others are weakened.</p>
      <p>The decision making system, based on ANN, is called a neuroagent. Neuroagent
game models of decision making in conditions of uncertainty are insufficiently
studied in modern professional scientific literature.</p>
      <p>The application of neuroagent game models is a promising direction for increasing
the efficiency of collective development and decision making processes in conditions
of uncertainty due to the following features:</p>
      <p>1) nonlinearity – neural networks make it possible to obtain a nonlinear
dependence of the output signal on the input;</p>
      <p>2) adaptability – neural networks have the ability to adapt their synaptic weights to
environmental changes;</p>
      <p>3) plasticity and resistance to failures – neural networks store information in
distributed form over all connections of the network. The failure of one or more
neurons does not lead to the failure of the system as a whole;</p>
      <p>4) universality – neural networks do not require special programming, because they
make it possible to solve various information processing tasks with the same
neuron training algorithms.</p>
      <p>The aim of this work is to develop a model of stochastic game of neuroagents for
collective decision making in conditions of uncertainty. To achieve this aim it is
necessary: to set the task of game decision making in conditions of uncertainty; to set
the environment for collective decision making; develop the structure of neuroagents;
to choose the method of teaching neuroagents to solve the formulated task; to develop
algorithm and software tools for simulation of stochastic game of neuroagents; to
analyze the obtained results and make recommendations for their practical use.
</p>
      <sec id="sec-1-1">
        <title>Setting up of the Game Task of Decision Making in Conditions of Uncertainty</title>
        <p>The matrix stochastic game $\Gamma = (I, \{U_i\}, \{v_i\})$ of decision making is given by:</p>
        <p>1) a set of agents $I = \{i \mid i = 1..L\}$, where $L = |I|$ is the cardinality of the set, that is, the number of players;</p>
        <p>2) sets of pure strategies of the agents $U_i = \{u_i(1), u_i(2), \ldots, u_i(N_i)\}$, where $N_i$ is the number of pure strategies of the player with number $i$;</p>
        <p>3) matrices of the average gains of the agents $v_i(u)$, $i \in I$, $u \in U$, where $U = \prod_{i \in I} U_i$ is the set of collective strategies of the players.</p>
        <p>The stochastic game takes place at discrete moments of time $t = 1, 2, \ldots$. After the implementation of the collective strategy $u^t \in U$, each agent receives a current random gain $\xi_{i,t}(u) \in R^1$ with unknown mathematical expectation $M\{\xi_{i,t}(u)\} = v_i(u)$ and bounded variance $d_i(u) < \infty$. Here and below the index $i$ is the number of the player, and the index $t$ is the current time.</p>
        <p>The obtained current agent’s gains are averaged over time to evaluate the
effectiveness of the decision making process by each agent:</p>
        <p>$\Xi_{i,t}(\{u^t\}) = \frac{1}{t} \sum_{\tau=1}^{t} \xi_{i,\tau}$, $i \in I$. (1)</p>
        <p>The aim of the agents is to maximize their average payoff functions:</p>
        <p>$\lim_{t \to \infty} \Xi_{i,t} \to \max_{u_i}$, $\forall i \in I$. (2)</p>
        <p>Thus, based on observations of the current gains $\xi_{i,t}$, each agent $i \in I$ must choose current decisions $u_{i,t} = u_i \in U_i$ so as to ensure, as time passes $t = 1, 2, \ldots$, the maximization of the system of target functions (1).</p>
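<p>The time average (1) can be maintained incrementally rather than by summing the whole history; a minimal sketch (function and variable names are illustrative):</p>

```python
def update_average(avg_prev: float, gain: float, t: int) -> float:
    """Incremental form of the time-averaged gain (1):
    Xi_t = Xi_{t-1} + (xi_t - Xi_{t-1}) / t, for t = 1, 2, ..."""
    return avg_prev + (gain - avg_prev) / t

avg = 0.0
for t, gain in enumerate([1.0, 0.0, 1.0, 1.0], start=1):
    avg = update_average(avg, gain, t)
print(round(avg, 6))  # 0.75, the mean of the four observed gains
```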
        <p>The solution of the multicriteria problem (2) should be sought in the set of points of collective equilibrium (for example, Nash) or optimality (for example, Pareto), depending on how the agents choose their sequences of decision variants.</p>
      </sec>
      <sec id="sec-1-3">
        <title>Game Neuroagent Method of Task Solution</title>
        <p>Adaptive methods of generating the sequences $\{u_{i,t}\}$, $i \in I$, $t = 1, 2, \ldots$, based on dynamic distributions of discrete random variables, which use the players' mixed strategies, are known [14]. In contrast, let us consider a neuroagent method for solving a matrix stochastic game, the scheme of which is shown in Fig. 1.</p>
        <p>The model of the decision making environment is given by the matrices of the mathematical expectations of random gains $v_i$, $i \in I$. The values of the decision variants $u_i \in U_i$ are presented to the input of the environment. The output of the environment is the corresponding values of the current gains $\xi_i(u)$.</p>
        <p>Each neuroagent is assigned an artificial neural network with $n = 2$ layers of neurons. The number of elements of each layer is the same and equal to the number of decisions $N_i = |U_i|$. The inputs $x_i^{(n-1)}$ of the neuroagents receive vectors of parameters calculated on the basis of the outputs of the environment $\xi_i(u)$, $i \in I$. The outputs of the neuroagents are vectors of parameters $y_i^{(n)}$, on the basis of which the decision variants $u_i \in U_i$, $i \in I$ are determined. The weights $w_i^{(n-1)}$ indicate the strength of the synaptic connections among the neurons of the $i$-th agent. Positive values of the connection weights correspond to exciting synapses and negative values to inhibitory ones. A zero weight means the absence of a connection between the neurons.</p>
        <p>The functioning of a neuroagent is carried out by one of the adaptive algorithms of
unsupervised learning, for example, Hebb, Kohonen or others [19]. Unsupervised
learning or self-learning is by nature closest to the brain as its biological prototype.
Self-learning is not oriented to the presence of correct outputs of neural network. The
self-learning algorithm independently detects the internal structure of input data,
rebuilding the weights of synaptic connections so that close (by some metric) sets of
input signals cause sufficiently close sets of output signals. In fact, the process of
neural network self-learning solves the task of data clustering, identifying the
statistical properties of the training sets and grouping similar input sets into clusters.
By feeding a vector from a given class to the input of the trained neural network, we
obtain the characteristic output vector for this class. The output vector is not known
in advance. Its formation is due to the structure of the training sample, the random
distribution of the initial connection weights among neurons, and the combination of
excited neurons of the output layer of the neural network.</p>
        <p>The neuroagents carry out a random choice of decisions $u_{i,t} \in U_i$ independently of each other, $i \in I$, and in time $t = 1, 2, \ldots$. To do this, each neuroagent builds a vector $p_{i,t}$ of conditional probabilities of choosing the decisions $u_{i,t}$ by projecting the output vector $y_{i,t}^{(n)}$ onto the unit $N_i$-dimensional $\varepsilon$-simplex:</p>
        <p>$p_{i,t}(u_{i,t} \mid u_{i,\tau}, \xi_{i,\tau}, \tau = 1, 2, \ldots, t-1) = \pi_{N_i}^{\varepsilon_t}\{y_{i,t}^{(n)}\}$, $i \in I$, (3)</p>
        <p>where $\pi_{N_i}^{\varepsilon_t}$ is a projector onto the unit $\varepsilon$-simplex $S_{N_i}^{\varepsilon_t} \subseteq S_{N_i}$ [14]. The parameter $\varepsilon_t$ adjusts the speed of expansion of the $\varepsilon$-simplex $S_{N_i}^{\varepsilon_t}$ to the unit simplex $S_{N_i}$ and can be used as an additional factor in controlling the convergence of the game neuroagent decision making method. The obtained probability vector is used to construct an empirical distribution of discrete random variables, on the basis of which the choice of decisions is made:</p>
        <p>$u_{i,t} = u_i[k]$, $k = \arg\min_{k} \left\{ \sum_{j=1}^{k} p_{i,t}(u_i[j]) > \omega \right\}$, $k = 1..N_i$, $i \in I$, (4)</p>
        <p>where $\omega \in [0, 1]$ is a real random number with uniform distribution.</p>
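<p>As an illustration of (3) – (4), the projection onto the $\varepsilon$-simplex can be realized by shrinking the normalized output vector toward the simplex center, followed by an inverse-CDF (roulette-wheel) choice of the pure strategy. The shrinking map below is an assumed stand-in for the projector of [14], not its exact form:</p>

```python
import random

def project_eps_simplex(y, eps):
    """Map a nonnegative output vector onto the eps-simplex: components sum
    to 1 and each is at least eps (requires eps <= 1/len(y)).
    Assumed form of the projector of [14]: shrink toward the simplex center."""
    n = len(y)
    s = sum(y)
    p = [v / s for v in y] if s > 0 else [1.0 / n] * n
    # Convex combination with the uniform point: each component >= eps, sum stays 1
    return [(1 - n * eps) * v + eps for v in p]

def choose_decision(p):
    """Inverse-CDF (roulette-wheel) choice of a pure strategy index, as in (4)."""
    omega, acc = random.random(), 0.0
    for k, pk in enumerate(p):
        acc += pk
        if acc > omega:
            return k
    return len(p) - 1

p = project_eps_simplex([0.9, 0.05, 0.05], eps=0.1)
print(p)                                # each component >= 0.1, sum is 1
print(choose_decision(p) in range(3))   # True
```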
        <p>The response of the decision making environment to the chosen variant is the value of a random variable with an unknown distribution $Z$, which is interpreted as the agent's current gain:</p>
        <p>$\xi_i(u^t) \sim Z(v_i(u^t), d_i(u^t))$, (5)</p>
        <p>where $v_i(u^t)$ is the mathematical expectation and $d_i(u^t)$ is the variance.</p>
        <p>The obtained current gains $\xi_{i,t}(u^t)$ are fed to the inputs of the neuroagents:</p>
        <p>$x_{i,t}^{(n-1)} = e\,\xi_i(u^t)$, $i \in I$,</p>
        <p>where $e = (1 \mid u_i \in U_i)$ is a vector all of whose elements are equal to one. If necessary, the elements of the vector $x_{i,t}^{(n-1)}$ are normalized, for example as:</p>
        <p>$x_{i,t}^{(n-1)} = e\,\xi_i(u^t) / |\xi_{\max}|$, $i \in I$,</p>
        <p>where $\xi_{\max}$ is the maximum value of the current gains. Normalization can reduce the number of steps required to train the neuroagent.</p>
        <p>The total inputs $x_{i,t}^{(n)}$ of the neurons of the $n$-th layer are calculated from the outputs $y_{i,t}^{(n-1)}$ of the neurons of the $(n-1)$-th layer:</p>
        <p>$x_{i,t}^{(n)}[k] = \sum_{j=1}^{N_i} w_{i,t}^{(n-1)}[j, k]\, y_{i,t}^{(n-1)}[j]$, $i \in I$, $k = 1..N_i$, (6)</p>
        <p>where $w_{i,t}^{(n-1)}[N_i, N_i]$ is the matrix of the weights of connections among the nodes of the neural network, calculated at the moment of time $t$. Here $w_{i,t}^{(n-1)}[j, k]$ is the weight of the connection between the $j$-th node of the $(n-1)$-th layer and the $k$-th node of the $n$-th layer. To calculate the outputs $y_{i,t}^{(n)}$ of the neuroagent, a transfer function $\varphi(\cdot)$ of the neuron is used:</p>
        <p>$y_{i,t}^{(n)}[k] = \varphi(x_{i,t}^{(n)}[k])$, $k = 1..N_i$. (7)</p>
        <p>Depending on the task being solved and the type of neural network, the transfer function $\varphi(\cdot)$ can be threshold, linear with saturation, sigmoidal, sinusoidal, radially symmetric, and so on. Most often, for modeling an artificial neural network, the linear transfer function with saturation</p>
        <p>$y_{i,t}^{(n)}[k] = \begin{cases} 0, &amp; \text{if } x_{i,t}^{(n)}[k] \le \theta, \\ \alpha\,(x_{i,t}^{(n)}[k] - \theta), &amp; \text{if } x_{i,t}^{(n)}[k] > \theta \end{cases}$ (8)</p>
        <p>or the bipolar sigmoidal transfer function</p>
        <p>$y_{i,t}^{(n)}[k] = -0.5 + 1/(1 + \exp(-\alpha\,(x_{i,t}^{(n)}[k] - \theta)))$ (9)</p>
        <p>is used. The parameter $\alpha > 0$ defines the tangent of the slope angle for the linear transfer function and the steepness for the sigmoidal transfer function. The parameter $\theta \ge 0$ defines the threshold of neuron activation.</p>
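<p>A minimal sketch of the two transfer functions (8) – (9); the exact saturation and bipolar-offset conventions are reconstructed from the fragments above and should be treated as assumptions:</p>

```python
import math

def linear_saturated(x, alpha=1.0, theta=0.0):
    """Linear transfer function with lower saturation, as in (8):
    0 below the activation threshold theta, slope alpha above it."""
    return 0.0 if x <= theta else alpha * (x - theta)

def bipolar_sigmoid(x, alpha=1.0, theta=0.0):
    """Bipolar sigmoidal transfer function, as in (9): output in (-0.5, 0.5)."""
    return -0.5 + 1.0 / (1.0 + math.exp(-alpha * (x - theta)))

print(round(linear_saturated(0.3, alpha=2.0, theta=0.1), 6))  # 0.4
print(bipolar_sigmoid(0.0))                                   # 0.0
```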
        <p>Training of the neuroagent is carried out by changing the weights $w_{i,t}^{(n-1)}$ of the synaptic connections among neurons. Recalculation of the connection weights is performed according to Hebb's signal method, Kohonen's method or another method of unsupervised learning.</p>
        <p>Learning by Hebb's method leads to an increase of the connections among excited neurons:</p>
        <p>$w_{i,t}^{(n-1)}[j, k] = w_{i,t-1}^{(n-1)}[j, k] + \gamma_t\, y_{i,t-1}^{(n-1)}[j]\, y_{i,t-1}^{(n)}[k]$, $j = 1..N_i$, $k = 1..N_i$, (10)</p>
        <p>where $\gamma_t$ is the parameter of the neuroagent training step.</p>
        <p>Excited neurons are those for which the value of the total input $x_i^{(n)}$ exceeds the activation threshold $\theta$.</p>
        <p>Training by Hebb's differential method leads to an increase of the connections among those neurons whose outputs have changed the most:</p>
        <p>$w_{i,t}^{(n-1)}[j, k] = w_{i,t-1}^{(n-1)}[j, k] + \gamma_t\,(y_{i,t}^{(n-1)}[j] - y_{i,t-1}^{(n-1)}[j])\,(y_{i,t}^{(n)}[k] - y_{i,t-1}^{(n)}[k])$, $j = 1..N_i$, $k = 1..N_i$. (11)</p>
        <p>Training by Kohonen's method is based on the mechanism of competition, the essence of which is to minimize the difference between the input signals of the neuron-winner, coming from the outputs of the neurons of the previous layer, and the weight coefficients of its synapses:</p>
        <p>$w_{i,t}^{(n-1)}[j, k_i^*] = w_{i,t-1}^{(n-1)}[j, k_i^*] + \gamma_t\,(y_{i,t-1}^{(n-1)}[j] - w_{i,t-1}^{(n-1)}[j, k_i^*])$, $j = 1..N_i$, (12)</p>
        <p>where $k_i^*$ is the index of the neuron-winner of the $i$-th agent.</p>
        <p>In contrast to Hebb's method, in which several neurons of the same layer can be excited simultaneously, in Kohonen's method the neurons of the same layer compete with each other for the right of activation. This rule is known in the machine learning literature as "winner takes all".</p>
        <p>According to Kohonen’s method the restructuring of weights of connections is
carried out only for neuron-winner. The winner is the neuron whose synapse values
are as similar as possible to the input image.</p>
        <p>The neuron-winner is determined by calculating the distance between the vectors $y_{i,t-1}^{(n-1)}$ and $w_{i,t-1}^{(n-1)}$:</p>
        <p>$D_{i,t-1}[k] = \sum_{j=1}^{N_i} \left( y_{i,t-1}^{(n-1)}[j] - w_{i,t-1}^{(n-1)}[j, k] \right)^2$, $k = 1..N_i$.</p>
        <p>The winner is the neuron with the smallest distance:</p>
        <p>$k_i^* = \arg\min_{k = 1..N_i} D_{i,t-1}[k]$. (13)</p>
        <p>Another way to determine the neuron-winner is to maximize the outputs $y_{i,t-1}^{(n)}$ of the neurons of the $n$-th layer:</p>
        <p>$u_{i,t-1} = u_i[k]$, $k = \arg\max_{j = 1..N_i} y_{i,t-1}^{(n)}[j]$.</p>
        <p>In this case, the index of the neuron-winner is the serial number of the chosen decision variant $u_{i,t-1}$:</p>
        <p>$k_i^* = \operatorname{index}(u_i[k] \mid \chi(u_i[k] = u_{i,t-1}), k = 1..N_i)$, $i \in I$,</p>
        <p>where $\chi(\cdot) \in \{0, 1\}$ is the indicator function of an event.</p>
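<p>One competitive-learning step combining the winner determination (13) with the Kohonen update (12) can be sketched as follows (a simplified illustration, not the full neuroagent):</p>

```python
def kohonen_step(w, y_in, gamma):
    """One Kohonen competitive-learning step, following (12)-(13): pick the
    winner column whose weight vector is closest (squared Euclidean distance)
    to the input vector, then pull only that column toward the input."""
    n = len(y_in)
    # (13): distances D[k] between the input and each neuron's weight vector
    dist = [sum((y_in[j] - w[j][k]) ** 2 for j in range(n)) for k in range(n)]
    k_star = dist.index(min(dist))
    # (12): move the winner's weights toward the input signals
    for j in range(n):
        w[j][k_star] += gamma * (y_in[j] - w[j][k_star])
    return k_star

w = [[0.9, 0.1], [0.1, 0.9]]        # columns are neuron weight vectors
k = kohonen_step(w, y_in=[1.0, 0.0], gamma=0.5)
print(k)                            # 0 -- the column closest to the input wins
print(w[0][0], w[1][0])             # winner column moved halfway toward [1, 0]
```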
        <p>A training radius $R$ can be set around the neuron-winner in the space of the neuron weight vectors:</p>
        <p>$r_{i,t}[k] = \| w_{i,t}^{(n-1)}[k_i^*] - w_{i,t}^{(n-1)}[k] \|$, $k = 1..N_i$,</p>
        <p>where $w_{i,t}^{(n-1)}[k_i^*]$ is the weight vector of the neuron-winner and $\|\cdot\|$ is the Euclidean vector norm.</p>
        <p>Each neuron whose weight vector lies at a distance from the weight vector of the neuron-winner smaller than the training radius ($r_{i,t}[k] < R$) takes part in the recalculation of the synapse weights. The weights of neurons outside the training radius do not change. The training radius decreases in time so that at the end of the training process the correction of connection weights is carried out only for the single neuron-winner.</p>
        <p>The parameters $\gamma_t$ in (10) – (12) and $\varepsilon_t$ in (3) determine the learning rate of the neuroagents. To ensure the convergence of the neuroagent training process, these parameters are set as positive monotonically decreasing values:</p>
        <p>$\gamma_t = \gamma_0 / t^{\kappa}$, $\varepsilon_t = \varepsilon_0 / t^{\lambda}$, (14)</p>
        <p>where $\gamma_0 > 0$, $\kappa > 0$; $\varepsilon_0 > 0$, $\lambda > 0$.</p>
        <p>The choice of decision variants continues up to a specified number of steps $t \le t_{\max}$, or until the training accuracy condition is fulfilled:</p>
        <p>$\delta_t = |I|^{-1} \sum_{i \in I} \| w_{i,t}^{(n-1)} - w_{i,t-1}^{(n-1)} \| \le \delta$, (15)</p>
        <p>where $\delta$ is the neuroagent training accuracy, determined via the Euclidean norm of the change of the connection weights among neurons for two consecutive moments of time.</p>
      </sec>
      <sec id="sec-1-4">
        <title>Indicators of Stochastic Game Effectiveness</title>
        <p>
          For practical applications, it is necessary to determine indicators of the game
effectiveness, which can be used to evaluate the convergence of the game method. In
the absence of direct information exchange between players, such indicators are
formed on the basis of the collective equilibrium condition according to Nash [
          <xref ref-type="bibr" rid="ref1">1, 14, 21</xref>
          ]. To do this, let us define the functions of the average players' gains:
        </p>
        <p>$V_i(p) = \sum_{u \in U} v_i(u) \prod_{j \in I} p_j(u_j)$, (16)</p>
        <p>where $p \in S^I$, $S^I = \prod_{j \in I} S_{N_j}$, $v_i(u) = M\{\xi_{i,t}(u)\}$, and the values $p_j \in S_{N_j}$ are determined according to (3).</p>
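<p>For two players, the polylinear function (16) reduces to a bilinear form; a small sketch with an illustrative payoff matrix (not the paper's Table 1):</p>

```python
def average_gain(v, p1, p2):
    """Polylinear average gain (16) for two players:
    V_i(p) = sum over joint strategies u of v_i(u) * p1(u1) * p2(u2)."""
    return sum(
        v[a][b] * p1[a] * p2[b]
        for a in range(len(p1))
        for b in range(len(p2))
    )

# Illustrative 2x2 mean-gain matrix of player 1 (not the paper's Table 1)
v1 = [[1.0, 0.0],
      [0.0, 1.0]]
print(average_gain(v1, [0.5, 0.5], [0.5, 0.5]))  # 0.5
```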
        <p>The Nash equilibrium determines those strategies of the game solution for which the following condition holds:</p>
        <p>$\forall i \in I: \ V_i(p^*) - V_i(p^*_{I \setminus i}, p_i) \ge 0$, (17)</p>
        <p>where $p^* \in S^I$ is the optimal collective mixed strategy of the players; $V_i(p^*_{I \setminus i}, p_i)$ is the function of average gains defined on the simplex $S^I$ at an arbitrary deviation of the mixed strategy of the $i$-th player from the Nash equilibrium point within the unit simplex. The Nash-optimal mixed strategies can be obtained from the condition of complementary slackness [21]:</p>
        <p>$\nabla_{p_i} V_i = e_{N_i} V_i$, $i \in I$, (18)</p>
        <p>where $p_i \in S_{N_i}$ is the mixed strategy of the $i$-th player; $\nabla_{p_i} V_i$ is the gradient of the polylinear function of average gains (16); $e_{N_i} = (1_j \mid j = 1..N_i)$ is a vector all of whose elements are equal to 1.</p>
        <p>To account for solutions on the boundary of the unit simplex, let us weigh the vector condition of complementary slackness with the elements of the mixed strategy vectors:</p>
        <p>$\operatorname{diag}(p_i)\,(\nabla_{p_i} V_i(p) - e_{N_i} V_i(p)) = 0$, $i \in I$, (19)</p>
        <p>where $\operatorname{diag}(p_i)$ is the square diagonal matrix of order $N_i$ constructed from the elements of the vector $p_i$; $p \in S^I$ are the combined mixed strategies of the players, set on the convex simplexes of $S^I$.</p>
        <p>Let us define the Lyapunov function as the total current players' error during the search for a Nash equilibrium point:</p>
        <p>$\Delta_t = |I|^{-1} \sum_{i \in I} \Delta_{i,t}$, (20)</p>
        <p>where $\Delta_{i,t} = \| p_{i,t} - \tilde{p}_{i,t} \|^2$, $p_{i,t}, \tilde{p}_{i,t} \in S_{N_i}$. The values $\Delta_t \ge 0$, $t = 1, 2, \ldots$ turn to zero at the Nash equilibrium points, which can be reached both inside and on the vertices of the unit simplexes. The vectors $p_{i,t}$ are determined according to (3), and $\tilde{p}_{i,t}$ is calculated as:</p>
        <p>$\tilde{p}_{i,t}(j) = \tilde{V}_{i,t}(j) / \tilde{V}_{i,t}$, $j = 1..N_i$, (21)</p>
        <p>where $\tilde{V}_{i,t}(j) = p_{i,t}(j) V_{i,t}(j)$ and $\tilde{V}_{i,t} = \sum_{j=1}^{N_i} \tilde{V}_{i,t}(j)$.</p>
        <p>Direction t to zero at t  1, 2,... will indicate the convergence of the game method
to one of the equilibrium points according to Nash in mixed strategies or taking into
account (19) the achievement of the game solution in pure strategies.</p>
        <p>The order  and value  of the convergence rate of the game method can be
evaluated using the asymptotic method of Jung's moments [22]:
That is the parameter  can be defined as an angle tangent  of slope of the linear
approximation of the function M t  in a logarithmic coordinate system. The
course of a stochastic non-antagonistic game can be traced also by changing the
function of average player’s gains:
t | I |1  i,t .</p>
        <p>iI
(23)
5</p>
      </sec>
      <sec id="sec-1-5">
        <title>Kohonen’s Algorithm of Neuroagent Functioning</title>
        <p>1. Set the initial parameter values:
$t = 0$ is the initial time point;
$I$ is the set of players;
$N_i$, $i \in I$ is the number of decision variants;
$U_i = \{u_i[1], u_i[2], \ldots, u_i[N_i]\}$, $i \in I$ is the set of decision variants;
$[v_i]$, $i \in I$ are the matrices of mathematical expectations of gains;
$[d_i]$, $i \in I$ are the matrices of variances of gains;
$w_{i,0}^{(n-1)}[N_i, N_i]$, $i \in I$ is the matrix of initial weights of connections among the nodes of the neural network;
$\alpha$, $\theta$ are the parameters of the transfer function of the neurons;
$\gamma_0$ is the parameter of the training step;
$\kappa \in (0, 1]$ is the order of the training step;
$\varepsilon_0$ is the parameter of the $\varepsilon$-simplex;
$\lambda$ is the order of the speed of expansion of the $\varepsilon$-simplex;
$t_{\max}$ is the maximum number of steps of the method;
$\delta$ is the accuracy of training.
2. Perform the choice of decision variants $u_{i,t}$, $i \in I$ according to (3) – (4).
3. Obtain the values of the current gains of the neuroagents as random variables with normal distribution $\xi_i(u^t) \sim \mathrm{Normal}(v_i(u^t), d_i(u^t))$, $i \in I$.
4. Calculate the inputs $x_{i,t}^{(n-1)}$, $i \in I$ of the neuroagents and the corresponding outputs $y_{i,t}^{(n-1)}$, $i \in I$ of the neurons of the $(n-1)$-th layer according to (7).
5. Calculate the total inputs $x_{i,t}^{(n)}$, $i \in I$ of the neurons (6) and the corresponding outputs $y_{i,t}^{(n)}$, $i \in I$ (7) for the neurons of the $n$-th layer.
6. Calculate the value of the parameter $\gamma_t$ according to (14).
7. Determine the indexes $k_i^*$ of the winning neurons, $i \in I$ according to (13).
8. Recalculate the weights of connections to the neuron-winners $w_{i,t}^{(n-1)}[j, k_i^*]$, $i \in I$, $j = 1..N_i$ according to (12).
9. Calculate the characteristics of the quality of collective decision making: $\Xi_t$ (23), $\Delta_t$ (20), $\nu$ (22).
10. Set the next moment of time $t := t + 1$.
11. If the condition (15) of the end of the game is not fulfilled, go to step 2; otherwise, stop.</p>
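<p>The algorithm above can be condensed into a self-contained simulation sketch for two neuroagents with two strategies. The payoff matrices, parameter values, and the single-layer gain-weighted update (with the chosen decision index taken as the winner, following the alternative rule after (13)) are illustrative simplifications of the method, not the authors' exact implementation:</p>

```python
import math
import random

random.seed(1)

N = 2                                   # strategies per agent
L = 2                                   # number of agents
# Illustrative mean-gain matrices (not the paper's Table 1): v[i][u1][u2]
v = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
d = 0.1                                 # gain dispersion
w = [[[random.random() for _ in range(N)] for _ in range(N)] for _ in range(L)]

def outputs(wi, x):
    """Total inputs (6) passed through a linear transfer function (7)-(8)."""
    return [max(0.0, sum(wi[j][k] * x[j] for j in range(N))) for k in range(N)]

def probabilities(y, eps):
    """Projection (3) of the outputs onto the eps-simplex (assumed clipping form)."""
    s = sum(y)
    p = [q / s for q in y] if s > 0 else [1.0 / N] * N
    return [(1 - N * eps) * q + eps for q in p]

def choose(p):
    """Inverse-CDF choice (4) of a pure strategy."""
    omega, acc = random.random(), 0.0
    for k, pk in enumerate(p):
        acc += pk
        if acc > omega:
            return k
    return N - 1

avg = [0.0, 0.0]
for t in range(1, 2001):
    gamma, eps = 0.5 / t, 0.4 / t ** 0.99      # decreasing steps, as in (14)
    x = [[1.0] * N for _ in range(L)]          # unit input vector e (normalization omitted)
    u = []
    for i in range(L):
        y = outputs(w[i], x[i])
        u.append(choose(probabilities(y, min(eps, 1.0 / N))))
    for i in range(L):
        # Current random gain with mean v[i][u] (normal environment response)
        xi = random.gauss(v[i][u[0]][u[1]], math.sqrt(d))
        avg[i] += (xi - avg[i]) / t            # time-averaged gain (1)
        # Simplified gain-weighted Kohonen-style update (12):
        # reinforce the column of the chosen decision
        k_star = u[i]
        for j in range(N):
            w[i][j][k_star] += gamma * (xi * x[i][j] - w[i][j][k_star])

print([round(a, 2) for a in avg])              # average gains of both agents
```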
      </sec>
      <sec id="sec-1-6">
        <title>Results of Computer Modeling</title>
        <p>Let us solve the stochastic game of two neuroagents, $|I| = 2$, with two pure strategies each, $N_i = 2$, $i = 1..2$. The matrices of average gains $[v_i]_{2 \times 2}$, $i = 1..2$ of such a game are given in Table 1.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Strategies</title>
      <p>
        The view of slices of the functions of average gains of the neuroagents (16), with values on the unit simplex corresponding to Table 1, is shown in Fig. 2.
      </p>
      <p>
        Fig. 2. Functions of average gains of neuroagents.
        Analysis of the average gain functions shows that the game has one Nash solution in mixed strategies $(p_1[1], p_2[1]) = (0.5, 0.2)$ and two solutions in pure strategies $(p_1[1], p_2[1]) = (0, 1)$ and $(p_1[1], p_2[1]) = (1, 0)$.
      </p>
      <p>
Under conditions of uncertainty, the elements of the matrices of average gains v_i(u), u ∈ U, are unknown and are available for observation only in the form of random current values xi_i(u). To simulate the random gains we choose the normal distribution law: xi_i(u) ~ Normal(v_i(u), d_i(u)), where v_i(u) is the mathematical expectation and d_i(u) is the dispersion. The normally distributed random values are calculated using the sum of twelve uniformly distributed random real numbers omega ∈ [0, 1]: xi_i(u_t) = v_i(u_t) + sqrt(d_i(u_t)) * (sum_{j=1}^{12} omega_{i,t}[j] - 6).
      </p>
      <p>
The initial values of the link weights w_0^{(n+1)} between neurons are random variables uniformly distributed in the interval [0, 1]. The linear dependence of the output on the total inputs is chosen as the transfer function of the neurons (8).
      </p>
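      <p>
The noisy-gain generator used in the simulation (the classical sum-of-twelve-uniforms construction: the sum has mean 6 and variance 1) can be written directly:

```python
import math
import random

def noisy_gain(v, d, rng=None):
    """Draw a current gain xi ~ Normal(v, d) as the sum of 12
    uniform [0, 1] numbers, shifted by -6 (the sum's mean) and
    scaled by sqrt(d) so that the variance equals the dispersion d."""
    rng = rng or random
    s = sum(rng.random() for _ in range(12))
    return v + math.sqrt(d) * (s - 6.0)
```

Averaging many such draws recovers the mathematical expectation v and dispersion d, which is exactly what the players must estimate implicitly during training.
      </p>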
      <p>The convergence of the gaming neuroagent method is determined by the ratio of its parameters, which in the general case must satisfy the basic conditions of stochastic optimization [22–28]. The parameters of the neuroagent training method take the following values: 1; 0.1; N_i = N = 2; 0.999/N; 1.</p>
      <p>The graphs of the functions of average gains of the neuroagents and of the error of reaching an optimal collective decision at one of the Nash equilibrium points are shown on a logarithmic scale in Fig. 3. The growth of the average gains and the reduction of the error in time indicate the convergence of the neuroagent method of decision making in the sense of fulfilling the condition of complementary slackness (19).</p>
      <p>
The trajectories of change of the conditional probabilities of the collective choice of decision variants within the unit simplex are shown in Fig. 2. From the obtained data it can be seen that, for the given parameter values, the game neuroagent method (12) provides the solution of the stochastic game at a vertex of the unit simplex, (p1[1], p2[1]) = (0, 1), and has an order of convergence speed close to 1, estimated as the tangent of the slope angle of the trajectory on the logarithmic scale.
      </p>
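      <p>
The order of convergence read off the log-scale plot (the tangent of the slope angle) amounts to fitting a straight line to log(error) versus log(t). A small sketch of that estimate, assuming a recorded error trajectory:

```python
import math

def convergence_order(errors, t0=1):
    """Estimate the order of convergence as the negated least-squares
    slope of log(error) against log(t): for errors ~ C * t**(-k)
    this returns k, the tangent of the slope angle on the log scale."""
    xs = [math.log(t0 + i) for i in range(len(errors))]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return -num / den
```

For an error sequence decaying like 1/t this estimator returns an order close to 1, matching the behavior reported for the game method.
      </p>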
      <p>For the convergence of the stochastic game of neuroagents to Nash equilibrium points with probability 1 or in the mean-square sense, it is necessary that the ratio of the parameters of the recurrent game method satisfies the fundamental conditions of stochastic approximation [29, 30]. In conditions of uncertainty, the theoretical study of convergence is based on Robbins–Monro stochastic approximation and on the results of the Robbins–Siegmund lemma, and depends both on the parameters of the environment and on the values of the parameters of the game method [31].</p>
      <p>Recurrent game algorithms provide a power-law order of convergence rate [31] and are easy to program. The low rate of convergence is due to the lack of a priori information about the players' payoff matrices and the lack of direct exchange of information between players. In addition, each player has no information about the structure of the game (the number of players, the number of strategies, the dependence of its own payoffs on the strategies of other players). During the multi-step game, each player learns to choose pure strategies so as to optimize its own function of average payoffs.</p>
      <p>The learning process is a reorganization of one's own mixed strategies over time –
by increasing the probability of choosing pure strategies that, on average, produce the
best results.</p>
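      <p>
This reorganization of mixed strategies can be sketched as a reward-proportional update: the probability of the chosen pure strategy grows in proportion to the received gain, while the remaining probabilities shrink, keeping the vector on the unit simplex. This linear reward-inaction style rule is an illustrative stand-in, not the paper's exact update (12):

```python
def reinforce(p, k, gain, step):
    """One mixed-strategy update: shift probability mass toward the
    chosen pure strategy k in proportion to step * gain, then
    renormalize (the update already preserves the simplex when
    0 <= step * gain <= 1; renormalization guards rounding)."""
    q = list(p)
    q[k] += step * gain * (1.0 - q[k])
    for j in range(len(q)):
        if j != k:
            q[j] *= 1.0 - step * gain
    s = sum(q)
    return [x / s for x in q]
```

Repeatedly rewarding the same pure strategy drives its probability toward 1, i.e. toward a vertex of the unit simplex, which is the behavior observed in the simulation.
      </p>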
      <p>Theoretical evaluations of the convergence conditions of game algorithms under uncertainty are complex and difficult to analyze in a rigorous analytical way. In addition, such convergence conditions are upper bounds, so the question of their accuracy arises. For this reason the theoretical results will not always be satisfactory and require experimental clarification. It is experimentally established that scaling the system up in the number of players and the number of pure strategies increases the entropy of the stochastic game and, as a result, slows down its rate of convergence.</p>
      <p>Let us study the dependence of the learning time of the stochastic game of neuroagents on the basic parameters of the algorithm. We define the training time as the minimum number of steps required to train the neuroagents with a given accuracy epsilon &gt; 0: t_out = min{ t : epsilon_t &lt;= epsilon }, where the current training accuracy epsilon_t is calculated according to (15).</p>
      <p>Due to the random choice of decisions, the training time of the neuroagents has to be averaged over different sequences of random variables: the mean training time is the sum of t_out over the experiments divided by k_exp, where k_exp is the number of experiments.</p>
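      <p>
The two definitions above translate directly into code; here epsilon_t is assumed to be available as a recorded per-step accuracy sequence for each experiment:

```python
def training_time(eps_seq, eps):
    """t_out = min{ t : eps_t <= eps }: the first step at which the
    current training accuracy (15) reaches the prescribed eps."""
    for t, e in enumerate(eps_seq, start=1):
        if e <= eps:
            return t
    return len(eps_seq)

def mean_training_time(runs, eps):
    """Average t_out over k_exp = len(runs) independent experiments,
    each run being one sequence of current accuracies eps_t."""
    return sum(training_time(r, eps) for r in runs) / len(runs)
```

Averaging over independent runs smooths out the randomness of the decision choices, giving the mean number of training steps studied below.
      </p>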
      <p>The average number of training steps t depends on the parameters of the training
algorithm of neuroagents and the parameters of the decision-making environment.</p>
      <p>The graph of the dependence of the average training time of the neuroagents on the order parameter of the training step is shown in Fig. 4. The results are obtained for the dispersion value of current gains d = 0.01. This parameter, taking values in (0, 1], determines the order of the monotonic decrease of the training step gamma_t &gt; 0 (14), which regulates the speed of training of the neuroagents; as its value increases, the average training time decreases. The training accuracy of the game is epsilon = 10^-3. The data are averaged over k_exp = 100 experiments. In all experiments, the choice of pure decision strategies for which condition (19) is met is ensured with high probability.</p>
      <p>As can be seen from the simulation results, increasing this parameter leads to a decrease in the average number of steps required to train the neuroagents with accuracy epsilon.</p>
      <p>The dependence of the average number of training steps of the neuroagents on the dispersion d of the stochastic environment (the dispersion of the evaluations of decision variants) is presented in Fig. 5. The results are obtained for the value 0.5 of the order of the neuroagent training step.</p>
      <p>As the dispersion increases, the average number of steps required to train the neuroagents increases. A significant increase in dispersion may lead to an incorrect determination of the optimal decision. Modeling the stochastic game with other matrices of average gains of the neuroagents can show a different result while preserving the obtained dependencies among the parameters required to optimize the payoff functions of the players.</p>
      <sec id="sec-2-1">
        <title>Conclusions</title>
        <p>The neuroagent game model of collective decision making in conditions of stochastic
uncertainty is proposed in this article. Neuroagents are based on artificial neural
networks with feedback and learning without a teacher. The current collective decision is
obtained after the independent choice of their own decision by all players. Each
player generates a current version of decision according to values of the neural network
outputs. The choice of decisions is carried out by neuroagents in a random way,
regardless of time and other agents. Random choice involves calculation of the
probabilities of choosing decisions by optimal design of neural network outputs for a single
simplex. After choosing a collective decision, the reaction of decision making
environment is determined as a set of values of current neuroagents’ gains. The current
gain of each neuroagent is transmitted to the inputs of the appropriate two-layer
neural network. Then, the training of neuroagents is carried out by changing the weights
of neuronal relationships by one of the learning algorithms without a teacher. The
learning process is repeated until the weights of the neurons relationships are
stabilized with a given accuracy. The course of training is aimed at maximizing the
average gains of neuroagents. The decision of the game is achieved in one of the points of
the collective optimum or equilibrium, depending on the values of the parameters of
the chosen method of neuroagents training.</p>
        <p>The developed software model confirms the convergence of the game neuroagent
method (12) of decision making. The efficiency of the method is estimated by means
of characteristic functions of average gains and errors of collective choice of the
optimal decision making variant. The convergence of the neuroagent game method
depends on the number of players, decisions, and relationships among the method
parameters and decision making environment parameters.</p>
        <p>The reliability of the obtained results has been confirmed by the repetition of the
values of the calculated characteristics of the game neuroagent decision making
method for different sequences of random variables.</p>
        <p>The results of this work can be used to construct distributed control and decision
making systems in conditions of uncertainty.</p>
        <p>The conducted research can be continued in the direction of applying other configurations of neuroagents and other methods of their training, information exchange between the agents of the stochastic game, growth of the number of players and of the number of their pure strategies, and the definition of theoretical conditions of convergence of the game neuroagent method.</p>
        <p>8. Consensus of Fractional-Order Multiagent System via Sampled-Data Event-Triggered Control, https://www.sciencedirect.com/science/article/abs/pii/S0016003218301595, last accessed 2020/04/12.</p>
        <p>9. Tauberian Theorems for General Iterations of Operators: Applications to Zero-Sum Stochastic Games, https://www.sciencedirect.com/science/article/abs/pii/S0899825618300204?via%3Dihub, last accessed 2020/04/12.</p>
        <p>10. Evolution of Cooperation in Stochastic Games, https://www.semanticscholar.org/paper/Evolution-of-cooperation-in-stochastic-games-Hilbe-%C5%A0imsa/ff49474432459bf295a7d2a775b1719c4fa09755, last accessed 2020/04/12.</p>
        <p>11. Bu, Z., Gao, G., Li, H.-J., Cao, J.: CAMAS: A cluster-aware multiagent system for attributed graph clustering. Information Fusion 37, 10–21 (2017).</p>
        <p>12. Mu, C., Zhao, Q., Gao, Z., Sun, C.: Q-learning solution for optimal consensus control of discrete-time multiagent systems using reinforcement learning. Journal of the Franklin Institute 356 (13), 6946–6967 (2019).</p>
        <p>13. Commutative Stochastic Games, https://pubsonline.informs.org/doi/10.1287/moor.2014.0676, last accessed 2020/04/12.</p>
        <p>14. Semi-algebraic Tools for Stochastic Games (2015), https://www.semanticscholar.org/paper/Semi-algebraic-Tools-for-Stochastic-Games-Frederiksen/97d4bc8b3800fd14215f03aaddb6f4e3570255cd, last accessed 2020/04/12.</p>
        <p>15. A Stochastic Game Framework for Analyzing Computational Investment Strategies in Distributed Computing with Application to Blockchain Mining, https://www.semanticscholar.org/paper/A-Stochastic-Game-Framework-for-Analyzing-in-with-Dhamal-Chahed/0702f46e00dc3171ed0b38fdc827b21b120423ee, last accessed 2020/04/12.</p>
        <p>16. CSI Neural Network: Using Side-channels to Recover Your Artificial Neural Network Information, https://www.semanticscholar.org/paper/CSI-Neural-Network%3A-Using-Side-channels-to-Recover-Batina-Bhasin/905ad646e5745afe6a3b02617cd8452655232c0d, last accessed 2020/04/12.</p>
        <p>17. Many-Body Physics: Solving the Quantum Many-Body Problem with Artificial Neural Networks, https://www.semanticscholar.org/paper/MANY%E2%80%90BODY-PHYSICS%3A-Solving-the-quantum-many%E2%80%90body-Carleo-Troyer/e4a85af3f5dc41e13dc2cae9ee851953709b764e, last accessed 2020/04/12.</p>
        <p>18. Integration of New Evolutionary Approach with Artificial Neural Network for Solving Short Term Load Forecast Problem, https://www.semanticscholar.org/paper/Integration-of-new-evolutionary-approach-with-for-Singh-Dwivedi/0a5e2c346f61d9f68f323bc9946dee6988f72687, last accessed 2020/04/12.</p>
        <p>19. State-of-the-art in Artificial Neural Network Applications: A Survey (2018), https://www.semanticscholar.org/paper/State-of-the-art-in-artificial-neural-network-A-Abiodun-Jantan/efdb2aa8d8dadc182b139623911157fa158648ad, last accessed 2020/04/12.</p>
        <p>20. An Artificial Neural Network as a Troubled-Cell Indicator, https://www.semanticscholar.org/paper/An-artificial-neural-network-as-a-troubled-cell-Ray-Hesthaven/2f8af28a213d614ec3b7f7d3f407095d838c944c, last accessed 2020/04/12.</p>
        <p>21. Particle Filtering Methods for Stochastic Optimization with Application to Large-Scale Empirical Risk Minimization, https://www.sciencedirect.com/science/article/abs/pii/S0950705120300083, last accessed 2020/04/12.</p>
        <p>22. DSCTool: A Web-Service-Based Framework for Statistical Comparison of Stochastic Optimization Algorithms, https://www.sciencedirect.com/science/article/pii/S1568494619307586, last accessed 2020/04/12.</p>
        <p>23. Identifying Practical Significance through Statistical Comparison of Meta-Heuristic Stochastic Optimization Algorithms, https://www.sciencedirect.com/science/article/pii/S156849461930643X, last accessed 2020/04/12.</p>
        <p>24. Random Gradient Extrapolation for Distributed and Stochastic Optimization, https://www.semanticscholar.org/paper/Random-gradient-extrapolation-for-distributed-and-Lan-Zhou/263edefd27860664c6a596563f5e30dfd01c48e2, last accessed 2020/04/12.</p>
        <p>25. Tomashevskyi, V., Yatsyshyn, A., Pasichnyk, V., Kunanets, N., Rzheuskyi, A.: Data warehouses of hybrid type: features of construction. Advances in Intelligent Systems and Computing II (AISC) 938, 325–334 (2019).</p>
        <p>26. Kazarian, A., Kunanets, N., Pasichnyk, V., Veretennikova, N., Rzheuskyi, A., Leheza, A., Kunanets, O.: Complex information e-science system architecture based on cloud computing model. CEUR Workshop Proceedings 2362, 366–377 (2019).</p>
        <p>27. Methods of Statistical Research for Information Managers, https://ieeexplore.ieee.org/document/8526588/authors#authors, last accessed 2020/04/12.</p>
        <p>28. Kaminskyi, R., Kunanets, N., Pasichnyk, V., Rzheuskyi, A., Khudyi, A.: Recovery gaps in experimental data. CEUR Workshop Proceedings 2136, 108–118 (2018).</p>
        <p>29. Stochastic Approximation and Recursive Algorithms and Applications, https://www.springer.com/gp/book/9780387008943.</p>
        <p>30. Adaptive Algorithms and Stochastic Approximations, https://www.springer.com/gp/book/9783642758966.</p>
        <p>31. Nazin, A., Poznyak, A.: Adaptive Choice of Variants: Recurrence Algorithms. Nauka, Moscow (1986).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <article-title>1. Uncertainty Index for Evaluating and Comparing Solutions for Stochastic Multiple Objective Problems</article-title>
          , https://www.sciencedirect.com/science/article/abs/pii/S0377221720300047, last accessed
          <year>2020</year>
          /04/12.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>2. A multi-objective model for Pareto optimality in data envelopment analysis crossefficiency evaluation</article-title>
          , https://www.sciencedirect.com/science/article/abs/pii/S0377221719310446, last accessed
          <year>2020</year>
          /04/12.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <article-title>Generalized Pareto copulas: A key to multivariate extremes</article-title>
          , https://www.sciencedirect.com/science/article/pii/S0047259X19300296, last accessed
          <year>2020</year>
          /04/12.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <article-title>A Heuristic Algorithm Combining Pareto Optimization and Niche Technology for Multi-Objective Unequal Area Facility Layout Problem</article-title>
          , https://www.sciencedirect.com/science/article/abs/pii/S0952197619303458, last accessed
          <year>2020</year>
          /04/12.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>Jiang, X., Tian, S., Zhang, T., Zhang, W.</string-name>
          :
          <article-title>Pareto optimal strategy for linear stochastic systems with H∞ constraint in finite horizon</article-title>
          .
          <source>Information Sciences</source>
          <volume>512</volume>
          ,
          <fpage>1103</fpage>
          -
          <lpage>1117</lpage>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <article-title>6. An equilibrium in group decision and its association with the Nash equilibrium in game theory</article-title>
          , https://www.sciencedirect.com/science/article/pii/S0360835219306072, last accessed
          <year>2020</year>
          /04/12.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <article-title>7. An extremum seeking-based approach for Nash equilibrium seeking in N-cluster noncooperative games</article-title>
          , https://www.sciencedirect.com/science/article/pii/S0005109820300133, last accessed
          <year>2020</year>
          /04/12.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>