Exploring Dynamic Equilibrium Of Alternatives On The Base Of
Rectangular Stochastic Matrices
Oleksiy Oletsky
National University of Kyiv-Mohyla Academy, Skovorody St.,2, Kyiv, 04070, Ukraine


                Abstract
                A problem of individual and collective decision making and of choosing among several
                alternatives is considered. Within this context, we explore dynamic equilibrium between
                alternatives, i.e. a situation in which no alternative has an advantage over the others. We
                develop an approach based on matrices whose rows correspond to various distributions of
                importance among the alternatives; such matrices are called balanced rectangular stochastic
                matrices. We suggest applying the Analytic Hierarchy Process (AHP) for obtaining matrices
                of importance. From such a matrix we can move on to a matrix representing the probabilities
                of choosing each alternative. The proposed model involves some parameters: one affects the
                spread of importance values between the best and the worst alternatives, and another reflects
                the degree of an agent’s decisiveness. A Markov chain is used for modeling possible changes
                of agents’ preferences.
                Some experiments illustrating the situation of dynamic equilibrium have been carried out and
                are reported in the paper. Possible ways of breaking such a situation, namely by changing
                transition probabilities and by changing agents’ decisiveness, are described. If a decision is
                made by a sufficiently large group of agents, even a slight advantage of one alternative over
                the other may ensure steady wins for that alternative.
                For agents of influence who try to shift a situation away from the point of dynamic
                equilibrium by affecting what agents know or how they behave, possible strategies are
                discussed. Models describing such influences may involve Markov decision processes and a
                game-based approach.


                Keywords
                Decision making, dynamic equilibrium, balanced rectangular stochastic matrices, Analytic
                Hierarchy Process, agents

1. Introduction and methods
    There is now a significant and steadily growing interest in studies in the field of social modeling
on the basis of simulation, various mathematical models, reinforcement learning, etc. An important
subject of this sort of research is the study of agents’ behavior and of information influences aiming
to change this behavior. A well-known approach to exploring information influences is based on
models similar to epidemiological models such as SIR and more advanced ones [1]. This means that an
agent’s behavior is considered as being influenced by some ideas known or accepted by ordinary
agents, and agents of influence aim to propagate such ideas in order to affect what an object of
influence knows or believes and how an ordinary agent behaves.
    Some approaches presume an awareness of specific models used for decision making. For
example, in [2] a software system based on the AHP (Analytic Hierarchy Process) [3] was presented.


MoMLeT+DS 2021: 3rd International Workshop on Modern Machine Learning Technologies and Data Science, June 5, 2021, Lviv-Shatsk,
Ukraine
EMAIL: oletsky@ukr.net (O.Oletsky)
ORCID: 0000-0002-0553-5915 (O.Oletsky)
           © 2021 Copyright for this paper by its authors.
           Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
           CEUR Workshop Proceedings (CEUR-WS.org)
An integral part of that system is a component aimed at searching for a way to improve the position
of a specific alternative in comparison with the others.
    But there is an urgent need to develop formalized models describing behavioral aspects in
terms of states and actions, so that outer influences can be explicitly and clearly interpreted as
transitions from one state to another, or perhaps as changes of transition probabilities. Such a model,
named “state-probability of choice”, was suggested in [4]. It regards the states as possible distributions
of probabilities among different actions. The model brings into consideration a Markov chain,
which captures transition probabilities across the states. Within this approach, the problem
of dynamic equilibrium, i.e. the situation when no alternative has an advantage over the
others and all of them have equal chances of being chosen, was explored. The main idea of that
formalization relies on the concept of balanced rectangular stochastic matrices [4].
    As regards the modeling of information influences, the problem of finding a dynamic equilibrium
is a significant matter. If the aim is to promote specific alternatives, an influencer can at least
make an effort to reach the nearest point of dynamic equilibrium, or the point which
requires the least cost to reach. On the other hand, a point of dynamic equilibrium can be regarded as
the starting point of a game in which different agents of influence try to move the situation away
from this point. Costs of such efforts may be taken into consideration as well [4].
    In this paper a multi-level generalization of the model “state-probability of choice” is suggested. One
level represents distributions of importance values among different alternatives, and the other one deals
with probabilities of choice between these alternatives. We will describe connections between these levels.
    For specifying states of the model, the AHP is used. Some numerical experiments illustrating a
situation of a dynamic equilibrium and showcasing different ways of breaking the equilibrium are
presented in the paper.
    This approach appears to be useful for the problem of searching for possible influencers’
strategies. We outline the following possibilities:
    • to use reinforcement learning models on the base of Markov decision processes;
    • to harness a game-based approach.

   2. Dynamic equilibrium and balanced rectangular stochastic matrices

   Let the number of alternatives be n, and let m possible distributions of their importance values be
postulated. These distributions can be referred to as the states of the model, and a Markov chain arises
if we regard transition probabilities across these states. Then we can consider a matrix
\( Z = (z_{ij}),\ i = 1, \dots, m;\ j = 1, \dots, n \), where \( z_{ij} \) is the importance of the j-th alternative for the i-th
distribution. Normally it is postulated that

\[
\forall i:\ \sum_{j=1}^{n} z_{ij} = 1, \qquad
\forall i, j:\ 0 \le z_{ij} \le 1
\tag{1}
\]

   Relation (1) is the defining feature of square stochastic matrices, so by analogy rectangular
matrices satisfying (1) are called rectangular stochastic matrices.
   A rectangular stochastic matrix is referred to as balanced if, in addition to (1), it satisfies the
relation

\[
\forall j, l:\ \sum_{i=1}^{m} z_{ij} = \sum_{i=1}^{m} z_{il}
\tag{2}
\]
    We consider a vector \( p = (p_1, \dots, p_m) \), where \( p_i \) is the probability that an agent is in the i-th
state at the moment. These probabilities can be either given preliminarily or obtained from the
Markov chain. Then the overall importance \( v_j \) of the j-th alternative can be calculated as the
mathematical expectation of importance over all states:

\[
v_j = \sum_{i=1}^{m} p_i z_{ij}, \quad j = 1, \dots, n
\tag{3}
\]

   or, in matrix notation,

\[
v = pZ
\]

   Then the problem of finding a dynamic equilibrium can be formulated as follows: given Z, find a
vector p (or transition probabilities with p as a stationary distribution) that is a solution of the system
of equations [4]

\[
\sum_{i=1}^{m} p_i z_{ij} = \frac{1}{n}, \quad j = 1, \dots, n
\tag{4}
\]

   with the constraints

\[
\sum_{i=1}^{m} p_i = 1, \qquad 0 \le p_i \le 1
\tag{5}
\]


    Generally speaking, it appears difficult to obtain an effective formula characterizing all the
situations of dynamic equilibrium by directly solving problem (4)-(5). The concept of balanced
rectangular stochastic matrices plays an important role in obtaining some particular solutions. For the
case of n = 2, it was shown in [4] that, under some additional technical assumptions, a dynamic
equilibrium is achieved if p is a symmetric vector with non-negative components summing to 1 and Z
is a balanced rectangular stochastic matrix.
    More technically, the following theorem, called the theorem on symmetric balancing, was proven.
    Let p be an m-vector which satisfies the following relations:

\[
0 \le p_i \le 1, \quad i = 1, \dots, m,
\]
\[
\sum_{i=1}^{m} p_i = 1,
\]
\[
\forall i_1 = 1, \dots, m,\ i_2 = m - i_1 + 1:\ p_{i_1} = p_{i_2}.
\]

   Let Z be a balanced stochastic \( (m \times n) \)-matrix which satisfies the following relations:

\[
\forall j \in \{1, 2\}\ \forall i_1 = 1, \dots, m,\ i_2 = m - i_1 + 1:\ z_{i_1 j} = 1 - z_{i_2 j}.
\]

   Then

\[
\sum_{i=1}^{m} z_{ij} p_i = 0.5, \quad j = 1, \dots, n.
\]
   This result establishes only sufficient conditions, not necessary ones, and the problem of finding
other situations of dynamic equilibrium remains open. The question of finding similar conditions for
the case n > 2 is of great interest as well.
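
   As a quick numerical illustration of the theorem, consider the following minimal sketch (in Python with NumPy; the 5×2 matrix Z and the vector p are assumed example data, not taken from [4]) checking relations (1), (2) and the equilibrium conclusion directly:

```python
import numpy as np

# Assumed 5x2 example: rows sum to 1 (relation 1), both column sums are
# equal (relation 2), and z_{i1,j} = 1 - z_{i2,j} for mirrored rows.
Z = np.array([
    [0.1, 0.9],
    [0.3, 0.7],
    [0.5, 0.5],
    [0.7, 0.3],
    [0.9, 0.1],
])
p = np.array([0.3, 0.1, 0.2, 0.1, 0.3])   # symmetric, components sum to 1

assert np.allclose(Z.sum(axis=1), 1.0)                # relation (1)
assert np.allclose(Z.sum(axis=0), Z.sum(axis=0)[0])   # relation (2)

v = p @ Z    # overall importance values, relation (3)
print(v)     # [0.5 0.5] -> system (4)-(5) holds: dynamic equilibrium
```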

   3. A multi-level model
   The further development of the formalization described above has several aspects. One of them is
how to obtain a reasonable set of basic states such that these states and the transitions across them
have a clear interpretation. Another aspect is that we can regard not only distributions of importance
but probabilities of actions as well. What was regarded in [4] was just a matrix of states
representing probabilities of actions. Those results can easily be transferred to similar situations, e.g.
when the states represent distributions of importance values. Moreover, it is easy to get probabilities
of actions if we know the importance of each of them.
   In this paper we regard the following levels of the model:
    • the level of basic states B;
    • the level of importance distributions, characterized by the matrix Z introduced above;
    • the level of probabilities of actions, characterized by a matrix \( Z' \).
   Since each of these levels may depend on some parameters, we come to the following
parameterized formulas:

\[
Z' = f(a_1, \dots, a_r, Z),
\qquad
Z = g(c_1, \dots, c_q, B)
\tag{6}
\]

   where \( a_1, \dots, a_r, c_1, \dots, c_q \) are parameters and f, g are given transformation functions. Such a
division into levels allows a more distinct separation of the different aspects that should be regarded at
different levels.
    Let’s try to specify possible transformation functions and parameters in (6).
    For forming the basic level of states B, we suggest applying an AHP-based approach. Each state
can be specified by a pairwise comparison matrix \( \Theta^{(i)} = (\theta_{kl}),\ k, l = 1, \dots, n \), where \( \theta_{kl} \) is the
advantage of the k-th alternative over the l-th one.
    It is possible to apply the standard scale of preference gradations suggested by T. Saaty. It
introduces gradation levels ranging from 1 (equality) to 9 (absolute advantage) and the inverse
values corresponding to them. But this scale may cause problems, and it is often more
reasonable to use alternative scales referred to as transitive scales [5]. A transitive scale involves
a parameter \( \lambda > 1 \) which determines how many times the value of the next gradation level is bigger than
that of the previous one. So whereas equality of alternatives is normally evaluated as 1, the
minimal gradation of advantage is evaluated as \( \lambda \), the next gradation as \( \lambda^2 \), etc. Given
a fixed number of gradations H, the whole range of evaluated gradations may be
presented as the following vector:

\[
G = G(\lambda) = \left( \lambda^{-[H/2]},\ \lambda^{-[H/2]+1},\ \dots,\ \lambda^{-1},\ 1,\ \lambda,\ \dots,\ \lambda^{[H/2]} \right)
\]


   An important reason for using transitive scales is that the spread between the maximal and the
minimal values of importance can be too big and unnatural; a proper choice of \( \lambda \) allows
reducing it. The smaller the parameter \( \lambda \), the smaller the spread.
   In any case, once we obtain the pairwise comparison matrix \( \Theta^{(i)} \) corresponding to the i-th state, we
can get the i-th distribution of importance, i.e. the i-th row of Z, as the Perronian vector (that is,
the normalized principal eigenvector) of \( \Theta^{(i)} \).
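
   This step admits a straightforward numerical sketch (Python with NumPy; the helper name perron_vector is ours, and the 2×2 example matrix is illustrative):

```python
import numpy as np

def perron_vector(theta: np.ndarray) -> np.ndarray:
    """Normalized principal eigenvector of a pairwise comparison matrix."""
    eigvals, eigvecs = np.linalg.eig(theta)
    principal = eigvecs[:, np.argmax(eigvals.real)].real
    principal = np.abs(principal)        # the Perron vector is positive
    return principal / principal.sum()   # normalize so components sum to 1

# n = 2: the first alternative has advantage lambda^k over the second.
lam, k = 1.5, 3
theta = np.array([[1.0, lam ** k],
                  [lam ** -k, 1.0]])
print(perron_vector(theta))   # approx. [0.7714, 0.2286]
```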
   Moving on from a distribution of importance values to probabilities of choices, if this step is
needed, can be performed by the well-known and very common transformation of exponential
normalization:

\[
z'_{ij} = \frac{e^{\mu z_{ij}}}{\sum_{l=1}^{n} e^{\mu z_{il}}}
\tag{7}
\]

    With regard to an agent’s behavior, the parameter \( \mu \) in (7) is of great importance since it reflects
the degree of an agent’s decisiveness. Different agents may have different values of this parameter,
which significantly affects the probabilities of choosing alternatives and thus plays a
smoothing or a sharpening role. If \( \mu \) is small, an agent hesitates even when the
estimation of one alternative is much bigger than that of the others, and smoothing takes place. On the
contrary, if \( \mu \) is big, an agent chooses the better alternative with a high probability even if its advantage
over the other alternatives is rather small, and the existing difference between the alternatives is
sharpened.
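
   The smoothing and sharpening effect of \( \mu \) can be seen in a small sketch of transformation (7) (the function name is ours; the input row is an illustrative importance distribution):

```python
import numpy as np

def choice_probabilities(z_row: np.ndarray, mu: float) -> np.ndarray:
    """Exponential normalization (7); mu plays the role of decisiveness."""
    e = np.exp(mu * z_row)
    return e / e.sum()

row = np.array([0.2286, 0.7714])           # one importance distribution
print(choice_probabilities(row, mu=1.0))   # smoothed: approx. [0.37, 0.63]
print(choice_probabilities(row, mu=5.0))   # sharpened: approx. [0.06, 0.94]
```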

    Finally, the more general formula (6) in the particular case under consideration comes down to the
following sequence of steps:
    • getting m matrices representing different pairwise comparisons \( \Theta^{(i)}(\lambda) \); actually it is not
       necessary to take into account all possible comparison matrices, but the chosen ones should form
       a symmetric structure so that the balanced matrix Z is obtained after the next step has
       been performed; some possible ways of forming such comparison matrices were suggested and
       discussed in [4];
    • getting the matrix of importance distributions Z, the i-th row of which is obtained as the
       Perronian vector (that is, the normalized principal eigenvector) of the pairwise comparison matrix
       \( \Theta^{(i)}(\lambda) \);
    • getting the matrix of probabilities \( Z' \) by applying (7) if needed; at this step the
       parameter \( \mu \) should be chosen carefully.

    After performing these steps we get the set of basic states within the model “state-probability
of choice”. If the last step is omitted, we get another model, which can be called “state-distribution
of importance”. Transition probabilities across the states within both models should be
postulated on the basis of the specifics of the subject domain. The transitions themselves can be clearly
interpreted as changes of pairwise comparison matrices. So an agent who is planning and estimating
possible actions in order to promote a specific alternative and thereby improve its position can take
into account how much such a change would cost. On this basis, the approach outlined in [2] can be
significantly developed.
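
   For n = 2 the whole sequence of steps admits a compact sketch (Python with NumPy; it relies on the fact that the Perron vector of a 2×2 reciprocal matrix with off-diagonal entries g and 1/g is (g, 1)/(g + 1), so the eigenvector computation can be written in closed form; the helper names are ours):

```python
import numpy as np

def build_Z(H: int = 11, lam: float = 1.5) -> np.ndarray:
    """Matrix of importance distributions for n = 2 on a transitive scale."""
    ks = np.arange(-(H // 2), H // 2 + 1)   # gradation exponents, e.g. -5..5
    G = lam ** ks                           # transitive scale G(lambda)
    # Perron vector of [[1, g], [1/g, 1]] equals (g, 1) / (g + 1)
    return np.column_stack([G / (G + 1.0), 1.0 / (G + 1.0)])

def sharpen(Z: np.ndarray, mu: float) -> np.ndarray:
    """Exponential normalization (7) applied row-wise with a common mu."""
    E = np.exp(mu * Z)
    return E / E.sum(axis=1, keepdims=True)

Z = build_Z()                  # reproduces the 11x2 matrix of Section 4
Z_prime = sharpen(Z, mu=5.0)   # reproduces the mu = 5 matrix of Section 4
print(np.round(Z, 4))
print(np.round(Z_prime, 4))
```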

   4. A numerical example
   Let’s look at the example given below. Its main goals are as follows:
    • illustrating the basic techniques;
    • demonstrating a situation of dynamic equilibrium;
    • showcasing how occasional or deliberate changes, either of the transition probabilities
       across the states or of the vector p representing the stationary probabilities of being in
       a specific state, can break the dynamic equilibrium.

    Let’s put n = 2. For this simple case, the basic pairwise comparison matrices can be obtained as
follows:

\[
\Theta^{(i)}(\lambda) =
\begin{pmatrix}
1 & G_i(\lambda) \\
G_i^{-1}(\lambda) & 1
\end{pmatrix}
\]

   Having evaluated all pairwise comparisons, we can move on to calculating the matrix of
importance distributions Z.

   For \( H = 11,\ \lambda = 1.5 \) we get the following matrix:

\[
Z =
\begin{pmatrix}
0.1164 & 0.8836 \\
0.1649 & 0.8351 \\
0.2286 & 0.7714 \\
0.3077 & 0.6923 \\
0.4 & 0.6 \\
0.5 & 0.5 \\
0.6 & 0.4 \\
0.6923 & 0.3077 \\
0.7714 & 0.2286 \\
0.8351 & 0.1649 \\
0.8836 & 0.1164
\end{pmatrix}
\]

   For \( \mu = 5 \) (the level of decisiveness is quite high) we get:

\[
Z' =
\begin{pmatrix}
0.0211 & 0.9789 \\
0.0339 & 0.9661 \\
0.0621 & 0.9379 \\
0.1275 & 0.8725 \\
0.2689 & 0.7311 \\
0.5 & 0.5 \\
0.7311 & 0.2689 \\
0.8725 & 0.1275 \\
0.9379 & 0.0621 \\
0.9661 & 0.0339 \\
0.9789 & 0.0211
\end{pmatrix}
\]

    Let’s illustrate a situation of dynamic equilibrium. Since we have suggested a set of states with a quite
clear interpretation, we start from the transition probabilities across them. The 0-th and the
10-th states correspond to the overwhelming advantage of one alternative over the other (the calculated
coefficients of pairwise comparisons equal 0.1317 and 7.5938 respectively), and the central (the 5-th)
state corresponds to the equality of the alternatives (the comparison coefficient equals 1). If an
agent is currently near an edge, they tend to drift towards the edge states, and when near the
center they tend to drift towards the center. So the marginal states and the central one can be regarded as a
sort of attraction points.
   So we may regard the following matrix of transition probabilities, which can be considered
more or less typical:


      0.9 0.1 0   0  0   0   0  0  0   0   0 
                                              
      0.8 0.1 0.1 0  0   0   0  0  0   0   0 
      0 0.9 0 0.1 0      0   0  0  0   0   0 
                                              
      0    0 0.1 0 0.9 0     0  0  0   0   0 
      0    0   0 0.1 0 0.9 0    0  0   0   0 
     
    0    0   0  0 0.1 0,8 0.1 0  0   0   0 
                                              
      0    0   0  0  0 0.9 0 0.1 0     0   0 
      0    0   0  0  0   0 0.9 0 0.1 0     0 
                                              
      0    0   0  0  0   0   0 0.1 0 0.9 0 
      0    0   0  0  0   0   0  0 0.1 0.1 0.8 
                                             
       0   0   0  0  0   0   0  0  0 0.1 0.9 


   The stationary vector of probabilities p is the solution of the equation

\[
p = p \Pi
\tag{8}
\]

   so p is the main left eigenvector of \( \Pi \) (equivalently, the principal eigenvector of \( \Pi^{T} \)).
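
   A sketch of this computation (Python with NumPy; stationary_distribution is our helper name):

```python
import numpy as np

def stationary_distribution(Pi: np.ndarray) -> np.ndarray:
    """Stationary vector p with p = p @ Pi, normalized to sum 1."""
    eigvals, eigvecs = np.linalg.eig(Pi.T)   # left eigenvectors of Pi
    p = eigvecs[:, np.argmin(np.abs(eigvals - 1.0))].real
    p = np.abs(p)
    return p / p.sum()

# The transition matrix Pi from the text (rows = current state).
Pi = np.array([
    [0.9, 0.1, 0,   0,   0,   0,   0,   0,   0,   0,   0  ],
    [0.8, 0.1, 0.1, 0,   0,   0,   0,   0,   0,   0,   0  ],
    [0,   0.9, 0,   0.1, 0,   0,   0,   0,   0,   0,   0  ],
    [0,   0,   0.1, 0,   0.9, 0,   0,   0,   0,   0,   0  ],
    [0,   0,   0,   0.1, 0,   0.9, 0,   0,   0,   0,   0  ],
    [0,   0,   0,   0,   0.1, 0.8, 0.1, 0,   0,   0,   0  ],
    [0,   0,   0,   0,   0,   0.9, 0,   0.1, 0,   0,   0  ],
    [0,   0,   0,   0,   0,   0,   0.9, 0,   0.1, 0,   0  ],
    [0,   0,   0,   0,   0,   0,   0,   0.1, 0,   0.9, 0  ],
    [0,   0,   0,   0,   0,   0,   0,   0,   0.1, 0.1, 0.8],
    [0,   0,   0,   0,   0,   0,   0,   0,   0,   0.1, 0.9],
])
print(np.round(stationary_distribution(Pi), 4))   # matches p given below
```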

   In [4] the following important statement was proven.
   Let an \( (m \times m) \)-matrix A satisfy either of the following relations:

\[
\forall i, j:\ a_{ij} = a_{i,\, m-j+1};
\]
\[
\forall i, j:\ a_{ij} = a_{m-i+1,\, m-j+1}.
\]

   Then the main eigenvector x of the matrix A is symmetric, i.e.

\[
\forall j:\ x_j = x_{m-j+1}.
\]

   Indeed, for the given matrix \( \Pi \) the stationary vector approximately equals

   p = (0.2717, 0.0340, 0.0038, 0.0038, 0.0340, 0.3057, 0.0340, 0.0038, 0.0038, 0.0340, 0.2717)

   A meaningful interpretation of such a distribution may be that there are three groups of
agents:
    • convinced proponents of the first alternative, whose share is about 30%;
    • convinced proponents of the second alternative, whose share is also about 30%;
    • hesitating agents (nearly 40%).
    Since both Z and \( Z' \) are balanced matrices and p is a symmetric vector, the alternatives are of
equal value and are chosen with equal probabilities. This can be verified directly by substituting
p into (3).
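
   Closing the loop numerically (a sketch reusing the constructions above; p is the stationary vector just given):

```python
import numpy as np

p = np.array([0.2717, 0.0340, 0.0038, 0.0038, 0.0340, 0.3057,
              0.0340, 0.0038, 0.0038, 0.0340, 0.2717])

G = 1.5 ** np.arange(-5, 6)                            # transitive scale, H = 11
Z = np.column_stack([G / (G + 1.0), 1.0 / (G + 1.0)])  # importance distributions
E = np.exp(5.0 * Z)                                    # decisiveness mu = 5
Z_prime = E / E.sum(axis=1, keepdims=True)

print(np.round(p @ Z, 3))         # [0.5 0.5]: equal importance values
print(np.round(p @ Z_prime, 3))   # [0.5 0.5]: equal choice probabilities
```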

   5. Variable decisiveness
   An interesting issue arises if we allow the parameter of decisiveness to vary across the states. This
appears to be typical, for instance, in the realm of politics, where proponents of one party can be more
decisive than those of another. Then we can introduce a vector \( \mu = (\mu_1, \dots, \mu_m) \), where \( \mu_i \) is the
decisiveness of an agent in the i-th state.
   First of all, let’s regard a symmetric situation. Let’s put

\[
\mu = (6, 5, 4, 3, 2, 1, 2, 3, 4, 5, 6)
\]

   We assume here that the more marginal an agent’s state is, the more decisive the agent is.

   Then from the initial matrix Z in our example we move on to the following matrix:

\[
Z^{(s)} =
\begin{pmatrix}
0.0099 & 0.9901 \\
0.0339 & 0.9661 \\
0.1023 & 0.8977 \\
0.2398 & 0.7602 \\
0.4013 & 0.5987 \\
0.5 & 0.5 \\
0.5987 & 0.4013 \\
0.7602 & 0.2398 \\
0.8977 & 0.1023 \\
0.9661 & 0.0339 \\
0.9901 & 0.0099
\end{pmatrix}
\]

   This matrix is obviously balanced, and the dynamic equilibrium still takes place.
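
   This matrix can be reproduced by a small modification of the earlier sketches, applying a per-state decisiveness \( \mu_i \) row-wise (a sketch under the same assumptions as above):

```python
import numpy as np

mu = np.array([6, 5, 4, 3, 2, 1, 2, 3, 4, 5, 6], dtype=float)
G = 1.5 ** np.arange(-5, 6)
Z = np.column_stack([G / (G + 1.0), 1.0 / (G + 1.0)])

E = np.exp(mu[:, None] * Z)             # row i is sharpened with its own mu_i
Z_s = E / E.sum(axis=1, keepdims=True)
print(np.round(Z_s, 4))                 # symmetric mu keeps the matrix balanced
```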

   6. Breaking a dynamic equilibrium
    What can an agent of influence do to move the situation away from the equilibrium and thereby
change it in a desired direction? An influencer can try at least the following two possibilities:
    • changing the transition probabilities;
    • changing the parameter of decisiveness \( \mu \).
    Let’s assume that an influencer is able to slightly change the transition probabilities from the central
state so that the first alternative gets a slight advantage over the other one. It is important to note
that in this case the matrix of transition probabilities loses the symmetry features which, according to [4],
are needed to ensure that the stationary vector of probabilities p is symmetric. Let’s consider
the following changed matrix of transition probabilities, which differs from the previous one only in
a single row:
          0.9 0.1 0   0   0   0   0    0  0   0   0 
                                                     
          0.8 0.1 0.1 0   0   0   0    0  0   0   0 
          0 0.9 0 0.1 0       0   0    0  0   0   0 
                                                     
          0    0 0.1 0   0.9  0   0    0  0   0   0 
          0    0   0 0.1 0   0.9  0    0  0   0   0 
         
      0    0   0  0 0.09 0,8 0.11 0   0   0   0 
                                                     
          0    0   0  0   0  0.9  0   0.1 0   0   0 
          0    0   0  0   0   0  0.9 0 0.1 0      0 
                                                     
          0    0   0  0   0   0   0   0.1 0 0.9 0 
          0    0   0  0   0   0   0    0 0.1 0.1 0.8 
                                                    
           0   0   0  0   0   0   0    0  0 0.1 0.9 


   It produces the following distribution of values (computed, as above, with the matrix \( Z' \) for \( \mu = 5 \)):

\[
v \approx (0.5314,\ 0.4686)
\]

   Now the first alternative gets an advantage over the second one. Even though this advantage in
itself seems not very crucial, if the number of agents is big enough and decisions are made by a
majority of votes, this advantage ensures steady wins for the first alternative, as was showcased in
[4].
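
   A self-contained sketch reproducing this experiment (Python with NumPy; only the central row of the transition matrix differs from \( \Pi \)):

```python
import numpy as np

def stationary_distribution(Pi: np.ndarray) -> np.ndarray:
    """Stationary vector p with p = p @ Pi, normalized to sum 1."""
    eigvals, eigvecs = np.linalg.eig(Pi.T)
    p = np.abs(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))].real)
    return p / p.sum()

# Perturbed transition matrix: the central row is now (0.09, 0.8, 0.11).
Pi = np.array([
    [0.9, 0.1, 0,   0,   0,    0,   0,    0,   0,   0,   0  ],
    [0.8, 0.1, 0.1, 0,   0,    0,   0,    0,   0,   0,   0  ],
    [0,   0.9, 0,   0.1, 0,    0,   0,    0,   0,   0,   0  ],
    [0,   0,   0.1, 0,   0.9,  0,   0,    0,   0,   0,   0  ],
    [0,   0,   0,   0.1, 0,    0.9, 0,    0,   0,   0,   0  ],
    [0,   0,   0,   0,   0.09, 0.8, 0.11, 0,   0,   0,   0  ],
    [0,   0,   0,   0,   0,    0.9, 0,    0.1, 0,   0,   0  ],
    [0,   0,   0,   0,   0,    0,   0.9,  0,   0.1, 0,   0  ],
    [0,   0,   0,   0,   0,    0,   0,    0.1, 0,   0.9, 0  ],
    [0,   0,   0,   0,   0,    0,   0,    0,   0.1, 0.1, 0.8],
    [0,   0,   0,   0,   0,    0,   0,    0,   0,   0.1, 0.9],
])
p = stationary_distribution(Pi)

G = 1.5 ** np.arange(-5, 6)
Z = np.column_stack([G / (G + 1.0), 1.0 / (G + 1.0)])
E = np.exp(5.0 * Z)                      # decisiveness mu = 5
Z_prime = E / E.sum(axis=1, keepdims=True)
print(np.round(p @ Z_prime, 4))          # approx. [0.5314, 0.4686]
```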
   Let’s look at how an influencer can promote an alternative by making changes to
degrees of decisiveness. Let’s come back to the matrix of transition probabilities \( \Pi \) given above
and set the vector of decisiveness to

\[
\mu = (3, 2, 1, 1, 1, 1, 3, 4, 5, 6, 7)
\]

   Then we get the following matrix “state-probability of choice”:

\[
Z^{(s)} =
\begin{pmatrix}
0.0910 & 0.9090 \\
0.2075 & 0.7925 \\
0.3675 & 0.6325 \\
0.4050 & 0.5950 \\
0.4502 & 0.5498 \\
0.5 & 0.5 \\
0.6457 & 0.3543 \\
0.8232 & 0.1768 \\
0.9379 & 0.0621 \\
0.9824 & 0.0176 \\
0.9954 & 0.0046
\end{pmatrix}
\]

   and the resulting distribution of values:

\[
v \approx (0.5352,\ 0.4648)
\]

   Again, the first alternative gains an advantage and steadily wins.


   7. Conclusions and discussion
    This paper shows how the model “state-probability of choice”, which was suggested in [4], can be
generalized in order to describe both distributions of importance values among given alternatives and
changes of these distributions. Moreover, we can move on from distributions of importance to
probabilities of choices, thus arriving at a combined multi-level model.
    The inputs of this model are pairwise comparisons between alternatives and transition
probabilities across states. The model can depend on some parameters; in this paper the following two
were suggested:
    • the parameter affecting the spread across values of different alternatives, first of all between
       the best and the worst ones;
    • the parameter reflecting the decisiveness of agents.
    The central point of this paper is the exploration of a dynamic equilibrium situation, which
takes place when the alternatives are considered to be of equal value and are chosen with equal
probabilities. The experiments presented above illustrate how such a situation can be reached and
how it can be broken by changing input data and parameters.
    Of course, an agent of influence is normally not able to affect factors like transition probabilities or
agents’ decisiveness directly. They may produce some information influences, and their goals may or
may not be achieved. It appears reasonable to introduce some probabilities of success,
and thus we come to the idea of introducing a Markov decision process, which is considered one
of the best-known frameworks for reinforcement learning [6]. We are able to regard all necessary
components of this process, such as a model of transitions or the rewards that influencers can get for their
actions.
    A system of states for such a process can easily be devised as well. The simplest approach may
imply considering a matrix whose rows represent states interpreted as possible stationary
probabilities that an ordinary agent is in a specific state at the moment. Obviously, this matrix is
a rectangular stochastic matrix, and it can be made balanced.
    For exploring the rewards and the model of transitions, algorithms like Dyna [6] may be applied.
    Another interesting point arises if we involve a game-based approach by considering a game between
agents of influence. If one influencer is trying to change the situation in a direction beneficial for
them, the other one should counteract these efforts. It is possible to regard both antagonistic and non-
antagonistic formulations of the problem. Such models should take into account both the benefits from
ensuring an advantage for a specific alternative and the costs needed to gain this advantage. A supervising
body empowered to set rules for such a game may be considered within the model as well.

   8. References
[1] E.V. Ivokhin, Yu.A. Naumenko, On formalization of information dissemination processes based
    on hybrid diffusion models, Journal of Automation and Information Sciences 50(7) (2018)
    79-86. doi:10.1615/JAutomatInfScien.v50.i7.70.
[2] O.S. Tryhub, R.O. Tryhub, V.V. Gorborukov, Researching semistructured problems of
    multicriteria optimization using the software system, Scientific Notes of NaUKMA 151
    (2013) 79-88.
[3] T.L. Saaty, The Analytic Hierarchy Process, McGraw-Hill, New York, 1980.
[4] O.V. Oletsky, E.V. Ivohin, Formalizing the procedure for the formation of a dynamic
    equilibrium of alternatives in a multi-agent environment in decision-making by majority of
    votes, Cybernetics and Systems Analysis 57 (2021) 47-56. doi:10.1007/s10559-021-00328-y.
[5] I. Chernorutsky, Methods of Decision Making, BHV-Petersburg, St. Petersburg, 2005.
[6] S. Russell, P. Norvig, Artificial Intelligence: A Modern Approach, Pearson Education,
    2003.