Budget Allocation in Binary Opinion Dynamics

Susana Iglesias Rey (Nokia Bell Labs, Nokia Paris-Saclay, Route de Villejust, 91620 Nozay, France), Patricio Reyes (Technological Institute for Industrial Mathematics (ITMATI), University of Santiago de Compostela, Santiago de Compostela, Spain), Alonso Silva (Nokia Bell Labs, Nokia Paris-Saclay, Route de Villejust, 91620 Nozay, France; alonso.silva@nokia-bell-labs.com)

Copyright © by the paper's authors. Copying permitted for private and academic purposes. Proceedings of the Spring School of Networks, Pucón, Chile, October 2017.

Abstract

In this article we study the allocation of a budget to promote an opinion in a group of agents. We assume that their opinion dynamics are based on the well-known voter model. We are interested in finding the most efficient use of a budget over time in order to manipulate a social network. We address the problem using the theory of discounted Markov decision processes. Our contributions can be summarized as follows: (i) we introduce the discounted Markov decision process for our setting, (ii) we present the corresponding Bellman equations, and (iii) we solve the Bellman equations via backward programming. This work is a step towards providing a solid formulation of budget allocation in social networks.

1 Introduction

During the last decades, a lot of research effort has been devoted to modeling the underlying process of opinion formation of agents that interact through a social network. In this respect, DeGroot's [5] and Friedkin-Johnsen's [8] models are classic references. Another classic model is the voter model [4, 9], which considers that each agent holds a binary opinion, 0 or 1, and at each time step chooses one of its neighbors at random and adopts that neighbor's opinion as its own. Other works based on the voter model incorporate stubborn agents [12] and biased agents [10]. Moreover, in the last few years there has been a growing literature on the manipulation of opinions in social networks [6, 7, 11].

In this work, we are interested in finding the most efficient use over time of a budget in order to manipulate a social network. The idea is to promote an opinion by paying agents to supplant their true opinions. We model opinions as two values, 0 or 1, with 1 (respectively 0) representing a supportive (non-supportive) opinion.

We frame the problem of designing sequential payment strategies as a discounted Markov decision process (DMDP). DMDPs have been widely used to formulate decision-making problems in science and engineering (see, e.g., [1, 2, 3]). One of the main applications of DMDP models is the computation of optimal decisions (i.e., actions) over time to maximize the expected reward (analogously, minimize the expected cost).

First, we focus on a fully connected network where agents change their opinion following a voter model. We provide the corresponding Bellman equations for this problem and show through an example how to solve it in practice. We provide a structural characterization of the associated value function and the optimal payment strategy. Then, we compute the optimal payments using dynamic backward programming.

2 Model definition

In order to find the optimal budget allocation under binary opinion dynamics, we make extensive use of the theory of DMDPs. First, we adopt the voter model of opinion formation over a social network. Then, we define the DMDP and the corresponding Bellman equations to obtain the optimal budget allocation strategy.
Consider an undirected social network G = (I, E), where I stands for the set of agents, indexed from 1 to n, and E ⊆ I × I is the set of edges. Each agent i ∈ I has a binary initial opinion. Opinions take values 0 or 1. If agent i has opinion 0 (analogously, 1), we label it as a non-supporter (supporter). For example, if agents are discussing politics, 1 could mean supporting a particular party and 0 not supporting it.

We distinguish two cases, depending on whether the network is fully connected or not.

We start by studying the fully connected case. In each decision epoch, agents update their opinions following a voter model sensitive to external payments. Let β ∈ [0, 1) be the discount factor, which represents the loss of reward in the future with respect to the current reward. A discounted Markov decision process (DMDP) is a 5-tuple (S, A_S, P, R, β), where S is a finite set of states, A_S is a finite set of actions, P is the set of transition probabilities, and R is the set of rewards. As we said, we assume that the graph is fully connected; therefore, each agent can communicate with every other agent. The model is defined as follows:

1. Decision epochs: the set of decision epochs is defined as t = 1, 2, ..., T. We consider a finite discrete-time system. At each decision epoch t we observe the state s and choose an action a.

2. States: the state space S of the DMDP consists of the possible numbers of supporters, s ∈ ℕ, with s = |{i ∈ I : i is a supporter}|.

3. Actions: the action space A_s is the set of actions available in state s (without loss of generality, actions are state independent). We consider that the actions a ∈ ℕ are the possible numbers of payments, a = |{i ∈ I : i receives a payment}|, where a non-supporter agent has a cost c_NS for changing its opinion from non-supporter to supporter, and a supporter agent has a cost c_S for holding its supporter opinion. We assume that the cost of changing an opinion is higher than the cost of holding it, i.e., c_NS > c_S. We consider a finite budget b. Notice that, because the actions are constrained by the budget, they are stationary.

4. Transition probabilities: if the DMDP in decision epoch t is at state s_1, the probability that it transitions to state s_2 taking action a is denoted p_t(s_1, s_2, a). Due to the natural independence of the agents' transitions, we compute these probabilities as the product of the transition probabilities of the agents. The evolution of one agent i ∈ I is described by the voter model. Starting from any arbitrary initial labels, supporter (S) or non-supporter (NS), we consider two labeling functions f_S(i) and f_NS(i), where f_S(i) = 1 (respectively f_NS(i) = 1) means that agent i is a supporter (non-supporter). At each decision epoch t, each node adopts the opinion of one of its neighbors selected uniformly at random. For each node j ∈ I, the set of its neighbors is defined as N(j) = {k ∈ I : {j, k} ∈ E}. Therefore, for an agent i ∈ I that receives no payment in decision epoch t, we define the labeling functions

$$f_{S_{t+1}}(i) = \begin{cases} 1 & \text{with prob. } \dfrac{|\{j \in N(i) : f_{S_t}(j) = 1\}|}{|N(i)|}, \\[4pt] 0 & \text{with prob. } \dfrac{|\{j \in N(i) : f_{S_t}(j) = 0\}|}{|N(i)|}, \end{cases} \qquad f_{NS_{t+1}}(i) = 1 - f_{S_{t+1}}(i).$$

Analogously, for an agent k ∈ I that receives a payment in decision epoch t, we define

$$f_{S_{t+1}}(k) = \begin{cases} 1 & \text{with prob. } 1, \\ 0 & \text{with prob. } 0, \end{cases} \qquad f_{NS_{t+1}}(k) = 1 - f_{S_{t+1}}(k).$$

We denote the set of non-supporter agents that receive a payment as L = {i ∈ I : i is a non-supporter and receives a payment} and its cardinality as ℓ = |L|. Respectively, the set of supporter agents that receive a payment is K = {i ∈ I : i is a supporter and receives a payment}, with cardinality k = |K|. Notice that a = ℓ + k. Therefore, the transition probabilities p_t(s_1, s_2, a) can be computed as follows:

• If a > s_2, then p_t(s_1, s_2, a) = 0.
• Otherwise,

$$p_t(s_1, s_2, a) = \sum_{i=\max\{0,\, s_2-(n-s_1)-k\}}^{\min\{s_1-k,\, s_2-\ell\}} \binom{s_1-k}{i} \left(\frac{s_1}{n}\right)^{i} \left(\frac{n-s_1}{n}\right)^{s_1-k-i} \binom{n-s_1-\ell}{s_2-i-(\ell+k)} \left(\frac{n-s_1}{n}\right)^{n-s_1-s_2+i+k} \left(\frac{s_1}{n}\right)^{s_2-i-(\ell+k)}.$$

5. Reward: the instant reward at time t and state s is defined as r_t(s) = g · s, where g denotes the reward provided by each supporter agent.
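To make the summation formula concrete, the following Python sketch (ours, not part of the original paper; the function name transition_prob and its argument order are our own choices) evaluates p_t(s_1, s_2, a) for the fully connected case. It assumes, as the formula does, that the k paid supporters and ℓ paid non-supporters are supporters at t + 1 with probability one, while every unpaid agent copies a supporter with probability s_1/n.

```python
from math import comb

def transition_prob(n, s1, s2, k, l):
    """Probability that the number of supporters goes from s1 to s2 in one
    epoch of the fully connected voter model, when k supporters and l
    non-supporters are paid (paid agents are supporters at t+1 w.p. 1).

    Follows the summation formula of Section 2; an unpaid agent adopts a
    supporter opinion with probability s1/n, as in the paper's formula.
    """
    a = k + l
    if a > s2:
        return 0.0
    p_sup = s1 / n          # probability of copying a supporter
    p_non = (n - s1) / n    # probability of copying a non-supporter
    total = 0.0
    lo = max(0, s2 - (n - s1) - k)
    hi = min(s1 - k, s2 - l)
    for i in range(lo, hi + 1):
        j = s2 - i - (l + k)            # unpaid non-supporters that convert
        if j < 0 or j > n - s1 - l:
            continue
        stay = comb(s1 - k, i) * p_sup**i * p_non**(s1 - k - i)
        convert = comb(n - s1 - l, j) * p_sup**j * p_non**(n - s1 - l - j)
        total += stay * convert
    return total
```

For instance, transition_prob(7, 3, 5, k=1, l=1) is the probability of going from three to five supporters in a network of seven agents when one supporter and one non-supporter are paid.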
Let υ_β be the value function of the above DMDP, i.e., the supremum, over all possible budget allocation strategies, of the expectation of the discounted reward starting from an initial budget b. Under these assumptions, the Bellman equations for all s ∈ S and initial budget b are

$$\upsilon_\beta(s, b) = \max_{\substack{a=\ell+k \in A_S \\ \ell\, c_{NS} + k\, c_S \le b}} \mathbb{E}\sum_{t=0}^{\infty} \beta^t r_t(s) = \max_{\substack{a=\ell+k \in A_S \\ \ell\, c_{NS} + k\, c_S \le b}} \Big\{ r_0(s) + \sum_{s' \in S} \beta\, p_1(s, s', a)\, \upsilon_\beta\big(s',\, b - \ell\, c_{NS} - k\, c_S + r_0(s)\big) \Big\},$$

where the budget evolves as b(t+1) = b(t) + r_t(s) − ℓ c_NS − k c_S.
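Because the horizon T is finite, these equations can be solved by backward induction on the augmented state (s, b). The sketch below is our own illustration of that procedure, not code from the paper: it assumes integer budgets and rewards, caps the budget at a crude upper bound so the state space stays finite, and reuses the hypothetical transition_prob helper sketched above. It favors clarity over speed; in practice the transition probabilities could be precomputed once per (s, k, l).

```python
def backward_induction(n, T, beta, c_ns, c_s, budget, g):
    """Finite-horizon backward induction for the fully connected case.

    State: (s, b) = (number of supporters, remaining integer budget).
    Action: (l, k) = (paid non-supporters, paid supporters), l*c_ns + k*c_s <= b.
    Returns value[t][(s, b)] and policy[t][(s, b)] for t = 0, ..., T-1.
    """
    max_b = budget + T * g * n          # crude upper bound on reachable budgets
    value = [dict() for _ in range(T + 1)]
    policy = [dict() for _ in range(T)]
    # Terminal values: no rewards are collected after time T.
    for s in range(n + 1):
        for b in range(max_b + 1):
            value[T][(s, b)] = 0.0
    for t in range(T - 1, -1, -1):
        for s in range(n + 1):
            for b in range(max_b + 1):
                reward = g * s
                best_val, best_act = float("-inf"), (0, 0)
                for l in range(n - s + 1):          # payments to non-supporters
                    for k in range(s + 1):          # payments to supporters
                        cost = l * c_ns + k * c_s
                        if cost > b:
                            continue
                        b_next = min(max_b, b - cost + reward)
                        exp_future = sum(
                            transition_prob(n, s, s2, k, l) * value[t + 1][(s2, b_next)]
                            for s2 in range(n + 1)
                        )
                        val = reward + beta * exp_future
                        if val > best_val:
                            best_val, best_act = val, (l, k)
                value[t][(s, b)] = best_val
                policy[t][(s, b)] = best_act
    return value, policy
```

With the parameters used in Section 3 (n = 7, T = 6, β = 0.8, c_NS = 10, c_S = 5, g = 8, and an initial budget of 30), policy[0][(s, 30)] gives the payment pair (ℓ, k) chosen at time t = 0 when the initial state has s supporters.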
Next, we present the second case, where in each decision epoch agents update their opinions following a voter model on a network (not necessarily fully connected) that can be affected by external payments. As before, we formulate this problem as a DMDP where the set of actions consists of the possible external payments. Therefore, let β ∈ [0, 1) be the discount factor and consider, by a slight abuse of notation, the 5-tuple (S, A_S, P, R, β) as before. Concretely, the elements that change with respect to the previous model are:

1. Decision epochs: the set of decision epochs is defined as t = 1, 2, ..., T.

2. States: the state space S of the DMDP consists of all possible combinations of agents' labels, non-supporter (0) or supporter (1), i.e., s ∈ {0, 1}^n.

3. Actions: the action space A_s is the set of actions available in state s (without loss of generality, actions are state independent). An action specifies whether or not we give a payment to each of the agents, a ∈ {0, 1}^n, where a 0 (respectively 1) in position i means we give no payment (a payment) to agent i ∈ I. We also define a vector of costs c ∈ ℝ^n_+ whose element c_i > 0 is the cost of changing the opinion of agent i, in case agent i is a non-supporter, or the cost of holding the opinion of agent i, in case agent i is a supporter.

4. Transition probabilities: as before, if the DMDP in decision epoch t is at state s_1, the probability that it transitions to state s_2 taking action a is denoted p_t(s_1, s_2, a) and can be computed as follows:

• If |a| > |s_2|, then p_t(s_1, s_2, a) = 0.
• Otherwise, p_t(s_1, s_2, a) is equal to

$$\prod_{i=1}^{n} \left( \mathbb{1}\{s_i = s'_i\}\left[ \mathbb{1}\{s_i = \mathrm{supp}\}\, \frac{|\{j \in N(i) : f_{S_t}(j)=1\}|}{|N(i)|} + \mathbb{1}\{s_i = \mathrm{non\text{-}supp}\}\, \frac{|\{j \in N(i) : f_{NS_t}(j)=1\}|}{|N(i)|} \right] + \mathbb{1}\{s_i \ne s'_i\}\left[ \mathbb{1}\{s_i = \mathrm{supp}\}\, \frac{|\{j \in N(i) : f_{S_t}(j)=0\}|}{|N(i)|} + \mathbb{1}\{s_i = \mathrm{non\text{-}supp}\}\, \frac{|\{j \in N(i) : f_{NS_t}(j)=0\}|}{|N(i)|} \right] \right),$$

where s_i and s'_i denote the labels of agent i in states s_1 and s_2, respectively.

5. Reward: the instant reward at time t and state s is defined as r_t(s) = g^T · s, where g is the vector of rewards whose element g_i is the reward that agent i ∈ I provides.

Let υ_β be the value function of the above DMDP, i.e., the supremum, over all possible budget allocation strategies, of the expectation of the discounted reward starting from an initial budget b. Under these assumptions, the Bellman equations for all s ∈ S and initial budget b are

$$\upsilon_\beta(s, b) = \max_{\substack{a \in A_S \\ c^{T} a \le b}} \mathbb{E}\sum_{t=0}^{\infty} \beta^t r_t(s) = \max_{\substack{a \in A_S \\ c^{T} a \le b}} \Big\{ r_0(s) + \sum_{s' \in S} \beta\, p_1(s, s', a)\, \upsilon_\beta\big(s',\, b - c^{T} a + r_0(s)\big) \Big\},$$

where the budget evolves as b(t+1) = b(t) − c^T a + r_t(s_t) and c^T denotes the transpose of the vector c.
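The per-agent product structure also translates directly into code for an arbitrary network. The sketch below is our own illustration, not code from the paper: the adjacency structure is a plain dict of neighbor lists, the names transition_prob_general and agent_supporter_prob are ours, and we assume, mirroring the fully connected case, that a paid agent holds the supporter label at t + 1 with probability one while an unpaid agent copies a neighbor chosen uniformly at random.

```python
def agent_supporter_prob(neighbors, labels, i):
    """Probability that unpaid agent i is a supporter at t+1: the fraction
    of its neighbors currently labeled supporter (one voter-model step)."""
    nbrs = neighbors[i]
    return sum(labels[j] for j in nbrs) / len(nbrs)

def transition_prob_general(neighbors, s1, s2, a):
    """Product-form transition probability between label vectors s1 and s2
    (tuples of 0/1) under the payment vector a (tuple of 0/1).

    Assumption (ours, consistent with the fully connected case): a paid
    agent is a supporter at t+1 with probability one.
    """
    prob = 1.0
    for i in range(len(s1)):
        if a[i] == 1:                      # paid agent: deterministic supporter
            p_supporter = 1.0
        else:                              # unpaid agent: voter-model update
            p_supporter = agent_supporter_prob(neighbors, s1, i)
        prob *= p_supporter if s2[i] == 1 else (1.0 - p_supporter)
        if prob == 0.0:
            break
    return prob
```

For a triangle network neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}, transition_prob_general(neighbors, (1, 1, 0), (1, 1, 1), (0, 0, 1)) evaluates to 0.25: each unpaid supporter keeps its label with probability 1/2 and the paid agent converts with probability one.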
3 Simulation results

We suppose that after a time T we no longer obtain rewards from the supporter agents, so we are interested in the distribution of the DMDP over the time interval [0, T]. We consider an undirected, fully connected social network G = (I, E) with n = 7 agents that form opinions according to the voter model. At time t = 0, we assign an initial label to each agent at random. Solving the Bellman (fixed point) equations, given the model and the set of states and feasible actions, gives us the best strategy to follow in each state during the time interval. We take T = 6, β = 0.8, c_NS = 10, c_S = 5, B = 30, and g = 8.

Some conclusions can be drawn from the simulations. The optimal payment strategy is to invest all our budget paying the largest possible number of agents at time t = 0. Obviously, if the whole network is non-supporter (respectively supporter), the budget will be allocated to change opinions (to hold opinions). However, the distribution of the budget differs for the other possible initial states, as we show in Table 1.

Initial state    Payments to NS    Payments to S
s = 1            2                 1
s = 2            2                 2
s = 3            1                 3
s = 4            1                 4
s = 5            0                 5
s = 6            0                 6

Table 1: Budget allocation over the states.

Given the budget allocation at time t = 0, we show in Figure 1 the expected reward obtained at time T for each state s ∈ S.

Figure 1: Expected reward at time T.

4 Conclusions

In this work, we have introduced the problem of budget allocation over time to manipulate a social network. We have developed a formulation of the discounted Markov decision process as well as the corresponding Bellman equations for the voter model. Using backward programming, we have obtained the optimal payment strategy for a small example of agents interacting through a fully connected network. Many questions still remain to be answered. Future work will be devoted to improving the performance of our simulations in order to obtain the optimal strategy for larger networks and different topologies. Moreover, we intend to construct the DMDP model and the Bellman equations for different models of opinion dynamics. This will lead to the mathematical characterization of the optimal policy for different network structures and opinion formation models. It will also lead to the characterization of the most important agents (the agents with the highest benefit-cost ratio), which should be related to their centrality, as shown in previous results [11, 6].

References

[1] E. Altman. Applications of Markov Decision Processes in Communication Networks, pages 489–536. Springer US, Boston, MA, 2002.

[2] N. Archak, V. Mirrokni, and S. Muthukrishnan. Budget Optimization for Online Campaigns with Positive Carryover Effects, pages 86–99. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.

[3] C. Boutilier and T. Lu. Budget allocation using weakly coupled, constrained Markov decision processes. In Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, UAI'16, pages 52–61, Arlington, Virginia, United States, 2016. AUAI Press.

[4] P. Clifford and A. Sudbury. A model for spatial conflict. Biometrika, 60(3):581–588, 1973.

[5] M. H. DeGroot. Reaching a consensus. Journal of the American Statistical Association, 69(345):118–121, 1974.

[6] S. Dhamal, W. Ben-Ameur, T. Chahed, and E. Altman. Good versus evil: A framework for optimal investment strategies for competing camps in a social network. ArXiv e-prints, June 2017.

[7] M. Förster, A. Mauleon, and V. Vannetelbosch. Trust and manipulation in social networks, Sept. 2013. Documents de travail du Centre d'Economie de la Sorbonne 2013.65 - ISSN: 1955-611X.

[8] N. E. Friedkin and E. C. Johnsen. Social influence networks and opinion change. Advances in Group Processes, 16:1–29, 1999.

[9] R. A. Holley and T. M. Liggett. Ergodic theorems for weakly interacting infinite systems and the voter model. The Annals of Probability, 3(4):643–663, 1975.

[10] A. Mukhopadhyay, R. R. Mazumdar, and R. Roy. Binary opinion dynamics with biased agents and agents with different degrees of stubbornness. In 2016 28th International Teletraffic Congress (ITC 28), volume 1, pages 261–269, Sept. 2016.

[11] A. Silva. Opinion manipulation in social networks. In S. Lasaulce, T. Jimenez, and E. Solan, editors, Network Games, Control, and Optimization: Proceedings of NETGCOOP 2016, Avignon, France, pages 187–198. Springer International Publishing, 2017.

[12] E. Yildiz, A. Ozdaglar, D. Acemoglu, A. Saberi, and A. Scaglione. Binary opinion dynamics with stubborn agents. ACM Trans. Econ. Comput., 1(4):19:1–19:30, Dec. 2013.