Budget Allocation in Binary Opinion Dynamics

Susana Iglesias Rey (Nokia Bell Labs, Nokia Paris-Saclay, Route de Villejust, 91620 Nozay, France), Patricio Reyes (Technological Institute for Industrial Mathematics (ITMATI), University of Santiago de Compostela, Santiago de Compostela, Spain), Alonso Silva (Nokia Bell Labs, Nokia Paris-Saclay, Route de Villejust, 91620 Nozay, France; alonso.silva@nokia-bell-labs.com)

Copyright © by the paper's authors. Copying permitted for private and academic purposes. Proceedings of the Spring School of Networks, Pucón, Chile, October 2017.

Abstract

In this article we study the allocation of a budget to promote an opinion in a group of agents. We assume that their opinion dynamics are based on the well-known voter model. We are interested in finding the most efficient use of a budget over time in order to manipulate a social network. We address the problem using the theory of discounted Markov decision processes. Our contributions can be summarized as follows: (i) we introduce the discounted Markov decision process for our setting, (ii) we present the corresponding Bellman equations, and (iii) we solve the Bellman equations via backward programming. This work is a step towards providing a solid formulation of budget allocation in social networks.

1 Introduction

During the last decades, a lot of research effort has been devoted to modeling the underlying process of opinion formation of agents that interact through a social network. In this respect, DeGroot's [5] and Friedkin-Johnsen's [8] models are classic references. Another classic model is the voter model [4, 9], which considers that each agent holds a binary opinion, 0 or 1, and at each time step chooses one of its neighbors at random and adopts that neighbor's opinion as its own. Other works based on the voter model incorporate stubborn agents [12] and biased agents [10]. Moreover, in the last few years there has been a growing literature on the manipulation of opinions in social networks [6, 7, 11].

In this work, we are interested in finding the most efficient use over time of a budget in order to manipulate a social network. The idea is to promote an opinion by paying agents to supplant their true opinions. We model opinions as two values, 0 or 1, with 1 (respectively 0) representing a supportive (non-supportive) opinion.

We frame the problem of designing sequential payment strategies as a discounted Markov decision process (DMDP). DMDPs have been widely used to formulate decision-making problems in science and engineering (see, e.g., [1, 2, 3]). One of the main applications of DMDP models is the computation of optimal decisions (i.e., actions) over time to maximize the expected reward (analogously, minimize the expected cost).

First, we focus on a fully connected network where agents change their opinion following a voter model. We provide the corresponding Bellman equations for this problem and show through an example how to solve it in practice. We provide a structural characterization of the associated value function and the optimal payment strategy. Then, we compute the optimal payments using dynamic backward programming.

2 Model definition

In order to find the optimal budget allocation under binary opinion dynamics, we make extensive use of the theory of DMDPs. First, we adopt the voter model of opinion formation over a social network. Then, we define the DMDP and the corresponding Bellman equations to obtain the optimal budget allocation strategy.
Consider an undirected social network G = (I, E), where I stands for the set of agents, indexed from 1 to n, and E ⊆ I × I is the set of edges. Each agent i ∈ I has a binary initial opinion. Opinions take values 0 or 1. If agent i has opinion 0 (analogously, 1), we label it as a non-supporter (supporter). For example, if agents are discussing politics, 1 could mean supporting a particular party and 0 not supporting it.

We distinguish two cases, depending on whether the network is fully connected or not.

We start by studying the fully connected case. In each decision epoch, agents update their opinions following a voter model sensitive to external payments. Let β ∈ [0, 1) be the discount factor, which represents the loss of reward in the future with respect to the current reward. A discounted Markov decision process (DMDP) is a 5-tuple (S, A_S, P, R, β), where S is a finite set of states, A_S is a finite set of actions, P is the set of transition probabilities, and R is the set of rewards. As we said, we assume that the graph is fully connected; therefore, each agent can communicate with every other agent. The model is defined as follows:

1. Decision epochs: the set of decision epochs is defined as t = 1, 2, ..., T. We consider a finite discrete-time system. At each decision epoch t we observe the state s and choose an action a.

2. States: the state space S of the DMDP consists of the possible numbers of supporters, s ∈ ℕ, with s = |{i ∈ I : i is a supporter}|.

3. Actions: the action space A_s is the set of actions available in state s (without loss of generality, actions are state independent). We consider that the actions a ∈ ℕ are the possible numbers of payments, a = |{i ∈ I : i receives a payment}|, where a non-supporter agent has a cost c_NS for changing its opinion from non-supporter to supporter, and a supporter agent has a cost c_S for holding its supporter opinion. We assume that the cost of changing an opinion is higher than the cost of holding it, i.e., c_NS > c_S. We consider a finite budget b. Notice that, because the actions are constrained by the budget, they are stationary.

4. Transition probabilities: if the DMDP in decision epoch t is at state s_1, the probability that it transitions to state s_2 taking action a is denoted p_t(s_1, s_2, a). Due to the natural independence of the agents' transitions, we compute these probabilities as the product of the transition probabilities of the agents. The evolution of one agent i ∈ I is described by the voter model. Starting from any arbitrary initial labels, supporter (S) or non-supporter (NS), we consider two labeling functions f_S(i) and f_NS(i), where f_S(i) = 1 (respectively f_NS(i) = 1) means that agent i is a supporter (non-supporter). At each decision epoch t, each node adopts the opinion of one of its neighbors selected uniformly at random. For each node j ∈ I, the set of its neighbors is defined as N(j) = {k ∈ I : {j, k} ∈ E}. Therefore, for an agent i ∈ I that receives no payment in decision epoch t, we define the labeling functions

$$f_{S_{t+1}}(i) = \begin{cases} 1 & \text{with prob. } \dfrac{|\{j \in N(i) : f_{S_t}(j) = 1\}|}{|N(i)|}, \\[4pt] 0 & \text{with prob. } \dfrac{|\{j \in N(i) : f_{S_t}(j) = 0\}|}{|N(i)|}, \end{cases} \qquad f_{NS_{t+1}}(i) = 1 - f_{S_{t+1}}(i).$$

Analogously, for an agent k ∈ I that receives a payment in decision epoch t, we define

$$f_{S_{t+1}}(k) = \begin{cases} 1 & \text{with prob. } 1, \\ 0 & \text{with prob. } 0, \end{cases} \qquad f_{NS_{t+1}}(k) = 1 - f_{S_{t+1}}(k).$$

We denote the set of non-supporter agents that receive a payment as L = {i ∈ I : i is a non-supporter and receives a payment} and its cardinality as ℓ = |L|. Respectively, the set of supporter agents that receive a payment is K = {i ∈ I : i is a supporter and receives a payment}, with cardinality k = |K|. Notice that a = ℓ + k. Therefore, the transition probabilities p_t(s_1, s_2, a) can be computed as follows:

• If a > s_2, then p_t(s_1, s_2, a) = 0.
• Otherwise,

$$p_t(s_1, s_2, a) = \sum_{i=\max\{0,\, s_2-(n-s_1)-k\}}^{\min\{s_1-k,\, s_2-\ell\}} \binom{s_1-k}{i} \left(\frac{s_1}{n}\right)^{i} \left(\frac{n-s_1}{n}\right)^{s_1-k-i} \binom{n-s_1-\ell}{s_2-i-(\ell+k)} \left(\frac{n-s_1}{n}\right)^{n-s_1-s_2+i+k} \left(\frac{s_1}{n}\right)^{s_2-i-(\ell+k)}.$$

5. Reward: the instant reward at time t and state s is defined as r_t(s) = g · s, where g denotes the reward provided by each supporter agent.
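To make the summation formula concrete, the following Python sketch (ours, not part of the original paper; the function name transition_prob and its argument order are our own choices) evaluates p_t(s_1, s_2, a) for the fully connected case. It assumes, as the formula does, that the k paid supporters and ℓ paid non-supporters are supporters at t + 1 with probability one, while every unpaid agent copies a supporter with probability s_1/n.

```python
from math import comb

def transition_prob(n, s1, s2, k, l):
    """Probability that the number of supporters goes from s1 to s2 in one
    epoch of the fully connected voter model, when k supporters and l
    non-supporters are paid (paid agents are supporters at t+1 w.p. 1).

    Follows the summation formula of Section 2; an unpaid agent adopts a
    supporter opinion with probability s1/n, as in the paper's formula.
    """
    a = k + l
    if a > s2:
        return 0.0
    p_sup = s1 / n          # probability of copying a supporter
    p_non = (n - s1) / n    # probability of copying a non-supporter
    total = 0.0
    lo = max(0, s2 - (n - s1) - k)
    hi = min(s1 - k, s2 - l)
    for i in range(lo, hi + 1):
        j = s2 - i - (l + k)            # unpaid non-supporters that convert
        if j < 0 or j > n - s1 - l:
            continue
        stay = comb(s1 - k, i) * p_sup**i * p_non**(s1 - k - i)
        convert = comb(n - s1 - l, j) * p_sup**j * p_non**(n - s1 - l - j)
        total += stay * convert
    return total
```

For instance, transition_prob(7, 3, 5, k=1, l=1) is the probability of going from three to five supporters in a network of seven agents when one supporter and one non-supporter are paid.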
Let υ_β be the value function of the above DMDP, i.e., the supremum, over all possible budget allocation strategies, of the expectation of the discounted reward starting from an initial budget b. Under these assumptions, the Bellman equations for all s ∈ S and initial budget b are

$$\upsilon_\beta(s, b) = \max_{\substack{a=\ell+k \in A_S \\ \ell\, c_{NS} + k\, c_S \le b}} \mathbb{E}\sum_{t=0}^{\infty} \beta^t r_t(s) = \max_{\substack{a=\ell+k \in A_S \\ \ell\, c_{NS} + k\, c_S \le b}} \Big\{ r_0(s) + \sum_{s' \in S} \beta\, p_1(s, s', a)\, \upsilon_\beta\big(s',\, b - \ell\, c_{NS} - k\, c_S + r_0(s)\big) \Big\},$$

where the budget evolves as b(t+1) = b(t) + r_t(s) − ℓ c_NS − k c_S.
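Because the horizon T is finite, these equations can be solved by backward induction on the augmented state (s, b). The sketch below is our own illustration of that procedure, not code from the paper: it assumes integer budgets and rewards, caps the budget at a crude upper bound so the state space stays finite, and reuses the hypothetical transition_prob helper sketched above. It favors clarity over speed; in practice the transition probabilities could be precomputed once per (s, k, l).

```python
def backward_induction(n, T, beta, c_ns, c_s, budget, g):
    """Finite-horizon backward induction for the fully connected case.

    State: (s, b) = (number of supporters, remaining integer budget).
    Action: (l, k) = (paid non-supporters, paid supporters), l*c_ns + k*c_s <= b.
    Returns value[t][(s, b)] and policy[t][(s, b)] for t = 0, ..., T-1.
    """
    max_b = budget + T * g * n          # crude upper bound on reachable budgets
    value = [dict() for _ in range(T + 1)]
    policy = [dict() for _ in range(T)]
    # Terminal values: no rewards are collected after time T.
    for s in range(n + 1):
        for b in range(max_b + 1):
            value[T][(s, b)] = 0.0
    for t in range(T - 1, -1, -1):
        for s in range(n + 1):
            for b in range(max_b + 1):
                reward = g * s
                best_val, best_act = float("-inf"), (0, 0)
                for l in range(n - s + 1):          # payments to non-supporters
                    for k in range(s + 1):          # payments to supporters
                        cost = l * c_ns + k * c_s
                        if cost > b:
                            continue
                        b_next = min(max_b, b - cost + reward)
                        exp_future = sum(
                            transition_prob(n, s, s2, k, l) * value[t + 1][(s2, b_next)]
                            for s2 in range(n + 1)
                        )
                        val = reward + beta * exp_future
                        if val > best_val:
                            best_val, best_act = val, (l, k)
                value[t][(s, b)] = best_val
                policy[t][(s, b)] = best_act
    return value, policy
```

With the parameters used in Section 3 (n = 7, T = 6, β = 0.8, c_NS = 10, c_S = 5, g = 8, and an initial budget of 30), policy[0][(s, 30)] gives the payment pair (ℓ, k) chosen at time t = 0 when the initial state has s supporters.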
Next, we present the second case, where in each decision epoch agents update their opinions following a voter model on a network (not necessarily fully connected) that can be affected by external payments. As before, we formulate this problem as a DMDP where the set of actions consists of the possible external payments. Therefore, let β ∈ [0, 1) be the discount factor and consider, by a slight abuse of notation, the 5-tuple (S, A_S, P, R, β) as before. Concretely, the elements that change with respect to the previous model are:

1. Decision epochs: the set of decision epochs is defined as t = 1, 2, ..., T.

2. States: the state space S of the DMDP consists of all possible combinations of agents' labels, non-supporter (0) or supporter (1), i.e., s ∈ {0, 1}^n.

3. Actions: the action space A_s is the set of actions available in state s (without loss of generality, actions are state independent). An action specifies whether or not we give a payment to each of the agents, a ∈ {0, 1}^n, where a 0 (respectively 1) in position i means we give no payment (a payment) to agent i ∈ I. We also define a vector of costs c ∈ ℝ^n_+ whose element c_i > 0 is the cost of changing the opinion of agent i, in case agent i is a non-supporter, or the cost of holding the opinion of agent i, in case agent i is a supporter.

4. Transition probabilities: as before, if the DMDP in decision epoch t is at state s_1, the probability that it transitions to state s_2 taking action a is denoted p_t(s_1, s_2, a) and can be computed as follows:

• If |a| > |s_2|, then p_t(s_1, s_2, a) = 0.
• Otherwise, p_t(s_1, s_2, a) is equal to

$$\prod_{i=1}^{n} \left( \mathbb{1}\{s_i = s'_i\}\left[ \mathbb{1}\{s_i = \mathrm{supp}\}\, \frac{|\{j \in N(i) : f_{S_t}(j)=1\}|}{|N(i)|} + \mathbb{1}\{s_i = \mathrm{non\text{-}supp}\}\, \frac{|\{j \in N(i) : f_{NS_t}(j)=1\}|}{|N(i)|} \right] + \mathbb{1}\{s_i \ne s'_i\}\left[ \mathbb{1}\{s_i = \mathrm{supp}\}\, \frac{|\{j \in N(i) : f_{S_t}(j)=0\}|}{|N(i)|} + \mathbb{1}\{s_i = \mathrm{non\text{-}supp}\}\, \frac{|\{j \in N(i) : f_{NS_t}(j)=0\}|}{|N(i)|} \right] \right),$$

where s_i and s'_i denote the labels of agent i in states s_1 and s_2, respectively.

5. Reward: the instant reward at time t and state s is defined as r_t(s) = g^T · s, where g is the vector of rewards whose element g_i is the reward that agent i ∈ I provides.

Let υ_β be the value function of the above DMDP, i.e., the supremum, over all possible budget allocation strategies, of the expectation of the discounted reward starting from an initial budget b. Under these assumptions, the Bellman equations for all s ∈ S and initial budget b are

$$\upsilon_\beta(s, b) = \max_{\substack{a \in A_S \\ c^{T} a \le b}} \mathbb{E}\sum_{t=0}^{\infty} \beta^t r_t(s) = \max_{\substack{a \in A_S \\ c^{T} a \le b}} \Big\{ r_0(s) + \sum_{s' \in S} \beta\, p_1(s, s', a)\, \upsilon_\beta\big(s',\, b - c^{T} a + r_0(s)\big) \Big\},$$

where the budget evolves as b(t+1) = b(t) − c^T a + r_t(s_t) and c^T denotes the transpose of the vector c.
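The per-agent product structure also translates directly into code for an arbitrary network. The sketch below is our own illustration, not code from the paper: the adjacency structure is a plain dict of neighbor lists, the names transition_prob_general and agent_supporter_prob are ours, and we assume, mirroring the fully connected case, that a paid agent holds the supporter label at t + 1 with probability one while an unpaid agent copies a neighbor chosen uniformly at random.

```python
def agent_supporter_prob(neighbors, labels, i):
    """Probability that unpaid agent i is a supporter at t+1: the fraction
    of its neighbors currently labeled supporter (one voter-model step)."""
    nbrs = neighbors[i]
    return sum(labels[j] for j in nbrs) / len(nbrs)

def transition_prob_general(neighbors, s1, s2, a):
    """Product-form transition probability between label vectors s1 and s2
    (tuples of 0/1) under the payment vector a (tuple of 0/1).

    Assumption (ours, consistent with the fully connected case): a paid
    agent is a supporter at t+1 with probability one.
    """
    prob = 1.0
    for i in range(len(s1)):
        if a[i] == 1:                      # paid agent: deterministic supporter
            p_supporter = 1.0
        else:                              # unpaid agent: voter-model update
            p_supporter = agent_supporter_prob(neighbors, s1, i)
        prob *= p_supporter if s2[i] == 1 else (1.0 - p_supporter)
        if prob == 0.0:
            break
    return prob
```

For a triangle network neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}, transition_prob_general(neighbors, (1, 1, 0), (1, 1, 1), (0, 0, 1)) evaluates to 0.25: each unpaid supporter keeps its label with probability 1/2 and the paid agent converts with probability one.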
3 Simulation results

We suppose that after a time T we no longer obtain rewards from the supporter agents, so we are interested in the distribution of the DMDP over the time interval [0, T]. We consider an undirected, fully connected social network G = (I, E) with n = 7 agents that form opinions according to the voter model. At time t = 0, we assign an initial label to each agent at random. Solving the Bellman (fixed point) equations, given the model and the set of states and feasible actions, gives us the best strategy to follow in each state during the time interval. We take T = 6, β = 0.8, c_NS = 10, c_S = 5, B = 30, and g = 8.

Some conclusions can be drawn from the simulations. The optimal payment strategy is to invest all our budget paying the largest possible number of agents at time t = 0. Obviously, if the whole network is non-supporter (respectively supporter), the budget will be allocated to change opinions (to hold opinions). However, the distribution of the budget differs for the other possible initial states, as we show in Table 1.

Initial state    Payments to NS    Payments to S
s = 1            2                 1
s = 2            2                 2
s = 3            1                 3
s = 4            1                 4
s = 5            0                 5
s = 6            0                 6

Table 1: Budget allocation over the states.

Given the budget allocation at time t = 0, we show in Figure 1 the expected reward obtained at time T for each state s ∈ S.

Figure 1: Expected reward at time T.

4 Conclusions

In this work, we have introduced the problem of budget allocation over time to manipulate a social network. We have developed a formulation of the discounted Markov decision process as well as the corresponding Bellman equations for the voter model. Using backward programming, we have obtained the optimal payment strategy for a small example of agents interacting through a fully connected network. Many questions still remain to be answered. Future work will be devoted to improving the performance of our simulations in order to obtain the optimal strategy for larger networks and different topologies. Moreover, we intend to construct the DMDP model and the Bellman equations for different models of opinion dynamics. This will lead to the mathematical characterization of the optimal policy for different network structures and opinion formation models. It will also lead to the characterization of the most important agents (the agents with the highest benefit-cost ratio), which should be related to their centrality, as shown in previous results [11, 6].

References

[1] E. Altman. Applications of Markov Decision Processes in Communication Networks, pages 489–536. Springer US, Boston, MA, 2002.

[2] N. Archak, V. Mirrokni, and S. Muthukrishnan. Budget Optimization for Online Campaigns with Positive Carryover Effects, pages 86–99. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.

[3] C. Boutilier and T. Lu. Budget allocation using weakly coupled, constrained Markov decision processes. In Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, UAI'16, pages 52–61, Arlington, Virginia, United States, 2016. AUAI Press.

[4] P. Clifford and A. Sudbury. A model for spatial conflict. Biometrika, 60(3):581–588, 1973.

[5] M. H. DeGroot. Reaching a consensus. Journal of the American Statistical Association, 69(345):118–121, 1974.

[6] S. Dhamal, W. Ben-Ameur, T. Chahed, and E. Altman. Good versus evil: A framework for optimal investment strategies for competing camps in a social network. ArXiv e-prints, June 2017.

[7] M. Förster, A. Mauleon, and V. Vannetelbosch. Trust and manipulation in social networks, Sept. 2013. Documents de travail du Centre d'Economie de la Sorbonne 2013.65 - ISSN: 1955-611X.

[8] N. E. Friedkin and E. C. Johnsen. Social influence networks and opinion change. Advances in Group Processes, 16:1–29, 1999.

[9] R. A. Holley and T. M. Liggett. Ergodic theorems for weakly interacting infinite systems and the voter model. The Annals of Probability, 3(4):643–663, 1975.

[10] A. Mukhopadhyay, R. R. Mazumdar, and R. Roy. Binary opinion dynamics with biased agents and agents with different degrees of stubbornness. In 2016 28th International Teletraffic Congress (ITC 28), volume 1, pages 261–269, Sept. 2016.

[11] A. Silva. Opinion manipulation in social networks. In S. Lasaulce, T. Jimenez, and E. Solan, editors, Network Games, Control, and Optimization: Proceedings of NETGCOOP 2016, Avignon, France, pages 187–198. Springer International Publishing, 2017.

[12] E. Yildiz, A. Ozdaglar, D. Acemoglu, A. Saberi, and A. Scaglione. Binary opinion dynamics with stubborn agents. ACM Trans. Econ. Comput., 1(4):19:1–19:30, Dec. 2013.