    Proceedings of the 1st International Workshop on Social Influence Analysis (SocInf 2015)
    July 27th, 2015 - Buenos Aires, Argentina




                      Collective Learning in Games through Social Networks

                                       Sanne Kosterman and Nina Gierasimczuk
                                      Institute for Logic, Language and Computation
                                        University of Amsterdam, The Netherlands
                                               nina.gierasimczuk@gmail.com


Copyright © 2015 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

Abstract

This paper argues that combining social network communication and games can positively influence the learning behavior of players. We propose a computational model that combines features of social network learning (communication) and game-based learning (strategy reinforcement). The focus is on cooperative games, in which a coalition of players tries to achieve a common goal. We show that enriching cooperative games with social network communication can steer learning towards the social optimum, under specific conditions on the network structure and on the existing expertise in the coalition.

1 Introduction

With the rise of the Internet and digital games, communication and education have changed rapidly. Two digital techniques introduced towards active and social learning environments are serious games and online social networks. Both seem to be auspicious methods for stimulating learning, but they have thus far been treated as two distinct approaches.

Several attempts have been made to computationally model the learning behavior of artificial agents, both in games and in social networks. Information transmission and opinion formation in social networks have already been extensively studied (see, e.g., [Bala and Goyal, 1998; DeGroot, 1974; Easly and Kleinberg, 2010; Jackson, 2008]). In those frameworks agents can acquire new knowledge and adjust their opinions by learning from the knowledge and beliefs of neighbors in a network. The theory of learning in games has been extensively studied by [Camerer and Ho, 1999], [Fudenberg and Levine, 1998], and [Laslier et al., 2001], who all discuss normative paradigms for learning towards an equilibrium in repeated games. [Bush and Mosteller, 1955] and [Erev and Roth, 1995] provide models for learning in stochastic games, making use of reinforcement learning. This line of research has proved useful not only for the study of artificial agents, but also for the understanding of human learning behavior [Erev and Roth, 2014; Niv, 2009]. Yet the above-mentioned learning paradigms mainly focus on competitive agents, and they treat games and networks separately.

In this paper, we study the question of how interaction in a social network between players of a cooperative game can influence their learning behavior. We thereby assume players to act as one grand coalition, trying to maximize the group utility, i.e., we take players to be group-rational. Before performing an action in the game, players can communicate with each other in a social network about the estimated quality of joint strategies. Although individuals can only communicate directly with their neighbors, they aim at cooperation with the entire social network society.

We start this paper with a classical graph-theoretical model for learning in social networks in Section 2. Thereafter, in Section 3 we provide a probabilistic aggregation method to model the amalgamation of individual opinions after network communication. In Section 4 we describe the process of learning in stochastic games, relying on a reinforcement method. We combine the three formal frameworks in the novel learning paradigm proposed in Section 5. In Section 6 we discuss the hidden assumptions of this model and argue that it contributes to future computer simulations as well as psychological experiments on human learning.

2 Social Network Learning

The so-called Lehrer-Wagner model for weighted preference aggregation [Lehrer and Wagner, 1981] can be utilized for modeling opinion formation in a weighted social network. Here, each agent holds an individual belief about multiple alternatives. He can update this belief through communication, taking into account the opinions of his neighbors and a degree of trust towards his neighbors' expertise.¹

¹ In fact, the present model is an extension of a classical social network model [DeGroot, 1974] for communication about single events only.

Formally, let G = (N, S, u) be the game with N = {1, ..., n} players, where S = {s(1), ..., s(k)} is the set of k joint strategies, and u = (u_1, ..., u_n) is a tuple of individual utility functions u_i : S → [0, 1] for each i ∈ N. Before the game starts, each player i holds a probability distribution over the set of joint strategies, b_i : S → [0, 1]. One could think of these probabilities as subjective degrees of belief² with respect to the optimality of some joint strategy. The higher the probability b_i(s) for a certain strategy profile s ∈ S, the more likely player i considers it that the joint strategy s is a social optimum, meaning that it entails a maximal sum of individual payoffs.

² Not to be confused with the quantitative notion of belief used in quantitative approaches to Belief Revision Theory.

All private opinions can be collected in a stochastic n × k-matrix B, which holds the subjective probability values b_ij = b_i(s(j)) as its entries. Each i-th row b_{i·} thus represents the probability distribution of agent i over the set S. We write B^1 to denote the initial belief matrix. Additionally, each agent i assigns a weight w_im ∈ [0, 1] to all other group members m ∈ N. The weights can be represented by the stochastic n × n-matrix W. The corresponding social network is thus a weighted directed graph G = (N, E_W), where N is the set of nodes (agents), E_W is the set of weighted directed edges, and w_im is the weight on the directed edge from i to m. We assume the graph to allow for loops, which represent agents' self-confidence.

Each agent can now communicate with his neighbors in the network and update his individual belief by taking a weighted arithmetic mean of the beliefs of all agents he trusts. When iterating this process, the updated belief of agent i about strategy s(j) after t rounds of communication is given by b^{t+1}_{ij} = Σ_{m∈N} w_im b^t_{mj}. On a societal level, belief updates after t rounds of network communication are the result of applying the trust matrix W to the belief matrix B^t, i.e., B^{t+1} = W · B^t (or, equivalently, applying the weight matrix t times to the initial beliefs B^1, i.e., B^{t+1} = W^t · B^1). Each round of belief updating can thus be described by means of the following algorithm.

Algorithm 1 Network Communication at round t
Input: Weight matrix W; belief matrix B^t
 1: for all i ∈ N, s(j) ∈ S: b^{t+1}_{ij} := Σ_{m∈N} w_im b^t_{mj}
 2: B^{t+1} := (b^{t+1}_{ij})_{n×k} = W · B^t
Output: Belief matrix B^{t+1}
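To make the communication step concrete, the update B^{t+1} = W · B^t can be sketched in a few lines of Python (our illustration, not part of the original model description; the matrices below are hypothetical examples, and numpy is assumed):

import numpy as np

def communicate(W, B):
    """One round of network communication (Algorithm 1): B^{t+1} = W . B^t."""
    # Both matrices must be row-stochastic: each agent's trust weights and
    # each agent's beliefs over the k joint strategies sum to 1.
    assert np.allclose(W.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
    return W @ B

# Hypothetical example: 3 agents, 2 joint strategies.
W = np.array([[0.6, 0.2, 0.2],   # agent 1 trusts himself most
              [0.3, 0.4, 0.3],
              [0.1, 0.1, 0.8]])  # agent 3 is very self-confident
B = np.array([[0.9, 0.1],        # agent 1 strongly favors s(1)
              [0.5, 0.5],
              [0.2, 0.8]])       # agent 3 favors s(2)
B_next = communicate(W, B)       # updated, still row-stochastic, beliefs

Iterating communicate t times corresponds to applying W^t to the initial belief matrix B^1.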






3 Collective Decision-Making

After network communication and belief updating, the coalition needs to decide which joint strategy to adopt in the game. We introduce a probabilistic social choice function (PSCF), which maps individual probability distributions over a set of alternatives to a societal probability distribution. Players in a cooperative game can make use of such a PSCF to aggregate all individual preferences, in order to determine which joint strategy is optimal according to society. As an illustration, consider a football team briefing before the match starts, while the players collectively decide on a team strategy.

A first definition of probabilistic social choice functions was introduced by [Gibbard, 1977], in which a preference profile, i.e., a tuple of individual preference orders, is mapped to a lottery, i.e., a single probability distribution over the set of alternatives. Gibbard referred to such functions as social decision schemes (SDSs). We will introduce a variant of this notion that takes as input the stochastic n × k-matrix B, as introduced in the previous section. We write B(n, k) for the set of all such stochastic n × k-matrices. The output of a probabilistic social choice function is given by a k-ary row vector ~b that represents a single societal probability distribution over the set of k alternatives. We write B(k) for the set of such stochastic k-ary row vectors. Formally, a probabilistic social choice function is thus a function F : B(n, k) → B(k).

It is worth noting that the PSCF provides a way of dealing with the Condorcet paradox (named after Marie Jean Antoine Nicolas de Caritat, le Marquis de Condorcet, 1743–1794), since it will always select a winning candidate on the basis of probabilities. Even if society likes all alternatives equally well (represented by equal probabilities for all candidates), a winning alternative will be chosen at random.

We will discuss two probabilistic social choice functions that differ with respect to the weights that individuals receive. Intuitively, the higher the weight, the more influence an agent can exert on the construction of the societal probability distribution. In a weighted PSCF, different individuals can receive different weights; in an averaged PSCF all individuals receive equal weights.

3.1 Weighted Preference Aggregation

In [Lehrer and Wagner, 1981] it has been shown that a special kind of social welfare function, the so-called "Allocation Amalgamation Method" (AAM), can be used as a weighted preference aggregation method to provide a societal ordering over a set of k alternatives. The main advantage of such a weighted method for preference aggregation is that it can take into account the expertise of specific group members. We will use this method for constructing a weighted probabilistic social choice function (wPSCF), which outputs a societal probability distribution rather than a societal ordering over the set of alternatives.

The determination of the weights assigned to individuals relies on the assumption that all agents agree on how much weight each group member should receive. In terms of the weight matrix W, this agreement on weights corresponds to every row of the matrix being the same. It therefore suffices to represent the weights in a stochastic row vector ~w = (w_1, ..., w_n), in which w_i ∈ [0, 1] represents the weight that agent i receives from society. A weighted probabilistic social choice function (wPSCF) is then a PSCF F : B(n, k) → B(k) given by F(B) = ~w · B = ~b = (b_1, ..., b_k), so that each b_j = Σ_{i∈N} w_i b_ij. In words, a wPSCF is thus a mapping from the individual probability values to a weighted arithmetic mean of these values, for each alternative s(j) ∈ S.

One can easily check that a wPSCF satisfies several axiomatic properties from social choice theory, among which independence of alternatives (IA), unanimity (U), neutrality (N), and anonymity (A). In fact, one can check that wPSCFs even satisfy a stronger notion of unanimity called Pareto optimality (P), which guarantees that strict unanimous agreement among all individuals about the order of alternatives is reflected in the societal preference. wPSCFs also satisfy social rationality (SR), which is a highly desired property of cooperative games with group-rational players and a common goal [Arrow, 1951].
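As a small illustration (ours, with hypothetical numbers; numpy is again assumed), a wPSCF amounts to a single vector-matrix product:

import numpy as np

def wpscf(w, B):
    """Weighted PSCF: F(B) = w . B, a societal distribution over the k
    alternatives, where w is a stochastic weight vector of length n."""
    assert np.isclose(w.sum(), 1.0) and (w >= 0).all()
    return w @ B

# Hypothetical example: society gives most weight to agent 3.
w = np.array([0.2, 0.2, 0.6])
B = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.2, 0.8]])
print(wpscf(w, B))               # [0.4 0.6]: society leans towards s(2)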






3.2 Averaged Preference Aggregation

Before [Lehrer and Wagner, 1981] introduced their weighted method for allocation amalgamation, [Intriligator, 1973] and [Nitzan, 1975] proposed a different probabilistic aggregation procedure which they call "the average rule". The average rule can also be seen as a social welfare function that, on the input of the individual probability distributions, outputs a societal ordering over the set of alternatives. We will use this method for constructing an averaged probabilistic social choice function (aPSCF), which outputs a societal probability distribution rather than an ordering. Formally, an aPSCF is a PSCF F : B(n, k) → B(k) given by F(B) = (b_1, ..., b_k) = ~b, so that each b_j = (1/n) Σ_{i∈N} b_ij. The process of belief aggregation and collective decision making at some round t can thus be given by the following algorithm.

Algorithm 2 Belief Aggregation at round t
Input: Belief matrix B^t
 1: for all s(j) ∈ S: b^t_j := (1/n) Σ_{i∈N} b^t_{ij}
 2: ~b^t := (b^t_1, ..., b^t_k)
Output: Societal belief vector ~b^t

Since an aPSCF does not rely on weights, it can be used as a preference aggregation method as long as an agreement on weights has not been reached yet. Note that an aPSCF can actually be thought of as a special case of a wPSCF where the weight vector is given by ~w = (1/n, ..., 1/n). Therefore, an aPSCF satisfies all properties that the more general wPSCFs satisfy. In fact, the averaged method also satisfies non-dictatorship (ND: no single individual can always determine the social probabilities) and consistency (C: equal societal preferences of two disjoint groups are maintained when the two are merged).

Finally, let us consider strategy-proofness (SP), which indicates whether individuals can manipulate the outcome of the aggregation procedure by submitting an untruthful individual preference. It is easy to verify that neither an aPSCF nor a wPSCF satisfies SP. As we assume all players in the game to be group-rational, no player will manipulate the game with the purpose of increasing his private payoff only, so we do not worry about dishonest players trying to sabotage the cooperation. However, since the utility functions of the players are assumed to be unknown, the players are not certain about the social optimum either. Hence, a manipulation by a very 'uninformed' player can be harmful for the entire coalition.³ Figure 1 summarizes the axiomatic properties of both aggregation methods.

             IA   U   N   A   P   SR   ND   C   SP
   wPSCF     ✓    ✓   ✓   ✓   ✓   ✓    -    -   -
   aPSCF     ✓    ✓   ✓   ✓   ✓   ✓    ✓    ✓   -

         Figure 1: Overview of axiomatic properties

³ In Section 5, we will therefore introduce some conditions on the amount of influence of different individuals in the coalition, ensuring that such players will not have enough power for manipulation.

4 Gameplay and Reinforcement Learning

Recall that the first belief update was performed after network communication, before the game starts. A second belief update is performed after playing the game. Whereas the first update involves individual learning by way of communication, the second update involves collective learning by way of reinforcement. In a classical method for reinforcement learning [Bush and Mosteller, 1955], the probability for a certain strategy is updated with a weighted average of the previous probability and the maximum attainable probability 1. More specifically, suppose player i chooses the individual strategy s^t_i at round t and receives a payoff of u_i(s^t), where s^t denotes the joint strategy played at round t. Then the probability for playing s_i again is increased by adding some fraction of the distance between the original probability for s_i and the maximum probability 1. This fraction is given by the product of the payoff and some learning parameter λ. The payoffs as well as the constant fraction λ are scaled to lie in the interval from 0 to 1. The size of λ correlates with the speed of learning [Skyrms, 2010]. The probabilities for all strategies that were not played in the previous round are decreased proportionally.

Formally, let m^t_i : S_i → [0, 1] be the mixed strategy of player i at round t. Then, after playing s^t_i at round t, player i can revise his mixed strategy for the next round t + 1 as follows:

   m^{t+1}_i(s_i) = m^t_i(s_i) + λ · u_i(s^t) · (1 − m^t_i(s_i))   if s_i = s^t_i
   m^{t+1}_i(s_i) = m^t_i(s_i) − λ · u_i(s^t) · m^t_i(s_i)         if s_i ≠ s^t_i
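In Python, one Bush-Mosteller step for a single player can be sketched as follows (our illustration; the strategy names, payoff, and λ are hypothetical, and the payoff is assumed to be scaled to [0, 1]):

def bush_mosteller(m, played, payoff, lam):
    """Revise a mixed strategy m (a dict from strategies to probabilities)
    after 'played' earned 'payoff'; lam is the learning parameter."""
    frac = lam * payoff   # fraction of probability mass to shift
    return {s: p + frac * (1 - p) if s == played else p - frac * p
            for s, p in m.items()}

# Hypothetical example: strategy 'a' was played and earned payoff 0.8.
m = bush_mosteller({'a': 0.5, 'b': 0.3, 'c': 0.2}, 'a', 0.8, lam=0.5)
# {'a': 0.7, 'b': 0.18, 'c': 0.12} -- the probabilities still sum to 1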






Note that this reinforcement method is aimed at modeling how players can individually learn to improve their private strategies in repeated games. Let us extend this learning behavior to the group level, by allowing the coalition to reinforce the joint strategies that yield a positive social welfare, i.e., a positive sum of individual payoffs. In that way, players are collectively learning towards the social optimum.

More specifically, after belief aggregation by the aPSCF, the coalition holds a societal probability distribution over the set of joint strategies, which (for a round t) is represented by the stochastic vector ~b^t. A joint strategy s(j) ∈ S is now chosen to be played with probability b^t_j. Subsequently, players get to know the corresponding social welfare and calculate the average fraction U(s^t) := (1/n) Σ_{i∈N} u_i(s^t). Instead of the individual payoffs, this fraction is used by each player for reinforcement, towards the joint strategy with a maximal social welfare. Although players perform this second belief update individually, as they are all reinforcing the same joint strategy with the same reinforcement factor, they are collectively learning towards the social optimum. Note that each player is actually learning which individual role to adopt in the team, i.e., which of his actions are consistent with the social optimum. This process of gameplay and reinforcement at a certain round t can be described by the following algorithm.

Algorithm 3 Gameplay and Reinforcement at round t
Input: Societal belief vector ~b^t; belief matrix B^t
 1: s^t := s(j), s.t. s(j) is drawn from S with probability b^t_j
 2: U(s^t) := (1/n) Σ_{i∈N} u_i(s^t)
 3: for all i ∈ N, s(j) ∈ S:
      b^{t+1}_{ij} := b^t_{ij} + λ · U(s^t) · (1 − b^t_{ij})   if s(j) = s^t
      b^{t+1}_{ij} := b^t_{ij} − λ · U(s^t) · b^t_{ij}         if s(j) ≠ s^t
 4: B^{t+1} := (b^{t+1}_{ij})_{n×k}
Output: Probability matrix B^{t+1}

There are two main reasons for using the Bush-Mosteller reinforcement method rather than, for instance, the one based on Pólya urns [Erev and Roth, 1995]. Firstly, Bush-Mosteller reinforcement makes use of utility values that are scaled in the interval from 0 to 1. This guarantees that the utilities are on the same scale for all players, thus avoiding unequal influences of different players. Moreover, it ensures that the same unit is used for payoffs as for individual beliefs about the strategy profiles. Thus, when reinforcing after gameplay, the utility values are appropriately used as a type of weight in order to update the beliefs.

Secondly, the Bush-Mosteller model does not take into account the accumulated rewards of earlier plays, so that the proportion of reinforcement does not get smaller over time. For modeling processes of collective learning rather than individual learning, it may make more sense to violate this principle (the so-called Law of Practice). Namely, the learning process depends on the different communication patterns in every round, so that it does not slow down.

5 The Game-Network Learning Model

In this section we combine the computational methods discussed in the previous three sections. The model, which we will call the Game-Network Learning Model, describes an iterative process that follows the procedures of network communication, belief aggregation, and gameplay and reinforcement in each round (Figure 2).

[Figure 2: Main Loop of Game-Network Learning Model]

Formally, the Game-Network Learning Model thus consists of a run of Algorithms 1, 2, and 3. We will write B^{t+} instead of B^{t+1} to denote the updated beliefs after network communication in round t. We emphasize that these are not yet the final updated beliefs at the end of round t, but merely the intermediate beliefs after communication. During each round t, the beliefs are thus updated twice, following the scheme:

   B^t → B^{t+} → B^{t+1}

Algorithm 1 thus outputs the matrix B^{t+} after network communication in round t, which is subsequently used as input for Algorithms 2 and 3.






5.1 Network Structure and Learning Outcome

Can adding network communication to a cooperative game be beneficial for the learning outcome? And if so, under what circumstances? Let us start by showing that network communication only influences the learning behavior of players as long as an agreement of beliefs has not been reached yet. Formally, we say a group N' ⊆ N is in agreement at round t if for all i_1, i_2 ∈ N' it holds that b^t_{i_1·} = b^t_{i_2·}, i.e., if the agents in N' hold the same probability distributions over S. Also, a group of nodes in a weighted directed graph is said to be closed if there are no outgoing edges from nodes inside the group to nodes outside the group. One could imagine that once everyone in a closed group in the social network has the same beliefs about the social optimum of the game, the agents of that group no longer need to convince each other of their different opinions.

Proposition 1. Let N' ⊆ N be a closed group of agents in the network. Once N' is in agreement at the beginning of some round t in the Game-Network Learning Model, network communication in that round leaves the beliefs of all agents in N' unchanged, i.e., b^{t+}_{i·} = b^t_{i·} for all i ∈ N'.

It follows immediately that for all agents i_1, i_2 ∈ N' we find b^{t+}_{i_1·} = b^t_{i_1·} = b^t_{i_2·} = b^{t+}_{i_2·}, so that they will still be in agreement after network communication. Since the entire social network is always closed, once all agents agree, communication is no longer needed for an individual to learn towards the social optimum.

In order to show a positive influence of network communication on the learning process, we measure the quality of our Game-Network Learning Model by the probability of playing the social optimum at a given round (before agreement). More specifically, if s(j*) ∈ S is the social optimum, we say that learning with network communication in round t is better than learning without network communication if b^{t+}_{j*} > b^t_{j*}, where b^{t+}_{j*} = (1/n) Σ_{i∈N} b^{t+}_{ij*} denotes the probability that the social optimum will be played in round t after network communication, and b^t_{j*} = (1/n) Σ_{i∈N} b^t_{ij*} denotes the probability that the social optimum will be played at round t without (or: before) network communication.

Intuitively, one could imagine that if there exist experts in the network, who are very close to knowing what the social optimum is, and these experts receive a sufficient amount of trust from all other players in the network, they can convince other players to increase their probability values for the social optimum. Formally, let s(j*) be the social optimum and let b^t_j be the probability that society assigns to some s(j) at the beginning of round t. We say an agent i_e ∈ N in the network is an expert for round t if b^t_{i_e j*} > b^t_{j*}. We write E^t for the set of experts for round t. We call the agents i ∈ N\E^t non-experts. Note that it follows directly that for all experts i_e ∈ E^t and all non-experts i ∈ N\E^t it holds that b^t_{i_e j*} > b^t_{ij*}.

Intuitively, the experts for a certain round are the agents that have, at the beginning of that round (and thus at the end of the previous round), a higher than average degree of belief in the social optimum. Note that experts can only exist as long as a total agreement has not been reached. Namely, if an agreement of beliefs is reached between all agents in the network, every agent has the same degree of belief, which is then trivially equal to the average degree of belief. The notion of an expert is therefore a relative rather than an objective one: an agent is only an expert when he has sufficient expertise relative to the expertise of others in the society.

Among the set of experts, there always exists a subset of experts who have the highest degrees of belief in the social optimum, compared to all other agents in society. These experts can be considered maximal experts. Formally, if s(j*) is the social optimum, then we define the set of maximal experts for round t as E^t_max = {i_m ∈ E^t | b^t_{i_m j*} = max_{i∈N} b^t_{ij*}} ⊆ E^t. Note that it follows directly from this definition that for all maximal experts i_m ∈ E^t_max and all other agents i ∈ N\E^t_max it holds that b^t_{i_m j*} > b^t_{ij*}.

Whether or not experts can exert more influence on the outcome of network communication than others depends on their position in the network and the weight that other individuals assign to the opinions of the experts. To analyze this issue, we look at an agent's centrality [Jackson, 2008]. We introduce the notion of weight centrality to express the centrality of agents in weighted directed graphs. Formally, let w_i = Σ_{m∈N} w_mi be the total weight that agent i receives from his neighbors. The weight centrality of some agent i ∈ N is given by the fraction C^w_i = w_i/n. In words, weight centrality is a fraction of the highest possible weight that an agent i can receive, which is n (namely, in case all agents would assign i a weight of 1, including agent i himself). The higher the weight centrality of the experts, the more influence the experts have on the outcome, and hence the higher the probability will be of playing the social optimum after network communication. The following theorem provides a sufficient condition for network communication to be better than no communication.

Theorem 1. Let s(j*) ∈ S be the social optimum. If C^w_{i_m} > 1/n ≥ C^w_i for all i_m ∈ E^t_max and i ∈ N\E^t_max, then the probability of playing the social optimum at round t after network communication is higher than before network communication, i.e., b^{t+}_{j*} > b^t_{j*}.

The theorem shows that if the maximal experts for round t are trusted more than others, network communication can be beneficial for finding the social optimum. The proof relies on a decomposition of the individual weights into w_{i_m} = 1 + α for all maximal experts and into w_i = 1 − β for all other agents (with α, β non-negative fractions). One can then show that the increase of the societal probability for the social optimum, due to communication with trusted experts, is greater than the respective decrease due to communication with less trusted non-experts. This guarantees that the probability of playing the social optimum at some round t after network communication is higher than before network communication.

If communication can be beneficial for a single round, we should ask whether it can also be favorable in every round. We therefore introduce the notion of stable experts: the agents i ∈ N for which it holds that i ∈ E^t for every round t ≥ 1. The following theorem states a sufficient condition for an initial expert (i.e., an expert for round 1) to be a stable expert. In fact, the theorem states an even stronger result that ensures that initial experts are always in the set of maximal experts.

Theorem 2. Let s(j*) ∈ S be the social optimum and let E^1 be the set of initial experts. If E^1 is closed and E^1 is in agreement at round 1, then E^1 ⊆ E^t_max for all t ≥ 1, as long as a total agreement in the network is not reached.

Hence, if initial experts only assign positive weights to themselves or to other initial experts with the same degrees of belief, then they will always be in the set of maximal experts for each round t ≥ 1. The proof relies on Proposition 1 and the observation that the agents with maximal degrees of belief in the social optimum after network communication are also the agents with maximal degrees of belief in the social optimum after gameplay and reinforcement learning.

In fact, the group of maximal experts might become bigger than the group of initial experts, but 'new' maximal experts can only have a degree of belief in the social optimum that is at most as high as the respective degree of belief of the initial experts. Under certain conditions on E^1, it even holds that the group of initial experts is exactly the group of maximal experts for each round t ≥ 1, i.e., E^1 = E^t_max.

Proposition 2. Let s(j*) ∈ S be the social optimum and let E^1 be the set of initial experts for round t = 1. If E^1 is maximally closed and E^1 is in agreement at round t = 1, then for all t ≥ 1 it holds that E^1 = E^t_max, as long as a total agreement in the network is not reached.

Here we call a group M ⊆ N maximally closed if all agents outside of M are connected to at least one other agent outside of M. Intuitively, if the group of initial experts is maximally closed, then for all agents i ∈ N\E^1 there must exist another agent j ∈ N\E^1 such that w_ij > 0. This guarantees that agents outside of E^1 will always have a smaller degree of belief than agents inside E^1 at every round t ≥ 1, since the updated beliefs are weighted arithmetic means of the beliefs of neighbors.

Corollary 1. Let s(j*) ∈ S be the social optimum and let E^1 be the set of initial experts. If (i) E^1 is maximally closed; (ii) E^1 is in agreement at round t = 1; and (iii) C^w_{i_e} > 1/n ≥ C^w_i for each i_e ∈ E^1 and i ∈ N\E^1, then b^{t+}_{j*} > b^t_{j*} at each round t ≥ 1, as long as a total agreement in the network is not reached.

In words, under the stated conditions for initial experts, the probability of playing the social optimum is in each round higher after network communication than before (or without) network communication. This corollary thus provides a sufficient condition for learning with network communication to be better in the long run than learning without network communication. The proof follows immediately from Theorem 1 and Proposition 2. Namely, from assumptions (i) and (ii) it follows that the initial group of experts is always the group of maximal experts, i.e., E^1 = E^t_max ⊆ E^t for all rounds t ≥ 1. Now if this group of maximal experts satisfies the stated condition for weight centrality (iii), it follows from Theorem 1 that the probability of playing the social optimum after network communication is higher than without communication in every round.
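Computationally, the centrality condition of Theorem 1 (and condition (iii) of Corollary 1) is straightforward to test. The following sketch is our illustration (numpy assumed); it presupposes that a total agreement has not yet been reached, so that the agents with maximal belief in s(j*) are exactly the maximal experts:

import numpy as np

def theorem1_condition(W, B, j_star):
    """Check C^w_{i_m} > 1/n >= C^w_i for all maximal experts i_m and all
    other agents i, where C^w_i = (sum_m w_mi) / n."""
    n = len(W)
    beliefs = B[:, j_star]                  # individual beliefs in s(j*)
    maximal = beliefs == beliefs.max()      # the maximal experts E^t_max
    C = W.sum(axis=0) / n                   # weight centralities C^w_i
    return bool((C[maximal] > 1 / n).all() and (C[~maximal] <= 1 / n).all())

If the test succeeds at round t, Theorem 1 guarantees b^{t+}_{j*} > b^t_{j*} for that round.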






6 Conclusions and Outlook

In this paper we have studied the possibility of formally modeling the process of learning in game scenarios, among agents arranged in a social network. We proposed an iterative model of learning that follows the procedures of network communication, belief aggregation, and gameplay and reinforcement learning. We conclude that interaction in specific social networks can positively influence the learning behavior of players in a cooperative game. Learning with network communication can be better than learning without communication when there exist players in the game who know better than average which joint strategy corresponds to the social optimum. If these players are sufficiently trusted by society, and players with little knowledge about the social optimum are considered less authoritative, then the knowledgeable players can convince other players to move towards the social optimum.

We envision possible extensions of the model in the domain of competitive games. For example, one could make use of cooperative game theory to describe transferable utility (TU) games, in which several coalitions can play against each other. Our model could be extended to a setting in which different social networks, representing different coalitions, play a competitive game. Additionally, in our model we assume players not to know their payoff functions, and not to be aware of the entire network structure. By utilizing epistemic models, one could elaborate on these different notions of (restricted) knowledge by means of Dynamic Epistemic Logic. A reasonable amount of work on logics for social networks has already been carried out by (among others) [Christoff and Hansen, 2014; Liu et al., 2014; Seligman et al., 2013], and [Zollman, 2012], although not in the context of cooperative games.

Each component of our Game-Network Learning Model is inspired by existing computational approaches. Varying the choices made for modeling each of them would result in new learning paradigms. For example, instead of relying on Lehrer's and Wagner's model, one could use Bayesian learning, see [Bala and Goyal, 1998]. Moreover, instead of assuming static networks, one could explore a more dynamic setting in which agents arrive over time and are allowed to change their weights of trust. As for the preference aggregation procedure, one could adopt one of the strategy-proof probabilistic social choice functions proposed by [Barberà et al., 1998]. For the procedure of gameplay and reinforcement learning, one could possibly allow for more than one social optimum, and rely on the selfishness level [Apt and Schäfer, 2014] as the reinforcement fraction.

Finally, on the empirical side, we envision computer simulations in order to check the convergence properties of the proposed learning model. Also, testing our model by means of psychological experiments with human participants would help shed some light on the use of the proposed learning paradigm in modeling real human learning in serious games.

Acknowledgments

Nina Gierasimczuk's work on this article was supported by an Innovational Research Incentives Scheme Veni grant 275-20-043, Netherlands Organisation for Scientific Research (NWO).

References

[Apt and Schäfer, 2014] K.R. Apt and G. Schäfer. Selfishness Level of Strategic Games. Journal of AI Research, 49:207–240, 2014.

[Arrow, 1951] K. Arrow. Social Choice and Individual Values. John Wiley and Sons, New York, 1951.

[Bala and Goyal, 1998] V. Bala and S. Goyal. Learning from neighbors. Review of Economic Studies, 5:205–228, 1998.

[Barberà et al., 1998] S. Barberà, A. Bogomolnaia, and H. van der Stel. Strategy-proof probabilistic rules for expected utility maximizers. Mathematical Social Sciences, 35:89–103, 1998.

[Bush and Mosteller, 1955] R. Bush and F. Mosteller. Stochastic Models of Learning. John Wiley and Sons, New York, 1955.

[Camerer and Ho, 1999] C. Camerer and T. Ho. Experience-weighted attraction learning in normal form games. Econometrica, 67(4):827–874, 1999.

[Christoff and Hansen, 2014] Z. Christoff and J.U. Hansen. Dynamic Social Networks Logic. Journal of Applied Logic (to appear), available as ILLC Prepublication Series Report PP-2014-09, 2014.

[DeGroot, 1974] M.H. DeGroot. Reaching a Consensus. Journal of the American Statistical Association, 69(345):118–121, 1974.

[Easly and Kleinberg, 2010] D. Easly and J. Kleinberg. Networks, Crowds, and Markets. Cambridge University Press, New York, 2010.

[Erev and Roth, 1995] I. Erev and A.E. Roth. Learning in Extensive-Form Games: Experimental Data and Simple Dynamic Models in the Intermediate Term. Games and Economic Behavior, 8:164–212, 1995.

[Erev and Roth, 2014] I. Erev and A.E. Roth. Maximization, learning, and economic behavior. Proceedings of the National Academy of Sciences, 111, 2014.

[Fudenberg and Levine, 1998] D. Fudenberg and D.K. Levine. The Theory of Learning in Games. The MIT Press, Cambridge, 1998.

[Gibbard, 1977] A. Gibbard. Manipulation of schemes that mix voting with chance. Econometrica, 45(3):665–681, 1977.

[Intriligator, 1973] M.D. Intriligator. A Probabilistic Model of Social Choice. The Review of Economic Studies, 40(4):553–560, 1973.






[Jackson, 2008] M.O. Jackson. Social and Economic Networks. Princeton University Press, 2008.

[Laslier et al., 2001] J. Laslier, R. Topol, and B. Walliser. A Behavioral Learning Process in Games. Games and Economic Behavior, 37:340–366, 2001.

[Lehrer and Wagner, 1981] K. Lehrer and C. Wagner. Rational Consensus in Science and Society. D. Reidel Publishing Company, Dordrecht, 1981.

[Liu et al., 2014] F. Liu, J. Seligman, and P. Girard. Logical dynamics of belief change in the community. Synthese, 191:2403–2431, 2014.

[Nitzan, 1975] S. Nitzan. Social Preference Ordering in a Probabilistic Voting Model. Public Choice, 24:93–100, 1975.

[Niv, 2009] Y. Niv. Reinforcement learning in the brain. Journal of Mathematical Psychology, 53:139–154, 2009.

[Seligman et al., 2013] J. Seligman, F. Liu, and P. Girard. Facebook and the epistemic logic of friendship. In Proceedings of Theoretical Aspects of Rationality and Knowledge, Chennai, India, 2013.

[Skyrms, 2010] B. Skyrms. Signals: Evolution, Learning & Information. Oxford University Press, Oxford, 2010.

[Zollman, 2012] K.J.S. Zollman. Social network structure and the achievement of consensus. Politics, Philosophy & Economics, 11(1):26–44, 2012.



