<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Collective Learning in Games through Social Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sanne Kosterman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nina Gierasimczuk</string-name>
          <email>nina.gierasimczuk@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for Logic, Language and Computation University of Amsterdam</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>35</fpage>
      <lpage>41</lpage>
      <abstract>
        <p>This paper argues that combining social networks communication and games can positively influence the learning behavior of players. We propose a computational model that combines features of social network learning (communication) and gamebased learning (strategy reinforcement). The focus is on cooperative games, in which a coalition of players tries to achieve a common goal. We show that enriching cooperative games with social networks communication can influence learning towards the social optimum under specific conditions on the network structure and on the existing expertise in the coalition.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>With the rise of the Internet and digital games,
communication and education have changed rapidly. Two digital
techniques introduced towards active and social learning
environments are serious games and online social networks. Both of
them seem to be auspicious methods that stimulate learning,
but are thus far treated as two distinct approaches.</p>
      <p>
        Several attempts have been made to computationally model
the learning behavior of artificial agents, both in games and
in social networks. Information transmission and opinion
formation in social networks has already been extensively
studied
        <xref ref-type="bibr" rid="ref12 ref15 ref3 ref8 ref9">(see, e.g., [Bala and Goyal, 1998; DeGroot, 1974;
Easly and Kleinberg, 2010; Jackson, 2008])</xref>
        . In those
frameworks agents can acquire new knowledge and adjust their
opinions by learning from the knowledge and beliefs of
neighbors in a network. The theory of learning in games
has been extensively studied by [Camerer and Ho, 1999],
[Fudenberg and Levine, 1998], and [Laslier et al., 2001],
who all discuss normative paradigms for learning towards
an equilibrium in repeated games. [Bush and Mosteller,
1955] and [Erev and Roth, 1995] provide models for
learning in stochastic games, by making use of reinforcement
learning. This line of research has proved useful not only
for the study of artificial agents, but also for the
understanding of human learning behavior [Erev and Roth, 2014;
      </p>
      <p>Niv, 2009]. Yet the above-mentioned learning paradigms
mainly focus on competitive agents, and treat games and
networks separately.</p>
      <p>In this paper, we study the question of how interaction in
a social network between players of a cooperative game can
possibly influence their learning behavior. We thereby
assume players to act as one grand coalition, trying to maximize
the group utility, i.e., we take players to be group-rational.
Before performing an action in the game, players can
communicate with each other in a social network about the
estimated quality of joint strategies. Although individuals can
only communicate directly with their neighbors, they aim at
cooperation with the entire social network society.</p>
      <p>We start this paper with a classical graph-theoretical model
for learning in social networks in Section 2. Thereafter,
in Section 3 we provide a probabilistic aggregation method
to model the amalgamation of individual opinions after
network communication. In Section 4 we describe the process
of learning in stochastic games, relying on a reinforcement
method. We combine the three formal frameworks in the
novel learning paradigm proposed in Section 5. In Section
6 we discuss the hidden assumptions of this model and we
argue that it contributes to future computer simulations as well
as psychological experiments on human learning.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Social Network Learning</title>
      <p>The so-called Lehrer-Wagner model for weighted preference
aggregation [Lehrer and Wagner, 1981] can be utilized for
modeling opinion forming in a weighted social network.
Here, each agent holds an individual belief about multiple
alternatives. He can update this belief through communication,
taking into account the opinion of his neighbors and a degree
of trust towards his neighbors’ expertise.1</p>
      <p>Formally, let G = (N; S; u) be the game with N =
f1; : : : ; ng players, where S = fs(1); : : : ; s(k)g is the set
of k joint strategies, and u = (u1; : : : ; un) is a tuple of
individual utility functions ui : S ! [0; 1] for each i 2 N .
Before the game starts, each player i holds a probability
distribution over the set of joint strategies, bi : S ! [0; 1]. One
1In fact, the present model is an extension of a classical social
network model [DeGroot, 1974] for communication about single
events only.
could think of these probabilities as subjective degrees of
belief 2, with respect to the optimality of some joint strategy.
The higher the probability bi(s) for a certain strategy profile
s 2 S, the more player i considers it likely that the joint
strategy s is a social optimum, meaning that it entails a maximal
sum of individual payoffs.</p>
      <p>All private opinions can be reflected in a stochastic n
kmatrix B, that holds the subjective probability values bij =
bi(s(j)) on its entries. Each i-th row bi q thus represents the
probability distribution of agent i over the set S. We write B1
to denote the initial belief matrix. Additionally, each agent i
assigns a weight wim 2 [0; 1] to all other group members
m 2 N . The weights can be represented by the stochastic
n n-matrix W . The corresponding social network is thus a
weighted directed graph G = (N; EW ) where N is the set of
nodes (agents), EW is the set of weighted directed edges, and
wim is thus the weight on the directed edge from i to m. We
assume the graph to allow for loops, which represents agents’
self-confidence.</p>
      <p>Each agent can now communicate with neighbors in the
network and update his individual belief by taking a weighted
arithmetic mean of the beliefs of all agents he trusts. When
iterating this process, the updated belief of agent i about
strategy s(j) after t rounds of communication is given by
bitj+1 = Pm2N wimbtmj . On a societal level, belief
updates after t rounds of network communication are the result
of applying the trust matrix W to the belief matrix Bt, i.e.,
Bt+1 = W Bt (or, equivalently, applying the weight matrix
t times to the initial beliefs B1, i.e. Bt+1 = W t B1). Each
round of belief updating can thus be described by means of
the following algorithm.
2: Bt+1 := bitj+1 n k = W
Output: Belief matrix Bt+1
Bt</p>
      <sec id="sec-2-1">
        <title>Algorithm 1 Network Communication at round t</title>
        <sec id="sec-2-1-1">
          <title>Input: Weight matrix W ; belief matrix Bt</title>
          <p>1: for all i 2 N , s(j) 2 S: bitj+1 := Pm2N wimbtmj
3</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Collective Decision-Making</title>
      <p>After network communication and belief updating, the
coalition needs to decide which joint strategy to adopt in the game.
We introduce a probabilistic social choice function (PSCF),
that maps individual probability distributions over a set of
alternatives to a societal probability distribution. Players in a
cooperative game can make use of such a PSCF to aggregate
all individual preferences, in order to determine which joint
strategy is most optimal according to society. As an
illustration, consider a football team briefing before the match starts,
while the players collectively decide on a team strategy.</p>
      <p>A first definition for probabilistic social choice functions
was introduced by [Gibbard, 1977], in which a preference
profile, i.e., a tuple of individual preference orders, is mapped
to a lottery, i.e., a single probability distribution over the set
of alternatives. Gibbard referred to such functions as social
2Not to be confused with the quantitative notion of belief used in
quantitative approaches to Belief Revision Theory.
decision schemes (SDSs). We will introduce a variant of this
notion, that takes as input the stochastic n k-matrix B, as
introduced in the previous section. We write B(n; k) for the
set of all such stochastic n k-matrices. The output of a
probabilistic social choice function is given by a k-ary row vector
~b that represents a single societal probability distribution over
the set of k alternatives. We write B(k) for the set of such
stochastic k-ary row vectors. Formally, a probabilistic social
choice function is thus a function F : B(n; k) ! B(k).</p>
      <p>It is worth noting that the PSCF provides a way of dealing
with the Condorcet paradox (named after Marie Jean Antoine
Nicolas de Caritat, le Marquis de Condorcet, 1743–1794),
since it will always select a winning candidate on the basis
of probabilities. Even if society likes all alternatives equally
good (represented by equal probabilities for all candidates), a
winning alternative will be chosen at random.</p>
      <p>We will discuss two probabilistic social choice functions
that differ with respect to the weights that individuals receive.
Intuitively, the higher the weight, the more influence an agent
can exert on the construction of the societal probability
distribution. In a weighted PSCF, different individuals can receive
different weights; in an averaged PSCF all individuals receive
equal weights.
3.1</p>
      <p>Weighted Preference Aggregation
In [Lehrer and Wagner, 1981] it has been shown that a special
kind of a social welfare function, the so-called “Allocation
Amalgamation Method” (AAM), can be used as a weighted
preference aggregation method to provide a societal
ordering over a set of k alternatives. The main advantage of such a
weighted method for preference aggregation is that it can take
into account the expertise of specific group members. We will
use this method for constructing a weighted probabilistic
social choice function (wPSCF), that outputs a societal
probability distribution rather than a societal ordering over the set
of alternatives.</p>
      <p>The determination of the weights assigned to individuals,
relies on the assumption that all agents agree on how much
weight each group member should receive. In terms of the
weight matrix W , this agreement on weights corresponds to
every row of the matrix being the same. It therefore
suffices to represent the weights in a stochastic row vector w~ =
(w1; : : : ; wn), in which wi 2 [0; 1] represents the weight that
agent i receives from society. A weighted probabilistic social
choice function (wPSCF) is then a PSCF F : B(n; k) ! B(k)
given by F (B) = w~ B = ~b = (b1; : : : ; bk) so that each
bj = P
from the i2inNdiwviidbuija.l pInrowbaobridlist,yavwalPuSesCFto isa
twhueisgahtemdapapriitnhgmetic mean of these values, for each alternative s(j) 2 S.</p>
      <p>One can easily check that a wPSCF satisfies several
axiomatic properties from social choice theory, among which
independence of alternatives (IA), unanimity (U), neutrality
(N), and anonymity (A).</p>
      <p>In fact, one can check that wPSCFs even satisfy a stronger
notion of unanimity called Pareto optimality (P), which
guarantees that strict unanimous agreement among all individuals
about the order of alternatives is reflected in the societal
preference. wPSCFs also satisfy social rationality (SR) which is
a highly desired property of cooperative games with
grouprational players and a common goal [Arrow, 1951].
3.2</p>
      <p>Averaged Preference Aggregation
Before [Lehrer and Wagner, 1981] introduced their weighted
method for allocation amalgamation, [Intriligator, 1973] and
[Nitzan, 1975] proposed a different probabilistic aggregation
procedure which they call “the average rule”. The average
rule can also be seen as a social welfare function that, on
the input of the individual probability distributions, outputs
a societal ordering over the set of alternatives. We will use
this method for constructing an averaged probabilistic social
choice function (aPSCF), that outputs a societal probability
distribution rather than a ordering. Formally, an aPSCF is a
PCSF F : B(n; k) ! B(k) given by F (B) = (b1; : : : ; bk) =
~b so that each bj = n1 Pi2N bij . The process of belief
aggregation and collective decision making at some round t, can
thus be given by the following algorithm.</p>
      <sec id="sec-3-1">
        <title>Algorithm 2 Belief Aggregation at round t</title>
        <sec id="sec-3-1-1">
          <title>Input: Belief matrix Bt</title>
          <p>1: for all s(j) 2 S: btj := n1 Pi2N bitj
2: ~bt := (bt1; : : : ; btk)
Output: Societal belief vector ~bt
Since an aPSCF does not rely on weights, it can be used as
preference aggregation methods as long as an agreement on
weights is not reached yet. Note that an aPSCF can actually
be thought of as a special case of a wPSCF where the weight
vector is given by w~ = ( n1 ; : : : ; n1 ). Therefore, an aPSCF
satisfies all properties that the more general wPSCFs satisfy. In
fact, the averaged method satisfies non-dictatorship (no
single individual can always determine the social probabilities)
and consistency (equal societal preferences of two disjoint
groups are maintained when the two are merged).</p>
          <p>Finally, let us consider strategy-proofness (SP), which
indicates whether individuals can manipulate the outcome of the
aggregation procedure when submitting an untruthful
individual preference. It easy to verify that neither an aPSCF, nor a
wPSCF satisfies SP. As we assume all players in the game to
be group-rational, no player will manipulate the game with
the purpose of increasing his private payoff only, so we do
not worry about dishonest players trying to sabotage the
cooperation. However, since the utility functions of the players
are assumed to be unknown, they are not certain about the
social optimum either. Hence, a manipulation by a very
‘uninformed’ player can be harmful for the entire coalition.3
4</p>
          <p>Gameplay and Reinforcement Learning
Recall that the first belief update was performed after network
communication, before the game starts. A second belief
update is performed after playing the game. Whereas the first
3In Section 5, we will therefore introduce some conditions on the
amount of influence of different individuals in the coalition, ensuring
that such players will not have enough power for manipulation.
wPSCF
aPSCF</p>
          <p>IA
X
X</p>
          <p>U
X
X</p>
          <p>N
X
X</p>
          <p>A
X
X</p>
          <p>P
X
X</p>
          <p>SR
X
X</p>
          <p>ND
X</p>
          <p>C
X</p>
          <p>SP
update involves individual learning by way of
communication, the second update involves collective learning by way of
reinforcement. In a classical method for reinforcement
learning [Bush and Mosteller, 1955] the probability for a certain
strategy is updated with a weighted average of the previous
probability and the maximum attainable probability 1. More
specifically, suppose player i chooses the individual strategy
sit at round t and he receives a payoff of ui(st), where st
denotes the joint strategy played at round t. Then the probability
for playing si again, is increased by adding some fraction of
the distance between the original probability for si and the
maximum probability 1. This fraction is given by the product
of the payoff and some learning parameter . The payoffs
as well as the constant fraction are scaled to lie in the
interval from 0 to 1. The size of correlates with the speed
of learning [Skyrms, 2010]. The probabilities for all
strategies that are not played in the previous rounds, are decreased
proportionally.</p>
          <p>Formally, let mit : Si ! [0; 1] be the mixed strategy for
player i at round t. Then, after playing sit at round t, player
i can revise his mixed strategy for the next round t + 1 as
follows:
mt+1(si) =
i
mit(si) +
mit(si)
ui(st)(1
ui(st)(mit(si))
mit(si)) if si = st</p>
          <p>i
if si 6= sit
Note that this reinforcement method is aimed to model how
players can individually learn to improve their private strategy
in repeated games. Let us extend this learning behavior to
a group level, by allowing the coalition for reinforcing the
joint strategies that yield a positive social welfare, i.e., sum
of individual payoffs. In that way, players are collectively
learning towards the social optimum.</p>
          <p>More specifically, after belief aggregation by the aPSCF,
the coalition holds a societal probability distribution over the
set of joint strategies, which (for a round t) is presented by
the stochastic vector ~bt. A joint strategy s(j) 2 S is now
chosen to be played with probability btj . Subsequently,
players get to know the corresponding social welfare and
calculate an average fraction U (st) := n1 P
of the individual payoffs, this fraction is iu2sNedubi(yset)a.chInpslateyaedr
for reinforcement, towards the joint strategy with a maximal
social welfare. Although players perform this second belief
update individually, as they are all reinforcing the same joint
strategy with the same reinforcement factor, they are
collectively learning towards the social optimum. Note that each
player is actually learning which individual role to adopt in
the team, i.e., which actions of his are consistent with the
social optimum. This process of gameplay and reinforcement at
a certain round t can be described by the following algorithm.
2: U (st) := n1 Pi2N ui(st)
3: for all i 2 N , s(j) 2 S:
Algorithm 3 Gameplay and Reinforcement at round t
Input: Societal belief vector ~bt; Belief matrix Bt
1: st := s(j), s.t. s(j) is drawn from S with probability btj
bitj+1 :=
bitj +
t
bij</p>
          <p>U (st)(1
U (st)bitj
bitj ) if s(j) = st
if s(j) 6= st
4: Bt+1 := (bitj+1)n k</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>Output: Probability matrix Bt+1</title>
          <p>There are two main reasons for using the Bush-Mosteller
reinforcement method rather than, for instance, the one based
on Polya´ urns [Erev and Roth, 1995]. Firstly, Bush-Mosteller
reinforcement makes use of utility values that are scaled in the
interval from 0 to 1. This guarantees that the utilities are in the
same scale for all players, thus avoiding unequal influences
of different players. Moreover, it ensures that the same unit
is used for payoffs as for individual beliefs about the strategy
profiles. Thus when reinforcing after gameplay, the utility
values are appropriately used as some type of weight in order
to update the beliefs.</p>
          <p>Secondly, the Bush-Mosteller model does not take into
account the accumulated rewards of earlier plays, so that the
proportion of reinforcement does not get smaller over time.
For modeling processes of collective learning rather than
individual learning, it may make more sense to violate this
principle (the so-called Law of Practice). Namely, the learning
process depends on the different communication patterns in
every round, so that it does not slow down.
5</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>The Game-Network Learning Model</title>
      <p>In this section we combine the computational methods
discussed in the previous three sections. The model, which we
will call Game-Network Learning Model, describes an
iterative process that follows the procedures of network
communication, belief aggregation, and gameplay and reinforcement
in each round (Figure 2).
Formally, the Game-Network Learning model thus consists
of a run of Algorithms 1, 2, and 3. We will write Bt+ instead
of Bt+1 to denote the updated beliefs after network
communication in round t. We emphasize that these are not yet the
final updated beliefs at the end of round t, but merely the
intermediate beliefs after communication. During each round
t, the beliefs are thus updated twice, following the scheme:
Bt
!</p>
      <p>Bt+
!</p>
      <p>Bt+1
Algorithm 1 thus outputs the matrix Bt+ after network
communication in round t, which will subsequently be used as
input for Algorithms 2 and 3.
5.1</p>
      <p>Network Structure and Learning Outcome
Can adding network communication to a cooperative game
be beneficial for the learning outcome? And if so, under what
circumstances? Let us start with showing that network
communication only influences the learning behavior of players
as long as an agreement of beliefs is not reached yet.
Formally, we say a group N 0 N is in agreement at round t if
for all i1; i2 2 N 0 it holds that bit1 q = bit2 q, i.e., if agents in N 0
hold the same probability distributions over S. Also, a group
of nodes in a weighted directed graph is said to be closed if
there are no outgoing edges from nodes inside the group to
nodes outside the group. One could imagine that once
everyone in a closed group in the social network has the same
beliefs about the social optimum of the game, then agents of
that group do no longer need to convince each other of their
different opinions.</p>
      <p>Proposition 1. Let N 0 N be a closed group of agents in
the network. Once N 0 is in agreement at the beginning of
some round t in the Game-Network Learning Model, network
communication in that round leaves the beliefs of all agents
in N 0 unchanged, i.e., bit+q = bit q for all i 2 N 0.</p>
      <p>It follows immediately that for all agents i1; i2 2 N 0 we find
bit1+ q = bit1 q = bit2 q = bit2+ q so that they will still be in agreement
after network communication. Since the entire social network
is always closed, once all agents agree, communication is no
longer needed for an individual to learn towards the social
optimum.</p>
      <p>In order to show a positive influence of network
communication on the learning process, we measure the quality of our
Game-Network Learning Model by the probability for
playing the social optimum at a given round (before agreement).
More specifically, if s(j ) 2 S is the social optimum, we
say learning with network communication in round t is better
than learning without network communication if btj+ &gt; btj ,
where btj+ = n1 Pi2N bitj+ denotes the probability that the
social optimum will be played in round t after network
communication and btj = n1 P t denotes the probability
that the social optimum will bi2eNplbaiyjed at round t without (or:
before) network communication.</p>
      <p>Intuitively, one could imagine that if there exist experts in
the network, who are very close to knowing what the social
optimum is, and these experts receive a sufficient amount of
trust from all other players in the network, they can convince
other players to increase the probability values for the social
optimum. Formally, let s(j ) be the social optimum and let
btj be the probability that society assigns to some s(j) at the
beginning of round t. We say an agent ie 2 N in the network
is an expert for round t if bitej &gt; btj . We write E t for the
set of experts for round t. We call the agents i 2 N nE t
nonexperts. Note that it follows directly that for all experts ie 2
E t and all non-experts i 2 N nE t it holds that bitej &gt; bitj .</p>
      <p>Intuitively, the experts for a certain round are the agents
that have in the beginning of that round (and thus at the end of
the previous round) a higher than average degree of belief for
the social optimum. Note that experts can only exist as long
as a total agreement is not reached. Namely, if an agreement
of beliefs is reached between all agents in the network, every
agent has the same degree of belief that is then trivially equal
to the average degree of belief. The notion of an expert is
therefore a relative rather than an objective one: an agent is
only an expert when he has sufficient expertise relative to the
expertise of others in the society.</p>
      <p>Among the set of experts, there always exists a subset of
experts who have the highest degrees of belief for the
social optimum, compared to all other agents in society. These
experts can be considered as maximal experts. Formally, if
s(j ) is the social optimum, then we define the set of
maximal experts for round t as E mtax = fim 2 E t j bitmj =
arg maxi2N bitj g E t. Note that it follows directly from
t
this definition that for all maximal experts im 2 Emax and all
other agents i 2 N nE mtax it holds that bitmj &gt; bitj .</p>
      <p>Whether or not experts can exert more influence on the
outcome of network communication than others, depends on
their position in the network and the weight that other
individuals assign to the opinions of the experts. To analyze
this issue, we look at one’s centrality [Jackson, 2008]. We
introduce the notion of weight centrality to express the
centrality of agents in weighted directed graphs. Formally, let
wi = Pm2N wmi be the total weight that agent i receives
from his neighbors. The weight centrality of some agent
i 2 N is given by the fraction Cw = wi=n. In words, weight
i
centrality is a fraction of the highest possible weight that an
agent i can receive, which is n (namely in case all agents
would assign i a weight of 1, including agent i himself). The
higher the weight centrality of experts, the more influence
experts have on the outcome, and hence the higher the
probability will be for playing the social optimum after network
communication. The following theorem provides a sufficient
condition for network communication to be better than no
communication.
munication, i.e., btj</p>
      <p>&gt; btj .</p>
      <p>Theorem 1. Let s(j ) 2 S be the social optimum. If
Ciwm &gt; n1 Ciw for all im 2 E mtax and i 2 N nE mtax, then
the probability for playing the social optimum at round t after
network communication is higher than before network
com+
The theorem shows that if maximal experts for round t are
trusted more than others, network communication can be
beneficial for finding the social optimum. The proof relies on a
decomposition of the individual weights into wim = 1+ for
all maximal experts and into wi = 1 for all other agents
(with ; non-negative fractions). One can then show that
the increase of the societal probability for the social optimum,
due to communication with trusted experts, is greater than the
respective decrease due to communication with less trusted
non-experts. This guarantees that the probability for playing
the social optimum at some round t after network
communication is higher than before network communication.</p>
      <p>If communication can be beneficial for a single round, we
should ask if it can also be favorable in every round. We
therefore introduce the notion of stable experts, which are
the agents i 2 N for which it holds i 2 E t for every round
t 1. The following theorem states a sufficient condition for
an initial expert (i.e., expert for round 1) to be a stable expert.
In fact, the theorem states an even stronger result that ensures
initial experts always to be in the set of maximal experts.
Theorem 2. Let s(j ) 2 S be the social optimum and let
aEg1rebeemthenetsaett roofuinndit1ia,ltheexnpeErt1s. IfE Emt1axisfocrloasleldt and1,Ea1s liosnign
as a total agreement in the network is not reached.
Hence, if initial experts only assign positive weights to
themselves or other initial experts with the same degrees of belief,
then they will always be in the set of maximal experts for
each round t 1. The proof relies on Proposition 1 and the
intuition that agents with maximal degrees of belief for the
social optimum after network communication, are the agents
with maximal degrees of belief for the social optimum after
gameplay and reinforcement learning too.</p>
      <p>In fact, the group of maximal experts might become bigger
than the group of initial experts, but ‘new’ maximal experts
can only have a degree of belief for the social optimum that is
at most as high as the respective degree of belief of the initial
experts. Under certain conditions of E 1, it even holds that
the group of initial experts is exactly the group of maximal
experts for each round t 1, i.e., E 1 = E mtax.</p>
      <p>Proposition 2. Let s(j ) 2 S be the social optimum and
let E 1 be the set of initial experts for round t = 1. If E 1
is maximally closed and E 1 is in agreement at round t = 1,
then for all t 1 it holds that E 1 = E mtax, as long as a total
agreement in the network is not reached.</p>
      <p>Here we call a group M N maximally closed if all agents
outside of M are connected to at least one other agent outside
of M . Intuitively, if the group of initial experts is maximally
closed, then it means that for all agents i 2 N nE 1 there must
exist another agent j 2 N nE 1 so that wij &gt; 0. This
guarantees that agents outside of E 1 will always have a smaller
degree of belief than agents inside E 1 at every round t 1,
since the updated beliefs are weighted arithmetic means of
beliefs of neighbors.</p>
      <p>Corollary 1. Let s(j ) 2 S be the social optimum and let E 1
be the set of initial experts. If (i) E 1 is maximally closed; (ii)
E 1 is in agreement at round t = 1; and (iii) Ciwe &gt; n1 Ciw
for each ie 2 E 1 and i 2 N nE 1, then btj+ &gt; btj at each
round t 1, as long as at total agreement in the network is
not reached.</p>
      <p>In words, under the stated conditions for initial experts, the
probability for playing the social optimum is in each round
higher after network communication than before (or without)
network communication. This corollary thus provides a
sufficient condition for learning with network communication to
be better in the long run than learning without network
communication. The proof follows immediately from Theorem 1
and Proposition 2. Namely, from the assumptions (i) and (ii)
it follows that the initial group of experts is always the group
of maximal experts, i.e., E 1 = E mtax E t for all rounds
t 1. Now if this group of maximal experts satisfies the
stated condition for weight centrality (iii), it follows from
Theorem 1 that the probability for playing the social optimum
after network communication is higher than without
communication in every round.
6</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Outlook</title>
      <p>In this paper we have studied the possibility of formal
modeling the process of learning in game scenarios, among agents
arranged in a social network. We proposed an iterative model
of learning that follows the procedures of network
communication, belief aggregation, and gameplay and reinforcement
learning. We conclude that interaction in specific social
networks can positively influence the learning behavior of
players in a cooperative game. Learning with network
communication can be better than learning without communication,
when there exist players in the game who know better than
average which joint strategy corresponds to the social
optimum. If these players are sufficiently trusted by society, and
players with little knowledge about the social optimum are
considered less authorial, then the knowledgeable players can
convince other players towards the social optimum.</p>
      <p>We envision possible extensions of the model in the
domain of competitive games. For example, one could make
use of cooperative game theory to describe transferable
utility (TU) games, in which several coalitions can play against
each other. Our model could be extended to a setting in
which different social networks, representing different
coalitions, play a competitive game. Additionally, in our model
we assume players not to know their payoff functions, and
not to be aware of the entire network structure. By
utilizing epistemic models, one could elaborate on these
different notions of (restricted) knowledge by means of
Dynamic Epistemic Logic. A reasonable amount of work on
logics for social networks already has been carried out by
(among others) [Christoff and Hansen, 2014; Liu et al., 2014;
Seligman et al., 2013], and [Zollman, 2012], although not in
the context of cooperative games.</p>
      <p>Each component of our Game-Network Learning model
is inspired by existing computational approaches. Varying
the choices made for modeling each of them, would result in
new learning paradigms. For example, instead of relying on
Lehrer’s and Wagner’s model, one could use Bayesian
learning, see [Bala and Goyal, 1998]. Moreover, instead of
assuming static networks, one could explore a more dynamic
setting in which agents arrive over time and are allowed to
change their weights of trust. As for the preference
aggregation procedure, one could adopt one of the strategy-proof
probabilistic social choice functions proposed by [Barbera` et
al., 1998]. For the procedure of gameplay and reinforcement
learning, one could possibly allow for more than one social
optimum, and rely on the selfishness level [Apt and Scha¨fer,
2014] as reinforcement fraction.</p>
      <p>Finally, on the empirical side, we envision computer
simulations in order to check the convergence properties of the
proposed learning model. Also, testing our model by means
of psychological experiments with human participants, would
help shed some light on the use of the proposed learning
paradigm in modeling real human learning in serious games.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>Nina Gierasimczuk’s work on this article was supported by
an Innovational Research Incentives Scheme Veni grant
27520-043, Netherlands Organisation for Scientific Research
(NWO).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Apt and Scha¨fer, 2014]
          <string-name>
            <given-names>K.R.</given-names>
            <surname>Apt</surname>
          </string-name>
          and
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Scha¨fer. Selfishness Level of Strategic Games</article-title>
          .
          <source>Journal of AI Research</source>
          ,
          <volume>49</volume>
          :
          <fpage>207</fpage>
          -
          <lpage>240</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>[Arrow</source>
          , 1951]
          <string-name>
            <given-names>K.</given-names>
            <surname>Arrow</surname>
          </string-name>
          . Social Choice and
          <string-name>
            <given-names>Individual</given-names>
            <surname>Values</surname>
          </string-name>
          . John Wiley and Sons, New York,
          <year>1951</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Bala and Goyal, 1998]
          <string-name>
            <given-names>V.</given-names>
            <surname>Bala</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Goyal</surname>
          </string-name>
          .
          <article-title>Learning from neighbors</article-title>
          .
          <source>Review of Economic Studies</source>
          ,
          <volume>5</volume>
          :
          <fpage>205</fpage>
          -
          <lpage>228</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>[Barbera</surname>
          </string-name>
          ` et al.,
          <year>1998</year>
          ]
          <string-name>
            <given-names>S.</given-names>
            <surname>Barbera`</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bogomolnaia</surname>
          </string-name>
          , and H. van der Stel.
          <article-title>Strategy-proof probabilistic rules for expected utility maximizers</article-title>
          .
          <source>Mathematical Social Sciences</source>
          ,
          <volume>35</volume>
          :
          <fpage>89</fpage>
          -
          <lpage>103</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <source>[Bush and Mosteller</source>
          , 1955]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bush</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Mosteller</surname>
          </string-name>
          .
          <source>Stochastic Models of Learning</source>
          . John Wiley and Sons, New York,
          <year>1955</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>[Camerer and Ho</source>
          , 1999]
          <string-name>
            <given-names>C.</given-names>
            <surname>Camerer</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Ho</surname>
          </string-name>
          .
          <article-title>Experienceweighted attraction learning in normal form games</article-title>
          .
          <source>Econemetrica</source>
          ,
          <volume>67</volume>
          (
          <issue>4</issue>
          ):
          <fpage>827</fpage>
          -
          <lpage>874</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>[Christoff and Hansen</source>
          , 2014]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Christoff</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.U.</given-names>
            <surname>Hansen</surname>
          </string-name>
          .
          <article-title>Dynamic Social Networks Logic</article-title>
          .
          <source>Journal of Applied</source>
          Logic (to appear),
          <source>available as ILLC Prepublication Series Report PP-2014-09</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>[DeGroot</source>
          , 1974]
          <string-name>
            <given-names>M.H.</given-names>
            <surname>DeGroot. Reaching</surname>
          </string-name>
          a Consensus. American Statistical Association,
          <volume>69</volume>
          (
          <issue>345</issue>
          ):
          <fpage>118</fpage>
          -
          <lpage>121</lpage>
          ,
          <year>1974</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <source>[Easly and Kleinberg</source>
          , 2010]
          <string-name>
            <given-names>D.</given-names>
            <surname>Easly</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          . Networks, Crowds, and Markets. Cambridge University Press, New York,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <source>[Erev and Roth</source>
          , 1995]
          <string-name>
            <given-names>I.</given-names>
            <surname>Erev</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.E.</given-names>
            <surname>Roth</surname>
          </string-name>
          .
          <article-title>Learning in Extensive-Form Games: Experimental Data and Simple Dynamic Models in the Intermediate Term</article-title>
          .
          <source>Games and Economic Behavior</source>
          ,
          <volume>212</volume>
          :
          <fpage>164</fpage>
          -
          <lpage>212</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <source>[Erev and Roth</source>
          , 2014]
          <string-name>
            <given-names>I.</given-names>
            <surname>Erev</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.E.</given-names>
            <surname>Roth</surname>
          </string-name>
          .
          <article-title>Maximization, learning, and economic behavior</article-title>
          .
          <source>Proceedings of the National Academy of Sciences, 111</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <source>[Fudenberg and Levine</source>
          , 1998]
          <string-name>
            <given-names>D.</given-names>
            <surname>Fudenberg</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.K.</given-names>
            <surname>Levine</surname>
          </string-name>
          .
          <article-title>The Theory of Learning in Games</article-title>
          . The MIT Press, Cambridge,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <source>[Gibbard</source>
          ,
          <year>1977</year>
          ]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gibbard</surname>
          </string-name>
          .
          <article-title>Manipulation of schemes that mix voting with chance</article-title>
          .
          <source>Econometrica</source>
          ,
          <volume>45</volume>
          (
          <issue>3</issue>
          ):
          <fpage>665</fpage>
          -
          <lpage>681</lpage>
          ,
          <year>1977</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <source>[Intriligator</source>
          , 1973]
          <string-name>
            <given-names>M.D.</given-names>
            <surname>Intriligator</surname>
          </string-name>
          .
          <article-title>A Probabilistic Model of Social Choice</article-title>
          .
          <source>The Review of Economic Studies</source>
          ,
          <volume>40</volume>
          (
          <issue>4</issue>
          ):
          <fpage>553</fpage>
          -
          <lpage>560</lpage>
          ,
          <year>1973</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [
          <string-name>
            <surname>Jackson</surname>
          </string-name>
          ,
          <year>2008</year>
          ]
          <string-name>
            <given-names>M.O.</given-names>
            <surname>Jackson</surname>
          </string-name>
          .
          <source>Social and Economic Networks</source>
          . Princeton University Press,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Laslier et al.,
          <year>2001</year>
          ]
          <string-name>
            <given-names>J.</given-names>
            <surname>Laslier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Topol</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Walliser</surname>
          </string-name>
          .
          <article-title>A Behavioral Learning Process in Games</article-title>
          .
          <source>Games and Economic Behavior</source>
          ,
          <volume>37</volume>
          :
          <fpage>340</fpage>
          -
          <lpage>366</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <source>[Lehrer and Wagner</source>
          , 1981]
          <string-name>
            <given-names>K.</given-names>
            <surname>Lehrer</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Wagner</surname>
          </string-name>
          . Rational Consensus in Science and Society. D. Reidel Publishing Company, Dordrecht,
          <year>1981</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Liu et al.,
          <year>2014</year>
          ]
          <string-name>
            <given-names>F.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Seligman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Girard</surname>
          </string-name>
          .
          <article-title>Logical dynamics of belief change in the community</article-title>
          .
          <source>Synthese</source>
          ,
          <volume>191</volume>
          :
          <fpage>2403</fpage>
          -
          <lpage>2431</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <source>[Nitzan</source>
          , 1975]
          <string-name>
            <given-names>S.</given-names>
            <surname>Nitzan</surname>
          </string-name>
          .
          <article-title>Social Preference Ordering in a Probabilistic Voting Model</article-title>
          .
          <source>Public Choice</source>
          ,
          <volume>24</volume>
          :
          <fpage>93</fpage>
          -
          <lpage>100</lpage>
          ,
          <year>1975</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <source>[Niv</source>
          , 2009]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Niv</surname>
          </string-name>
          .
          <article-title>Reinforcement learning in the brain</article-title>
          .
          <source>Journal of Mathematical Psychology</source>
          ,
          <volume>53</volume>
          :
          <fpage>139</fpage>
          -
          <lpage>154</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [Seligman et al.,
          <year>2013</year>
          ]
          <string-name>
            <given-names>J.</given-names>
            <surname>Seligman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Liu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Girard</surname>
          </string-name>
          .
          <article-title>Facebook and the epistemic logic of friendship</article-title>
          .
          <source>In Proceedings of Theoretical Aspects of Rationality and Knowledge</source>
          , Chennai, India,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <source>[Skyrms</source>
          , 2010]
          <string-name>
            <given-names>B.</given-names>
            <surname>Skyrms</surname>
          </string-name>
          . Signals: Evolution, Learning &amp; Information. Oxford University Press, Oxford,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <source>[Zollman</source>
          , 2012]
          <string-name>
            <given-names>K.J.S.</given-names>
            <surname>Zollman</surname>
          </string-name>
          .
          <article-title>Social network structure and the achievement of consensus</article-title>
          . Politics, Philosophy &amp; Economics,
          <volume>11</volume>
          (
          <issue>1</issue>
          ):
          <fpage>26</fpage>
          -
          <lpage>44</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>