<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The effectiveness of higher-order theory of mind in negotiations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Harmen de Weerd</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rineke Verbrugge</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bart Verheij</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CodeX, Stanford University</institution>
          ,
          <country country="US">United States</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Artificial Intelligence, University of Groningen</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>When the outcome of a decision you make depends on the actions of others, it is important to be able to predict those actions. To facilitate this process, people reason about unobservable mental content of others, such as beliefs, desires, and intentions. People can also use this so-called theory of mind recursively, and reason about the way others make use of theory of mind. For example, to understand a sentence such as 'Alice believes that Bob knows that Carol is throwing him a surprise party', the reader has to use second-order theory of mind, by reasoning about the way Alice reasons about Bob's knowledge. Behavioral experiments have demonstrated that people make use of higher-order (i.e. at least second-order) theory of mind [1, 2]. However, the extent to which non-human species are able to use theory of mind of any kind is under debate [3, 4]. The human ability to make use of higher-order theory of mind suggests that there may be settings in which this ability provides individuals with enough of an evolutionary advantage to support the emergence of reasoning about the minds of others, and even to use this ability recursively. One possible explanation is that higher-order theory of mind is needed to engage effectively in mixed-motive interactions [5] such as negotiation. Mixed-motive interactions involve partially overlapping goals, so that these interactions are neither fully cooperative nor fully competitive. In this paper, we make use of agent-based computational models to determine whether the use of higher orders of theory of mind allows agents to reach better outcomes in negotiation, both in terms of individual agent performance and in terms of social welfare.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        We study the effect of higher-order theory of mind in a particular negotiation
game known as Colored Trails, a test-bed introduced by Barbara Grosz, Sarit
Kraus and colleagues to investigate various aspects of negotiations [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ] (see also https://coloredtrails.atlassian.net/wiki/display/coloredtrailshome/). In our
setup, the game is played by three players on a square board consisting of 25
colored tiles, as depicted in Figure 1. The three players, i, j, and r, are initially
located at starting location S and want to end up as close as possible to their
own goal locations, li, lj, and lr, respectively. Each player also receives a set of
four colored chips (depicted as small circles in Figure 1), selected randomly from
the same five possible colors as those on the board. These chips are used to
move around on the board. Players may move to a tile adjacent to their current
location by handing in a chip of the same color as the destination tile. For
example, a player could move from starting tile S in Figure 1 to location lr by
handing in one striped chip and two black chips.
      </p>
      <p>A player's score depends on how closely he approaches his goal location. A
player receives 10 points for each step he takes towards his goal. Reaching the
goal location yields an additional 50 points. Finally, any chip that has not been
used to move around the board is worth an additional 5 points to its owner.
Players are thus highly incentivized to reach their goal location, but they also
compete over control of unused chips.</p>
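      <p>The scoring rule above can be sketched directly in code; the function name and signature below are illustrative, not taken from the authors' implementation.</p>

```python
def colored_trails_score(steps_towards_goal: int,
                         reached_goal: bool,
                         unused_chips: int) -> int:
    """Score one player's final position and chip holdings in Colored Trails.

    10 points per step taken towards the goal, a 50-point bonus for
    reaching the goal tile, and 5 points per chip left unspent.
    """
    points = 10 * steps_towards_goal
    if reached_goal:
        points += 50
    points += 5 * unused_chips
    return points
```

      <p>For example, a player two steps closer to her goal who keeps three chips scores 10 * 2 + 5 * 3 = 35 points.</p>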
      <p>To get closer to their goals, players are allowed to trade chips. This trading
takes the form of a one-shot bargaining game. Two agents i and j are assigned
the role of allocator, while the third agent r is assigned the role of responder.
The two allocators simultaneously choose an offer to make to the responder. An
allocator suggests to trade any given subset of his own chips against any given
subset of the responder's chips. The responder then accepts the offer that yields
her the highest score. However, if both allocators have made an offer that would
reduce her score, the responder rejects both offers and the initial distribution of
chips becomes final.</p>
    </sec>
    <sec id="sec-2">
      <title>Theory of mind</title>
      <p>In our Colored Trails setup, the role of the responder is limited to selecting
the offer that benefits her the most. We therefore focus on the theory of mind
abilities of the allocators. A zero-order theory of mind (ToM0) allocator is unable
to reason about the goal of his trading partner. Instead, the ToM0 allocator
estimates the probability that his offer will be accepted based on how successful
this offer has been in the past.</p>
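      <p>The ToM0 allocator's frequency-based acceptance estimate can be sketched as a simple tally; the class below and its fallback prior for unseen offers are illustrative assumptions, not the authors' exact model.</p>

```python
from collections import defaultdict

class ToM0Beliefs:
    """Zero-order beliefs: how often has each offer been accepted so far?"""

    def __init__(self):
        self.made = defaultdict(int)      # times each offer was made
        self.accepted = defaultdict(int)  # times each offer was accepted

    def update(self, offer, was_accepted):
        """Record the outcome of one negotiation round for this offer."""
        self.made[offer] += 1
        if was_accepted:
            self.accepted[offer] += 1

    def acceptance_probability(self, offer, prior=0.5):
        """Empirical acceptance frequency; fall back to a prior if unseen."""
        if self.made[offer] == 0:
            return prior
        return self.accepted[offer] / self.made[offer]
```
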
      <p>The first-order theory of mind (ToM1) allocator can use theory of mind to
put himself in the position of other agents and simulate their decision-making
processes. By putting himself in the position of the responder, a ToM1 allocator
understands that the responder will reject any offer that would reduce her score.
Similarly, by placing himself in the position of the competing allocator, a ToM1
allocator can predict what offer his competitor is going to make. The ToM1
allocator can use this information when making an offer himself.</p>
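      <p>The ToM1 allocator's simulation of the responder can be sketched as a predicate; the scores passed in (the responder's score under each outcome) are assumed to be pre-computed, purely for illustration.</p>

```python
def tom1_predicts_acceptance(responder_current_score,
                             my_offer_score,
                             competitor_offer_score):
    """Will the responder take my offer, as simulated by a ToM1 allocator?

    She accepts my offer only if it beats both keeping her initial chips
    and the offer I predict my competitor will make.
    """
    return (my_offer_score > responder_current_score and
            my_offer_score > competitor_offer_score)
```
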
      <p>For increasingly higher orders of theory of mind, a kth-order theory of mind
(ToMk) allocator considers the possibility of increasingly sophisticated
competitors. However, a ToMk allocator retains the ability to reason at orders
of theory of mind below the kth. For example, through repeated interaction with the
same competitor, a ToM6 allocator may come to believe that the competing
allocator is a ToM1 agent, so the ToM6 allocator may choose to behave as if he
himself were a ToM2 agent.</p>
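      <p>The idea that a ToMk agent can downshift to a lower reasoning order can be sketched as follows; this is a simplification of the behavior described above, not the authors' belief-update mechanism.</p>

```python
def effective_order(k, believed_competitor_order):
    """Pick the reasoning order a ToMk agent actually uses.

    Against a competitor believed to reason at order n, reasoning at
    order n + 1 suffices; the agent never exceeds its own capacity k.
    """
    return min(k, believed_competitor_order + 1)
```

      <p>For instance, a ToM6 agent that models its competitor as ToM1 behaves as a ToM2 agent.</p>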
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>We performed simulations in which the theory of mind agents described in the
previous section played repeated one-shot Colored Trails games. Each new game
was played on a different board in terms of coloring and goal locations and
with different sets of initial chips.</p>
      <p>Figure 2 shows the average performance of a focal ToMi allocator in the
presence of a competing ToMj allocator, calculated as the average
difference between an agent's score at the end of a negotiation and his initial
score at the start of the negotiation. It turns out that even though ToM0 allocators
can learn to negotiate effectively, ToM1 allocators outperform ToM0 allocators,
irrespective of the theory of mind abilities of the competing allocator. Similarly,
ToM2 allocators outperform ToM1 allocators when the competing allocator uses
theory of mind. We find no additional benefit for third-order theory of mind.
However, surprisingly, ToM4 allocators outperform lower-order agents when the
competing allocator can use second-order theory of mind.</p>
      <p>
        Figure 3 shows that the presence of ToM1 allocators and ToM2 allocators
also increases social welfare, as measured by the sum of the negotiation scores of
all three agents. Even higher orders of theory of mind were found not to influence
social welfare any further. Interestingly, even though theory of mind agents act
purely in their own interest, this increase in social welfare is not completely
explained by the increase in the allocator's score; the score of the responder
increases as well. It would be interesting to also investigate alternative notions
of social welfare (see for example [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]).
      </p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>
        Our results in the Colored Trails game show that there are mixed-motive settings
in which the ability to make use of theory of mind allows individuals to reach
better outcomes. We find that both first-order and second-order theory of mind
allow agents not only to obtain a better score themselves, but also to obtain a better score for
their trading partner. Although we find no additional advantages for third-order
theory of mind, we find that fourth-order theory of mind provides agents with a
competitive edge. Interestingly, we did not find a competitive benefit for
fourth-order theory of mind in strictly competitive settings [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This suggests that theory
of mind may be more important in mixed-motive settings than it
is in strictly competitive settings.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was supported by the Netherlands Organisation for Scientific Research
(NWO) Vici grant NWO 277-80-001, awarded to Rineke Verbrugge.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Perner</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wimmer</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>"John thinks that Mary thinks that...": Attribution of second-order beliefs by 5- to 10-year-old children</article-title>
          .
          <source>Journal of Experimental Child Psychology</source>
          <volume>39</volume>
          (
          <issue>3</issue>
          ) (
          <year>1985</year>
          )
          <fpage>437</fpage>
          -
          <lpage>471</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Hedden</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>What do you think I think you think?: Strategic reasoning in matrix games</article-title>
          .
          <source>Cognition</source>
          <volume>85</volume>
          (
          <issue>1</issue>
          ) (
          <year>2002</year>
          )
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Penn</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Povinelli</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>On the lack of evidence that non-human animals possess anything remotely resembling a `theory of mind'</article-title>
          .
          <source>Philosophical Transactions of the Royal Society B: Biological Sciences</source>
          <volume>362</volume>
          (
          <issue>1480</issue>
          ) (
          <year>2007</year>
          )
          <fpage>731</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Tomasello</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Why we Cooperate</article-title>
          . MIT Press, Cambridge, MA (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Verbrugge</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Logic and social cognition: The facts matter, and so do computational models</article-title>
          .
          <source>Journal of Philosophical Logic</source>
          <volume>38</volume>
          (
          <year>2009</year>
          )
          <fpage>649</fpage>
          -
          <lpage>680</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Grosz</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kraus</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Talman</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stossel</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Havlin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>The influence of social dependencies on decision-making: Initial investigations with a new game</article-title>
          .
          <source>In: Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems</source>
          . Volume
          <volume>2</volume>
          ., IEEE Computer Society (
          <year>2004</year>
          )
          <fpage>782</fpage>
          -
          <lpage>789</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gal</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grosz</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kraus</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pfeffer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shieber</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Agent decision-making in open mixed networks</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>174</volume>
          (
          <issue>18</issue>
          ) (
          <year>2010</year>
          )
          <fpage>1460</fpage>
          -
          <lpage>1480</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>d'Aspremont</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gevers</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Social welfare functionals and interpersonal comparability</article-title>
          . In: Arrow, K.J.,
          <string-name>
            <surname>Sen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suzumura</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , eds.:
          <source>Handbook of Social Choice and Welfare</source>
          . North Holland (
          <year>2002</year>
          )
          <fpage>459</fpage>
          -
          <lpage>541</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>de Weerd</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verbrugge</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verheij</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>How much does it help to know what she knows you know? An agent-based simulation study</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>199-200</volume>
          (
          <year>2013</year>
          )
          <fpage>67</fpage>
          -
          <lpage>92</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>