I see what you see: Integrating eye tracking into Hanabi playing agents

Eva Tallula Gottwald
egottwald@mills.edu
Mills College
Oakland, CA, USA

Markus Eger and Chris Martens
meger@ncsu.edu, crmarten@ncsu.edu
Principles of Expressive Machines Lab
NC State University
Raleigh, NC, USA


Abstract

Humans’ eye movements convey a lot of information about their intentions, often unconsciously. Intelligent agents that cooperate with humans in various domains can benefit from interpreting this information. This paper contains a preliminary look at how eye tracking could be useful for agents that play the cooperative card game Hanabi with human players. We outline several situations in which an AI agent can utilize gaze information, and present an outlook on how we plan to integrate this with reimplementations of contemporary Hanabi agents.

Introduction

Humans often give non-verbal cues to indicate their intentions (Land and Hayhoe 2001) or augment their verbal communication, often subconsciously. It would therefore be beneficial for the usability of computational systems to be able to interpret such signals. However, the subtle, subconscious use of signaling and lack of simple test domains make interpreting these signals very challenging. For many other AI techniques, games have served as a test environment, because they provide a low-risk, high-fidelity environment and often have a clear performance metric that can be used to measure success. We propose that games involving communication can be used as test environments for the interpretation of non-verbal cues given by humans.

One example of a game that relies heavily on inter-player communication is Hanabi (Bauza 2010), a cooperative card game in which players collaborate to build fireworks represented by cards with ranks from 1 to 5 in five colors. Unlike in traditional card games, players hold their cards facing away from them, i.e. every player sees every other player’s cards, but not their own. On a player’s turn, they may give a hint to another player about the contents of that player’s hand. These hints are limited to telling the other player either which of their cards have a particular color, or which have a particular rank. For example, player A may tell player B which of their cards are red and which are not, but not a subset thereof. Giving a hint expends a hint token, of which the players initially collectively have eight. Instead of giving a hint on their turn, players may also opt to play a card by choosing any card from their hand and putting it on the table. If the played card is the next card in numerical order of its corresponding color, e.g. if a blue 4 was played and the highest blue card currently on the table is a 3, the card is added to the board; otherwise it is discarded and a mistake is noted. When there is no card of a particular color on the board, a 1 is considered to be the next card in numerical order. Players may also opt to outright discard a card instead of playing it; this recovers one hint token. After players play or discard a card they draw a new card from the deck to restock their hand. The game ends once the players collectively have made 3 mistakes, or when the deck has been exhausted, plus one extra round. The score of the players equals the number of cards on the board, for a maximum of 25 points if all five cards in each of the five colors have been played successfully.

Even though the game provides the players with very limited communication, when human players play the game, they typically follow the same strategy as in normal conversation, using Grice’s maxims of communication (Grice 1975):

• The maxim of quantity by giving necessary hints, but not more
• The maxim of quality is enforced by the rules (players may not lie)
• The maxim of relation by not giving hints that are not relevant to the current state of the game
• The maxim of manner by trying to avoid hints that could be misinterpreted

However, when games are closely observed, players also often provide clues about their behavior in ways that are not strictly part of game play, such as hesitation, visibly deciding between two players to give hints to, etc. While there has been significant research into Hanabi game play, including how to build agents that play the game well with human players, the interpretation of non-verbal communication during game play has been understudied.

In this paper we present preliminary work that aims to integrate eye-tracking into agents that play Hanabi with human players. We have implemented a 2-player version of Hanabi in Unity that integrates with a Tobii eye tracker. We will present our hypotheses of how eye tracking information could be utilized by AI agents, the eye tracking information
we have available, and some initial observations about players’ gaze behavior.

Related work

Hanabi has been of interest for several AI researchers because of its cooperative nature, the hidden information and limited communication channels. One approach to the game is to purely optimize the score the agents obtain. Cox et al. (2015) have devised a logical/mathematical protocol to convey a large amount of information using the limited communication Hanabi allows, scoring close to a perfect score in most games. While this approach only works in games with 5 players, Bouzy (2017) presents an improved version that also works with fewer players. Walton-Rivers et al. (2017) present a comparison of several different approaches, including several based on Monte Carlo Tree Search (Browne et al. 2012), focusing on how they perform in simulated games. However, while agents using the techniques discussed by these authors obtain very high scores when playing with each other, the protocols they use are very hard to follow for humans, and certainly not what a human player would intuitively expect.

Another approach for building Hanabi playing agents is more in line with how human players approach the game. Van den Bergh et al. (2015) present several agents using simple if-then rules defined by experts. Osawa (2015) describes agents that follow an expert-informed protocol, while also deducing how information obtained from the other players should be interpreted by having a model of possible interpretations. Note that Walton-Rivers et al. also included several rule-based approaches, including Osawa’s and van den Bergh’s, in their comparison, some of which did not perform much worse than their Monte Carlo Tree Search variants. Eger et al. (2017) specifically investigated how AI agents interact with human players, noting that agents that exhibit intentional behavior score higher when playing with a human player than those simply following their own protocol.

It has been noted that humans use the gaze of other people to determine their intentions, feelings, etc. starting from as young as 3-4 years (Baron-Cohen 1997). In order to make the interaction with computers more natural, it is therefore of great interest to research the integration of gaze into human-computer interaction (Poole and Ball 2006). Bader and Beyerer (2011) report how users’ mental models change their gaze behavior to be more forward-looking to indicate their intentions as they become more familiar with a task. Hristova and Grinberg (2005) showed that players that are more likely to cooperate in an Iterated Prisoners’ Dilemma scenario are also more likely to look at the payouts, while players less likely to cooperate looked more at the computer’s moves. While this indicates that player behavior can be predicted from their gaze, the game under consideration was very simple. In the next section we will explore how eye-tracking could be used in a more complex domain.

Eye Tracking for Hanabi

In Hanabi, when receiving a hint from another player, it is essential to determine the intention behind that hint, in order to interpret it correctly. The work cited above does that by either assuming that the player follows a fixed protocol, or by explicitly or implicitly enumerating all possible current game states given what is known about the hidden information, and determining in which game state a player would give the hint they gave. However, as mentioned, humans often indicate their intentions with their gaze. We therefore postulate that an AI agent with access to eye tracking information can perform better than one without.

Consider, for example, the case where a hint can be interpreted to indicate either a playable card, or a card that should be discarded. By using the player’s gaze, the AI agent might be able to disambiguate between the two options. If the card should be played, it is more likely that the player looked at the board where that card should go, whereas a card that should be discarded might prompt the player to look at the other discarded cards more.

What we are interested in is going beyond this very basic example, and looking at more complex cases. Hypothetically interesting scenarios include:

• The player’s gaze going back and forth between two cards in the AI agent’s hand before giving a hint including one of them. Depending on what the AI agent already knows about the two cards, it may be able to infer additional information. For example, since there is only one copy of each 5, players often give hints to prevent them from being discarded, especially when they think that the person holding the 5 is likely to discard it. However, this is in tension with giving hints that have a more immediate effect on game play. An AI agent could deduce this tension by observing which options a player is considering.
• Because players know how many copies of each card are in the deck, counting cards that were discarded, played or are in the other players’ hands can be used to narrow down which cards are in a player’s own hand. By tracking which cards a player looks at before performing a play or discard action, it is possible to determine which possibilities they are considering. For example, consider that a player knows that they have a 4, but not which color it is. When they look at all discarded 4s and a particular card in the AI agent’s hand before playing their own 4, it is possible to deduce that the card in the AI agent’s hand might also be a 4. In particular, if the color of the player’s 4 was ambiguous, the AI agent might infer that their card is a 4 of a color that would help disambiguate the color of the player’s 4.
• When the AI agent draws a new card, the duration of the gaze of the human player can be used to determine how immediately useful a card is likely to be. This is particularly true if the players are waiting to draw a specific card, such as a missing 1, or if a card that can only be played later in the game, such as a 4, is drawn early. We believe that players’ gaze will linger for less time on cards that are not immediately useful. However, if a card is useful, the player has to scan the other cards in the AI agent’s hand to determine which hint to give to unambiguously indicate the usefulness of the new card.

Figure 1: Unity implementation of Hanabi with eye tracking. (a) A screenshot of our Hanabi implementation during game play. (b) A heatmap of player gaze behavior overlaid on the game screen.

To be able to integrate these scenarios into an agent that
plays Hanabi, we need to be able to track the player’s gaze (to determine what they are focusing on), including which of multiple options it changes between (to determine decision making), and how long it rests in a particular spot (to determine interest/disinterest in a particular option). Additionally, because of the inherent uncertainty of the information obtained, the agent must not take this information as a fact, but rather only use it as guidance to help determine player intentions.

Existing Hanabi agents interpret hints that they are given by determining in which situation, or in service of which goal, the other player would give such a hint. If the agent determines that there are multiple applicable situations, it needs to break this tie in some way. The conservative approach would be to refrain from choosing any particular situation and continue game play with the information obtained, as is done by Osawa’s (2015) Outer State Player. Alternatively, in the approach used by Eger et al. (2017), ambiguities are resolved by assuming that players prefer actionable hints that advance the game. Eye tracking data can be used in addition to these options to provide additional weight to each possible situation, without being the sole deciding factor. This is particularly appealing because it would allow our approach to be integrated in multiple existing and new agent designs.

Implementation

In order to test our hypotheses about how to integrate gaze into Hanabi agents, we implemented the 2-player version of Hanabi in Unity with support for a Tobii eye tracker (https://developer.tobii.com/tobii-unity-sdk/). Using the eye tracker, we are able to determine where a player’s gaze lingers with reasonable precision to determine which card they are focusing on. Figure 1 shows the user interface of our implementation, as well as an example of where a player’s gaze lands on the screen. Note that this data comes from a rough development version which does not currently filter out any noise. We are still in the process of tweaking gaze duration thresholds, and considering additional techniques to reduce the noise, starting with a simple low-pass filter. However, even with this noisy data, one can already see that players focus on particular cards more than others. Another, not entirely unexpected, observation we have made is that a player’s gaze is drawn towards UI elements that move or pop up, such as when they are given a hint, or when they click on a pop-up menu.

In our current version of the game, the AI agent performs its moves randomly, with our main focus being obtaining and interpreting eye tracking data. In the next section we will discuss how we plan on incorporating this information into the agent’s decisions.

Future Work

So far, our efforts have been focused on creating an implementation of Hanabi that incorporates eye tracking. For future work, we want to explore how players’ gaze behavior lines up with the situations outlined above, and test the hypothesis that players’ gaze can be used by an agent to improve its behavior. While we have already identified that players tend to look at certain UI elements when they become interactable, we have yet to determine how closely a player’s gaze correlates with their intentions, and in what way.

However, the main advantage of having eye tracking information available is not that it is necessarily an accurate indicator of a player’s intention, but rather that it can be used in addition to other techniques. We therefore plan to reimplement Osawa’s (2015) and Eger et al.’s (2017) agents, and use eye-tracking for cases in which their approaches have to decide between two or more possibilities. To validate this approach, we plan on performing a user study to compare the different approaches. Each participant will play several games with the same agent type, where the agents will ignore the eye tracking information in some games, while for others it is taken into account. We believe that taking gaze into account will allow the AI agents to perform better when playing with human players, but the games could also provide insights into how gaze differs between different players, if at all. Additionally, the score in the game is not the only relevant variable. We will therefore also perform a survey to ask participants if they perceived the AI to understand them better, or play more rationally.

One limitation of this approach, and of any ethical experiment setup in general, is that the participants are necessarily aware of the eye tracker, and may adapt their behavior. Our experimental design therefore only compares games in which eye tracking information is present, but ignored by the AI agents, with games in which it is utilized. By comparing the scores from games in which gaze information is ignored with prior work, we can nevertheless determine whether players also changed their in-game behavior.

Finally, we are also considering applications beyond games, such as assisting users of software tools. By determining users’ intentions, help can be given in a more contextual way.

References

Bader, T., and Beyerer, J. 2011. Influence of users mental model on natural gaze behavior during human-computer interaction. In 2nd Workshop on Eye Gaze in Intelligent Human Machine Interaction, 25–32.
Baron-Cohen, S. 1997. Mindblindness: An essay on autism and theory of mind. MIT press.
Bauza, A. 2010. Hanabi.
Bouzy, B. 2017. Playing hanabi near-optimally. In Advances in Computer Games, 51–62. Springer.
Browne, C. B.; Powley, E.; Whitehouse, D.; Lucas, S. M.; Cowling, P. I.; Rohlfshagen, P.; Tavener, S.; Perez, D.; Samothrakis, S.; and Colton, S. 2012. A survey of monte carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in games 4(1):1–43.
Cox, C.; De Silva, J.; Deorsey, P.; Kenter, F. H.; Retter, T.; and Tobin, J. 2015. How to make the perfect fireworks display: Two strategies for hanabi. Mathematics Magazine 88(5):323–336.
Eger, M.; Martens, C.; and Alfaro Córdoba, M. 2017. An intentional ai for hanabi. In Computational Intelligence and Games (CIG), 2017 IEEE Conference on, 68–75. IEEE.
Grice, H. P. 1975. Logic and conversation. Syntax and semantics 41–58.
Hristova, E., and Grinberg, M. 2005. Information acquisition in the iterated prisoners dilemma game: An eye-tracking study. In Proceedings of the 27th annual conference of the cognitive science society, 983–988. Lawrence Erlbaum Hillsdale, NJ.
Land, M. F., and Hayhoe, M. 2001. In what ways do eye movements contribute to everyday activities? Vision research 41(25-26):3559–3565.
Osawa, H. 2015. Solving Hanabi: Estimating hands by opponent’s actions in cooperative game with incomplete information. In Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence.
Poole, A., and Ball, L. J. 2006. Eye tracking in hci and usability research. In Encyclopedia of human computer interaction. IGI Global. 211–219.
van den Bergh, M.; Spieksma, F.; and Kosters, W. 2015. Hanabi, a co-operative game of fireworks. Bachelor’s thesis, Universiteit Leiden.
Walton-Rivers, J.; Williams, P. R.; Bartle, R.; Perez-Liebana, D.; and Lucas, S. M. 2017. Evaluating and modelling hanabi-playing agents. In Evolutionary Computation (CEC), 2017 IEEE Congress on, 1382–1389. IEEE.
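
The tie-breaking idea from the Eye Tracking for Hanabi section, in which gaze adds weight to each candidate interpretation of a hint without being the sole deciding factor, could be sketched roughly as follows. This is a minimal illustration and not the method implemented in the paper: the region names ("board", "discard_pile", "own_hand"), the linear blending rule, and the `trust` parameter are all assumptions introduced for this example, and the gaze samples are assumed to have already been mapped to named screen regions.

```python
from collections import Counter


def dwell_share(gaze_regions):
    """Fraction of gaze samples that fell on each named screen region.

    gaze_regions: one label per gaze sample; None marks samples that hit no
    region of interest (hypothetical preprocessing, not part of the paper).
    """
    counts = Counter(region for region in gaze_regions if region is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {region: n / total for region, n in counts.items()}


def reweight(prior, supporting_region, dwell, trust=0.3):
    """Blend the agent's own interpretation weights with gaze evidence.

    prior: {interpretation: weight} produced by the agent's existing logic.
    supporting_region: {interpretation: region whose dwell supports it}.
    trust: assumed knob in [0, 1]; 0 ignores gaze, 1 relies on gaze alone.
    """
    prior_total = sum(prior.values())
    blended = {}
    for interpretation, weight in prior.items():
        p = weight / prior_total
        g = dwell.get(supporting_region.get(interpretation), 0.0)
        blended[interpretation] = (1.0 - trust) * p + trust * g
    total = sum(blended.values())
    if total == 0:
        # No usable evidence at all: fall back to the normalized prior.
        return {k: w / prior_total for k, w in prior.items()}
    return {k: v / total for k, v in blended.items()}


# A hint that could mean "play this card" or "discard this card".
samples = ["board"] * 6 + ["discard_pile"] * 2 + ["own_hand"] * 2 + [None]
dwell = dwell_share(samples)
prior = {"play": 0.5, "discard": 0.5}
supports = {"play": "board", "discard": "discard_pile"}
weights = reweight(prior, supports, dwell)
```

With these made-up samples the player looked mostly at the board, so the "play" interpretation ends up with the larger weight, while the prior still contributes most of the total. Setting `trust` to 0 reproduces the agent's original weights, which corresponds to the study condition in which gaze information is recorded but ignored.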