<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Combinatorial Multi-Armed Bandits to Dynamically Update Player Models in an Experience Managed Environment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anton Vinogradov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Brent Harrison</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Kentucky</institution>
          ,
          <addr-line>Lexington, KY 40506</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Designers often treat players as having static play styles, but this has been shown to not always be the case. This is not an issue for games that create a relatively static experience for the player, but it can cause problems for games that attempt to model the player and adapt themselves to better suit the player, such as those with Experience Managers (ExpMs). When an ExpM makes changes to the world it necessarily biases the game environment to better match what it believes the player wants. This process limits what sorts of observations the ExpM can make and leads to problems if and when a player suddenly shifts their preferences, leaving an outdated player model that can be slow to recover. Previously it has been shown that detecting a preference shift is possible and that the Multi-Armed Bandit (MAB) framework can be used to recover the player model, but this method had limits in how much information it could gather about the player. In this paper, we offer an improvement on recovering a player model once a preference shift has been detected by using Combinatorial MABs (CMABs). To evaluate these claims we test our method in a text-based game environment on artificial agents and find that CMABs show a significant gain in how well they can recover a model. We also validate that our artificial agents perform similarly to how humans would by testing the task on human subjects.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Experience management is the study of automated systems
that guide players through a more interesting and tailored
experience than that which could normally be achieved.
A game that features an experience manager (ExpM) can
automatically adapt the player’s experience to better serve
their specific goals and play style while also balancing the
wants of the author, thereby guiding the player towards
an optimal gameplay path [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. An ExpM does this by
observing details about the player, such as the actions that
they take within the game, and extrapolating from them to
make decisions about which content to serve in the future
and by what means.
      </p>
      <p>
        ExpMs often make use of player models, which are
persistent models used to represent the player’s internal state that
the experience manager can update. These models are built
over time and can be used to make more complex and
long-term decisions by the ExpM, allowing for a better balance
between personalizing the game to the player and ensuring
the author’s intent is carried out [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Since the player model is only an approximation of the
player’s internal thoughts and preferences it is not always
completely accurate, as it is difficult to predict how humans
will behave. This can be problematic for ExpMs. When the
ExpM takes actions it necessarily biases the world towards
being better suited to its model of the player. At the same
time, this biasing of the environment can influence the sorts
of observations that the ExpM is likely to see. Take, for
example, a player who has previously been shown to take combat
options in a game: the ExpM observes this and changes the
environment to better suit this sort of play style, removing
other possible types of actions and adding in more
combat-focused ones. If the player suddenly shifts their preferences,
which players have been shown to do [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], to prefer a more
diplomatic approach, then the environment may not offer
suitable affordances for the player’s current preferences.
While there may be some diplomacy-oriented actions
available, they may be difficult to find and the majority will
be focused on combat. With their choices largely limited,
the player may even continue to take combat actions, leading
the ExpM to observe the player engaging with combat and
incorrectly strengthening its now outdated player model,
continuing to show that the player prefers combat. For the
player model to be properly updated, the player themselves
would need to seek out and find appropriate content, which
may be difficult and cause the player to disengage from the
experience.
      </p>
      <p>
        Because of this, many experience managers assume that
player preferences remain static during gameplay. Our
previous work shows that a player preference shift can be
detected [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and that by framing the problem as a Multi-Armed
Bandit (MAB) it is possible to find the player’s preference
and quickly recover the player model, though this has only
been shown to work with artificial agents [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. These works
argue that since the game environment is biased by the
removal of possible methods of interaction, one can
attempt to learn a player’s updated preferences by adding
actions back into the environment. This is done by
introducing a new form of game object called a distraction, an
object that is used by the ExpM to gain information about
the player while also minimally disrupting the game.
      </p>
      <p>These distractions need to be deliberately designed to
entice players that are not engaged while also being ignored
by players that are engaged. This had been accomplished
naturally by the limitations of adapting the problem as an
MAB as MABs play rounds sequentially but only allow for
a single arm pull in each round. This limits the feedback to
only the single distraction added in that round but is only a
limitation of the adaption, not of distractions themselves.</p>
      <p>
        In this paper, we improve on this method by extending the
MAB framework to a Combinatorial MAB (CMAB), which
allows us to gather more information from the player and
recover the player model more quickly. We make use of
the CUCB algorithm [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which allows us to use more than
one distraction at a time, though this can potentially be more
disruptive to the player because more distractions are shown. We
additionally create an improvement that lessens the number
of distractions needed by reusing part of the environment
when it is available. Both of these methods are shown to
outperform previous methods in automated tests with artificial
agents. We also conduct a human study to validate that
humans perform similarly to these artificial agents.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        Experience management is the generalization of drama
management [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to not only encompass entertainment and
managing narrative drama, but also more serious contexts such
as education and training, and managing a player’s overall
experience in the game. It does this by observing the player
and manipulating the virtual world with a set of experience
manager actions that allow it to modify the game and its
environment to coerce the player’s experience according
to some goal. Early work on drama management focused
on balancing the intent of the author with allowing for a
breadth of actions to the player, without necessarily forming
an explicit persistent player model and instead modeling it
as an optimization problem [
        <xref ref-type="bibr" rid="ref10 ref11 ref8 ref9">9, 10, 11, 8</xref>
        ]. These are limited
in how they can represent and understand the player but do
serve to allow the player a wider breadth of actions, even
repairing the narrative after the player takes an unexpected
action such as killing an important NPC.
      </p>
      <p>
        Our focus is on ExpMs that make use of a player model,
as having a persistent model of the player can allow the
ExpM to make longer-term decisions and allow for more
intelligent actions in single [
        <xref ref-type="bibr" rid="ref1 ref12 ref3">1, 12, 3</xref>
        ] and multiplayer games
[
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ]. These works assume that the player is static in
their preferences and that further observation will lead to
a more accurate model of the player. This assumption has
been challenged more recently [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and the need for dynamic
updates to ExpM player models has been acknowledged [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
These approaches focus on categorizing players according
to some pre-existing set of play styles.
      </p>
      <p>
        In our previous work we have shown that player
preference shifts can be detected [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and recovered [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. These
have been shown to work in automated agents that mimic
human-like behavior and utilize MAB algorithms to
accomplish the player model recovery process. MABs have been
used in player modeling [
        <xref ref-type="bibr" rid="ref13 ref16">16, 13</xref>
        ] and CMABs in experience
management [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] before but these approaches still assume
that the player is static in their preferences. We extend
this work to show that humans instructed to emulate a
preference shift do act similarly to these automated agents, and we
additionally offer an improvement on player model
recovery by expanding the ExpM’s actions and allowing it to pull
combinatorial super arms.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>In this section, we will start with a brief overview of CMABs
and the CUCB algorithm before detailing how we adapt our
environment to make use of this framework.</p>
      <sec id="sec-3-1">
        <title>3.1. Combinatorial Multi-Armed Bandits</title>
        <p>
          Multi-armed bandits (MABs) are single state Markov
Decision Processes where agents choose to take one of several
actions (called arms) [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. Each action has a reward
distribution and the task of solving comes in the form of finding
an optimal policy that maximizes the total reward while
minimizing losses. This policy needs to be able to balance
between exploring each of the arms while gaining
information on their underlying reward distribution and exploiting
this information to maximize the reward from all the arms.
        </p>
        <p>
          We make use of Combinatorial Multi-Armed Bandits
(CMABs), an extension of MABs [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Similarly to MABs, a
CMAB contains a set of $m$ base arms that are played over
some number of rounds. Each arm $i$ is associated with a random
variable $X_{i,t}$, for $1 \le i \le m$ and $t \ge 1$, which denotes
the outcome for arm $i$ at round $t$. The set of random variables
$\{X_{i,t} \mid t \ge 1\}$ associated with base arm $i$ is i.i.d. with some
unknown expected mean $\mu_i$, and $\mu = \{\mu_1, \mu_2, \ldots, \mu_m\}$ is
the vector of expected means of all arms. Instead of playing a single
arm, as one would do with a MAB, a super arm $S$ is played
from the set of all super arms $\mathcal{S}$. We consider $\mathcal{S}$ to be the
subsets of the set of all arms of size $k$, for $k \in \{2, 3\}$.
At the end of the round, the reward $R_t(S)$ is revealed for
the super arm and is given to the contributing arm $i \in S$.
We observe the reward per arm played and only reward a
single arm, a type of feedback belonging to semi-bandits
[
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. $T_i(t)$ is the number of times arm $i$ has been played up
to round $t$.
        </p>
        <p>
          To learn an optimal policy, we make use of the CUCB
algorithm [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. This algorithm first takes an initial $m$ rounds of
playing super arms that together contain each of the
arms. For our implementation of the algorithm we choose
to always play the super arm that contains the least played
arms thus far, ensuring that each arm is played $k$ times
during this initialization. Afterwards we calculate the adjusted means
with $\bar{\mu}_i = \hat{\mu}_i + \sqrt{3\ln t \, / \, (2 T_i(t))}$, and select the super arm made up of the
arms with the highest adjusted means.
        </p>
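        <p>
          For illustration, the selection rule described above can be written as a short sketch. The following Python snippet is a minimal, simplified version of CUCB with $k$-subset super arms and semi-bandit feedback; it is not the implementation used in our experiments, and the class and method names are introduced here purely for illustration.
        </p>
        <preformat>
import math

class CUCB:
    """Minimal sketch of CUCB with k-subset super arms (illustrative only)."""

    def __init__(self, n_arms, k):
        self.n_arms = n_arms          # number of base arms
        self.k = k                    # super arm size, e.g. 2 or 3
        self.counts = [0] * n_arms    # T_i(t): times each arm has been played
        self.means = [0.0] * n_arms   # empirical mean reward of each arm
        self.t = 0                    # round counter

    def select_super_arm(self):
        self.t += 1
        # Initialization: keep playing the least-played arms until every arm
        # has been observed at least once.
        if min(self.counts) == 0:
            by_plays = sorted(range(self.n_arms), key=lambda i: self.counts[i])
            return by_plays[:self.k]
        # Adjusted means: mu_bar_i = mu_hat_i + sqrt(3 ln t / (2 T_i(t)))
        adjusted = [self.means[i] + math.sqrt(3.0 * math.log(self.t) / (2.0 * self.counts[i]))
                    for i in range(self.n_arms)]
        # With k-subset super arms, the best super arm is simply the top-k arms.
        return sorted(range(self.n_arms), key=lambda i: adjusted[i], reverse=True)[:self.k]

    def update(self, super_arm, rewards):
        # Semi-bandit feedback: one observed reward per arm that was played.
        for arm in super_arm:
            self.counts[arm] += 1
            self.means[arm] += (rewards[arm] - self.means[arm]) / self.counts[arm]
        </preformat>
        <p>
          Each round, the manager would call select_super_arm, realize the chosen arms in the game world, and pass the observed per-arm rewards back to update.
        </p>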
      </sec>
      <sec id="sec-3-2">
        <title>3.2. CMAB Adaptation</title>
        <p>
          We extend the work we introduced in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and thus use
a similar formulation to its MAB adaptation. One of the
core innovations in adapting the player model recovery
system to a MAB is the distraction. Distractions are a type of
game object that is used to gain more information from
the player. They are intended to entice players that are
currently not engaged with the game away from their current
task, but ideally would not be noticed or interacted with by
players that are engaged. This way they can be used as a
safe means to test whether a player is currently engaged,
and more importantly whether they are taking actions that
align well with the ExpM’s player model. Since distractions
are meant to walk this thin line of engagement they have
a few extra requirements. Distractions should:
        </p>
        <list list-type="order">
          <list-item>
            <p>be largely irrelevant to important parts of the game, like quests, so as not to tread on any authorial goals;</p>
          </list-item>
          <list-item>
            <p>represent a type of action or style of play in the game (the distraction’s action type);</p>
          </list-item>
          <list-item>
            <p>be easily recognizable by the player as belonging to that action type.</p>
          </list-item>
        </list>
        <p>
          This way distractions can serve as a means to measure the
player’s preference by being clear and understandable while
also not being too intrusive to normal gameplay.
        </p>
        <p>These distractions are utilized through a set of ExpM actions,
which we call distraction actions, each of which puts a new
distraction into play for the player to interact with. In terms of MABs,
these distraction actions form the arm pulls, where adding a
distraction with a specific type of action to the environment
is a pull of the arm for that type of action. For example, if a
valid action type is crafting, a distraction action for crafting
would be spawning a low-level crafting ingredient, like a
torn sack for cloth, as a distraction. A player that is well
engaged with the game would ignore this and continue on
with their current task, but a player that is not engaged may
investigate further.</p>
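        <p>
          As a concrete, simplified illustration of this mapping, an arm pull could look like the sketch below. It assumes a hypothetical catalog of authored distractions keyed by action type; the names Distraction, pull_arm, and spawn are introduced here for illustration and are not part of any engine API.
        </p>
        <preformat>
from dataclasses import dataclass

@dataclass
class Distraction:
    name: str           # e.g. "torn sack"
    action_type: str    # the base arm this distraction represents, e.g. "crafting"

def pull_arm(action_type, catalog, spawn):
    """One arm pull: place a distraction of the given action type in the world.

    catalog maps action types to lists of authored distractions, and spawn is
    the engine-specific callback that adds the object to the player's area.
    """
    distraction = catalog[action_type].pop()
    spawn(distraction)
    return distraction
        </preformat>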
        <p>
          In adapting the player model recovery system to a MAB
framework, we were previously restricted to pulling a
single arm at a time [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], which aligned well with the goal of
reducing the number of distractions used. Since the amount
of distractions introduced per round is flexible, we found
that further adapting it to the CMAB framework is natural and
allows us to gain more information per round since super
arms can be played. Pulling a super arm is a distraction
action that adds more than one distraction per turn and each
individual arm is an individual distraction. The super arm
formed from the touch and read arms would add 2
distractions, one of touch and the other of read action types. We
also follow our previous method of giving a reward of 1 to
the arm for the distraction that the agent has interacted with
and a reward of 0 for all other arms. This means that super
arms are not rewarded together, only a single arm within
the super arm gets a reward. If the agent has not interacted
with any distractions in that turn then no reward is given to
any arms that round. This allows for more specific rewards
to the action types that are interacted with compared to
rewarding the super arm as a whole.
        </p>
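        <p>
          The per-round reward assignment described above can be summarized as follows; this is an illustrative sketch, and the function name is introduced here rather than taken from our implementation.
        </p>
        <preformat>
def rewards_for_round(super_arm, interacted_arm):
    """Reward 1 for the arm whose distraction was interacted with, 0 for the
    other played arms, and no reward at all if nothing was interacted with."""
    if interacted_arm is None:
        return None   # no distraction touched: no arm receives a reward this round
    return {arm: (1.0 if arm == interacted_arm else 0.0) for arm in super_arm}
        </preformat>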
        <p>We have also previously found mixed success in purely
reducing the bias of the environment by only adding a single
distraction per round, selecting the distraction from those
that have action types that are not in the current area.
Taking that as inspiration we have developed an additional
improvement: replace-with-environment-action. With this
improvement active, a super arm that contains a distraction
with an action type that is already present within the current
area will have the distraction replaced with the matching
environment action. Instead of adding a distraction of that
overlapping type, the algorithm considers any action taken
of that overlapping type in that round to count towards
that distraction. This has the effect of adding some of the
player’s interaction with the environment to the CMAB
reward model. It also reduces the number of
distractions added early on, to around 1.7 per round in the
first 10 turns; as the algorithm gains more
information it rarely needs to fall back on an environment action,
so the average rises to 1.95 after the first
10 turns.</p>
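        <p>
          A sketch of the replace-with-environment-action improvement is given below, under the assumption that the manager knows which action types are already present in the player’s current area; the function and variable names are illustrative.
        </p>
        <preformat>
def realize_super_arm(super_arm, area_action_types):
    """Split a super arm into distractions to spawn and arms tracked through
    the environment itself (illustrative sketch)."""
    spawned, tracked_in_environment = [], []
    for arm in super_arm:
        if arm in area_action_types:
            # The action type already exists here: do not add a distraction;
            # any action of this type taken this round counts for this arm.
            tracked_in_environment.append(arm)
        else:
            spawned.append(arm)   # add a distraction of this action type
    return spawned, tracked_in_environment
        </preformat>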
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Results</title>
      <p>In this section, we will go over both our tests with automated
agents and human subjects as well as their results. While
these two experiments share many features, humans require
much more accommodation than artificial agents do, and as such we will note
where there are differences.</p>
      <sec id="sec-4-1">
        <title>4.1. Automated Experiments</title>
        <p>The goal of these experiments is to evaluate our CUCB
method in a controlled situation. Thus, we follow our
previous experimental outline in which we use autonomous
agents rather than humans. In these experiments, we
compare our CUCB methods against the best performing
method from that work, ε-greedy.</p>
        <sec id="sec-4-1-1">
          <title>4.1.1. Environment</title>
          <p>Our automated agents use a smaller environment with
simpler action types as the artificial agents are free to perform
the same actions and explore the same rooms without
getting bored or disengaging with the experience, as a human
in a similar situation would. This environment is written in
Inform7, a language and integrated engine used to create
interactive fiction that is played with natural language
syntax. Inform7 is well suited to be used with artificial agents
and is easily understandable to humans.</p>
          <p>In this environment, we use seven areas, called rooms,
which are all traversable using cardinal directions. Within
these rooms are several objects that the player agents can
interact with, with each object having a corresponding
action type. We make use of five different action types: look,
talk, touch, read, and eat. There is a sixth type of action,
move, but since it is so prevalent and necessary for
gameplay we do not consider this something that a player can
prefer. Of these, we consider talk to be the Primary action
type since it is the one most prevalent in the environment
and is the focus of the quest that the player agent is tasked
with. Other types of actions are also present within the
environment but are not the focus, namely look and touch.
It is often not possible to create an environment or quest in
a game that only uses a single type of action so we include
these to mimic that. These, along with the Primary action
type, are considered Environmental action types since they
are present in the environment. The last two, read and eat,
are missing from the environment and thus are considered
Missing action types. Our inclusion of these two types of
actions that are completely missing from the environment
is to simulate the effects of the ExpM having previously biased
the environment because the player did not prefer read or eat
actions.</p>
          <p>
            These five action types are also the basis for our player
model, with the player agents preferring to primarily
interact with one of these action types. We expect that players
will have a strong preference towards one of these, but
also that it is often not possible to complete quests without
engaging with others. Because of this, our player agent
preferences are set to primarily interact with one type of action
(11/15ths of the time), but have a low chance of interacting
with the other 4 (1/15th of the time each).
          </p>
        </sec>
        <sec id="sec-4-1-agents">
          <title>4.1.2. Agents</title>
          <p>
To test our method in our environment we make use of the
same 3 automated agents used in our previous work. These
agents are known as the Exploration Focused Agent, Goal
Focused Agent, and Novelty Focused Agent. These agents work
in different ways and are intended to resemble
different aspects of how humans would play the game. Two of
these, the Exploration and Goal focused agents, are inspired
by the Bartle taxonomy of player types [
            <xref ref-type="bibr" rid="ref21">21</xref>
            ], representing
the explorers and achievers quadrants respectively. Since
there is no social aspect to our game we do not consider
the other two quadrants; instead, for our last agent we take
inspiration from literature on user engagement that states
that more novel objects are likely to increase engagement
[
            <xref ref-type="bibr" rid="ref22">22</xref>
            ].
          </p>
          <p>Each of these three player agents have an internal
preference distribution that they use to decide which game objects
to interact with, but each agent uses it slightly differently.
The Exploration Focused Agent has a 90% chance to interact
with an object that it sees in the room, with a 10% chance to
wander to a different room. It randomly chooses among the
available objects, drawing the probability of
interaction from its preference distribution,
and if there are no objects it moves to a different room.
The Goal Focused Agent on the other hand first chooses a
type of action to interact with according to its preference
distribution. If it fails to find a suitable object to interact
with that is compatible with that type of action it will instead
take a step towards completing the quest goal. Most actions
needed to complete the quest goal are either moving to a
different room or talking, the primary action type. Lastly,
the Novelty Focused Agent puts equal importance on both
the novelty of an object and its own preference distribution,
and likewise first chooses a type of action to interact with
before finding the object in the environment matching that
action type. If there are multiple objects of that type it will
prefer the one it has interacted with the least, and if there
are no objects of that type it falls back on taking a step
towards completing the quest goal.</p>
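        <p>
          To make the agents’ use of the preference distribution concrete, the sketch below shows a simplified version of the Goal Focused Agent’s turn; the helper names and object attributes are ours, and the other two agents differ in how they weight exploration and novelty.
        </p>
        <preformat>
import random

ACTION_TYPES = ["look", "talk", "touch", "read", "eat"]

def preference_distribution(preferred):
    # 11/15 for the preferred action type, 1/15 for each of the other four.
    return [11/15 if a == preferred else 1/15 for a in ACTION_TYPES]

def goal_focused_turn(room_objects, preferred, take_quest_step):
    """room_objects: objects in the current room, each with an action_type
    attribute; take_quest_step: callback that moves or talks toward the quest."""
    weights = preference_distribution(preferred)
    chosen_type = random.choices(ACTION_TYPES, weights=weights)[0]
    candidates = [obj for obj in room_objects if obj.action_type == chosen_type]
    if candidates:
        return ("interact", random.choice(candidates))
    return take_quest_step()
        </preformat>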
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.3. Managers</title>
          <p>Alongside our three agents, we tested each against five
different managers. As these do not implement full experience
managers we use the term manager to distinguish them. We
compare against a lower baseline which takes advantage of
how our player agents work by always providing 5 different
types of distractions, which we call the One-of-each
manager. We also compare against our previous best, ε-greedy.
The last three of these are dedicated to testing variants of
CUCB: one where a super arm consists of 2 distractions
(k = 2), another with a super arm of 3 distractions (k = 3),
and the last where a super arm only has 2 or fewer
distractions using our replace-with-environment-action strategy
(k = 2, rwea). All of these managers calculate the player
model the same way, by measuring the frequency of the types
of actions that the player takes, starting from when the manager
thinks that the player has shifted their preference.</p>
        </sec>
        <sec id="sec-4-1-3">
          <title>4.1.4. Preference Scenarios</title>
          <p>We also tested a number of preference switch scenarios
for a full understanding of how our methods perform. We
include 20 different scenarios, switching from primarily
preferring one action type to a different action type. These
are grouped together into 4 groups that represent whether
the preferred action type is considered to be Environmental
or Missing: Environment to Environment, Missing to
Environment, Environment to Missing, and Missing to Missing.
We previously identified that the most relevant scenario
group for analysis is Environment to Missing so we will be
focusing on it, but for completeness we include the results
for all scenario groups in the appendix.
          </p>
        </sec>
        <sec id="sec-4-1-results">
          <title>4.1.5. Results</title>
          <p>
For each combination of the 5 managers, 3 agents, and the
preference switch scenarios we run 100 trials. Each trial
consists of 100 turns of history that is shared between all
the agents but differs for each preference switch scenario.
Since this history is shared between the agents it is run with
the Goal focused agent. This history consists of 90 turns in
which the agent follows its initial preference, followed by
10 turns of its switched preference in which the manager
is recording but not actively giving distractions. These 10
turns are used to simulate the time it would take to detect
that a player preference shift has happened. At turn 100 the
state of the quest is reset to allow agents to continue
exhibiting their quest completing behavior, the manager starts
taking distraction actions, and it and the agent continue
until turn 199. We compare the agent’s internal preference
to the manager’s calculated player model (which it starts to
measure from turn 90 onwards) with the Jensen-Shannon (JS)
distance. A lower value corresponds to a closer match between
the two models and indicates a better result.</p>
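          <p>
            The evaluation step can be sketched as follows, using SciPy’s Jensen-Shannon distance; the measured player model here is simply the frequency of observed action types since the assumed shift, and the function names are introduced only for illustration.
          </p>
          <preformat>
import numpy as np
from scipy.spatial.distance import jensenshannon

ACTION_TYPES = ["look", "talk", "touch", "read", "eat"]

def measured_player_model(observed_actions):
    """Frequency of each action type among the actions observed since the shift."""
    counts = np.array([observed_actions.count(a) for a in ACTION_TYPES], dtype=float)
    if counts.sum() == 0:
        return np.full(len(ACTION_TYPES), 1.0 / len(ACTION_TYPES))  # uniform fallback
    return counts / counts.sum()

def model_distance(observed_actions, internal_preference):
    """JS distance between the measured model and the agent's internal preference."""
    return jensenshannon(measured_player_model(observed_actions), internal_preference, base=2)
          </preformat>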
          <p>The results of our tests can be seen in Figure 2, focusing on
what has been identified as the most realistic scenario, when
the player agent switches from preferring Environmental
action types to Missing ones.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Human Study</title>
        <p>We found that our method performs better than previous
methods with automated agents, but it had not been tested
on human subjects. In this section, we will go over
the modifications that we made to the environment and the
distractions to make this compatible with human players. We
will also detail the specifics of the human task. This study
was reviewed by our University’s Institutional Review Board
(IRB).</p>
        <sec id="sec-4-2-1">
          <title>4.2.1. Environment</title>
          <p>To better accommodate the complexities of human behavior
we have heavily modified the Inform7 environment for
human use. We expect that, unlike agents, humans will grow
bored and disengaged when attempting to traverse the same
7 rooms and the single quest for 75 turns. Thus, we have
expanded the number of rooms available to the player from 7
to 36 rooms in a 6 by 6 grid. These rooms are all traversable
with only cardinal directions for ease of navigation, but
form a more complex map which can be seen in Figure 1.
Along with the expanded number of rooms, we have added
2 tasks for players to engage in, each more complex than
the task in our automated tests, to better simulate a normal
interactive fiction game.</p>
          <p>We have also used different action types to better
represent the types of actions that humans would normally
engage with in games: Diplomacy, Crafting, Combat, Stealth,
and Magic. Since we have far fewer human participants than
we can run automated trials, we have also modified the
distribution of the action types in the environment. For
these tests we consider Diplomacy the Primary action type,
Combat to be an additional Environmental action type, and
the rest to be Missing since they are not normally available
in the environment.</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. Distractions</title>
          <p>This environment also includes distractions, which are
normally not present in the world but can be added to the room
the player is currently in with a manager action. Since we
suspected that players would either need to be alerted to
objects added to a room or otherwise might not notice
that they were added, we have opted to only add distractions
when the player moves to a different room, as this aligns
with when the room description is printed out, listing the
items. These distractions are removed from the room when
the player leaves it.</p>
          <p>This change does affect how our manager handles
observations of player actions. Now the observations happen on
room change, and all distractions that the player has
interacted with before moving on to a new room are counted as
being interacted with. Distractions resolve themselves in a
single turn and are subsequently removed so as to only allow
the player to interact with each once. A sample of the
distractions for each type can be seen in Table 1.</p>
          <p>Distractions were also originally designed to end with
a failure outcome: a monster turns out to be a bunch of
branches vaguely shaped like one, or a deer runs away before
it is hit. This is due to our focus on the distraction being
irrelevant to the larger narrative. We found that
preliminary participants found these discouraging and recognized
that such objects would end in failure, which led many
to simply not interact with any distractions. This led us
to change most distractions to resolve themselves with a
small success instead, giving the player a small reward such
as XP, rations, or crafting materials, though these values
were never actually tracked. These changes contributed to
participants continuing to interact with distractions though
may have made them too enticing. For future work, we
consider that these sorts of rewards need to be finely tuned
and contextualized so as to not make the act of interacting
with distractions too enticing to the player.
          </p>
        </sec>
        <sec id="sec-4-2-task">
          <title>4.2.3. Task</title>
          <p>
Our goal in this task is to show that the same trends that are
visible in the automated agents reflect the sorts of behaviors
found in human players. To do this we set up the task to
mimic the Environment to Missing scenario group, focusing
only on a single type of preference shift for simplicity. To
this end we recruited 30 anonymous human participants
on Prolific to take part in this study, restricting
participation to English speakers located in the United States. Of these,
we had to remove a single user. This user was able to finish
the task but the majority of actions they tried to take caused
errors as they did not use the syntax necessary for Inform7
games, leaving only 22 valid actions, all but one of which
were just moving around the map.</p>
          <p>
            We task the human player to start with a preference for
Diplomacy type actions and after playing for 20 turns to
switch their preference to Crafting. We simulate that it
will take some time for the manager to detect that a player
is distracted, so we have added a 5 turn offset from when the
participant is asked to switch their preference to when the
manager can start taking actions. This 5 turn offset is
optimistic and previous results have indicated that it should
take longer [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]. Nonetheless, this is left at 5 turns so as to
not frustrate the participant and to minimize the amount of
time they need to spend on the task.
          </p>
          <p>Afterward, the manager continues to observe the
participant’s actions and add new distractions for 50 more turns.
Due to the way that the CUCB algorithm works, the first 5
of these turns are initialization rounds, in which
we always play the super arm of the two distractions
that have been played the least so far; this results in each type
of distraction appearing twice. After those 50 turns are over,
on turn 75, we automatically end the play session.</p>
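          <p>
            For reference, the turn schedule of the study can be written out as constants, with values taken directly from the description above:
          </p>
          <preformat>
PREF_SWITCH_TURN   = 20   # participant switches from Diplomacy to Crafting
DETECTION_OFFSET   = 5    # simulated delay before the manager may act
MANAGER_START_TURN = PREF_SWITCH_TURN + DETECTION_OFFSET   # turn 25
MANAGER_TURNS      = 50   # manager observes and adds distractions
SESSION_END_TURN   = MANAGER_START_TURN + MANAGER_TURNS    # turn 75
          </preformat>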
        </sec>
        <sec id="sec-4-2-3">
          <title>4.2.4. Processing</title>
          <p>In serving the Inform7 environment to humans we were
presented with a number of limitations, both from the game
due to using a web interpreter (Quixe, bundled with
Inform7) and due to the changes that were needed to make
this playable by humans. Unlike the automated tests, we
no longer had access to the internal state information that
reports what kind of action was made and had to classify
it manually. This was done based on keyword matching,
generally based on the verb used, and iteratively continuing
to add keywords until all user commands were classified.
We classified these into several categories, one for each of
the five action types and additionally move for movement
actions, and none for all the rest. The five action type
categories mostly used the verbs related to the proper way
of interacting with the distraction, but we also added
extra keywords when the intent was clear (e.g. the invalid
verbs "repair" and "craft" which are clearly an attempt to
craft). The largest of these categories is none and accounts
for all invalid verbs due to spelling mistakes, commands
without verbs, and valid verbs like "look" and "examine"
which simply do not correspond to an action type. For our
player model calculation, we only used the 5 action type
categories.
          </p>
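          <p>
            A simplified sketch of this keyword-based classification is shown below; the keyword lists here are illustrative examples rather than the full lists we built iteratively.
          </p>
          <preformat>
KEYWORDS = {
    "diplomacy": ["talk", "help", "give"],
    "crafting":  ["take", "craft", "repair"],
    "combat":    ["attack", "hit", "fight"],
    "stealth":   ["steal", "pickpocket"],
    "magic":     ["pray", "touch"],
    "move":      ["go", "north", "south", "east", "west"],
}

def classify(command):
    """Map a raw player command to an action type category (illustrative)."""
    words = command.lower().split()
    for category, keywords in KEYWORDS.items():
        if any(word in keywords for word in words):
            return category
    return "none"   # misspellings, missing verbs, and verbs with no action type
          </preformat>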
        </sec>
        <sec id="sec-4-2-results">
          <title>4.2.5. Results</title>
          <p>The results of our human experiments can be seen in Figure
3. Since we do not expect to know what the final
preference is in normal gameplay, we have opted to show the JS
distance of the measured player model compared to five
different preference distributions. These five distributions
correspond to preferring one of the five different action
types and are distributed similarly to our agents’ internal
preferences, 11/15 for the preferred action type and 1/15
for the others. The results also depend on what turn we
start measuring the player model. Ideally, detecting a
preference shift would also find the turn on which a player shifted
their preference, but to cover all cases we show what the JS
distance looks like when starting from the beginning (20 turns
before a shift), when the preference shift occurs, and when
the manager starts to give distractions (5 turns after the
shift).</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>We found that CUCB outperforms the previous best, ε-greedy,
and is able to get surprisingly close to
One-Of-Each. Additionally, we found that human results show
similarities to the artificial agents, but that creating
distractions that are well suited for human players requires careful
balancing. We will discuss further implications of these
experiments below.</p>
      <sec id="sec-5-1">
        <title>5.1. Automated Agents</title>
        <p>The best performing manager is the One-Of-Each manager,
which serves as our lower baseline, specifically taking
advantage of our agents’ behavior. This manager provides one
of each of the five distractions on every turn, which means
that all possible action types are always represented. Since
our agents, unlike humans, do not take into account how many
distractions there are, this serves as an
approximation of the lower limit on how quickly a player model can
possibly be recovered. For this reason, we do not consider
this to be a valid strategy to test on humans since it would
quickly overwhelm them with options, and it would require a
large number of varying distractions so as not to be repetitive.</p>
        <p>For our CUCB based managers we find that giving only
two distractions at a time already provides significant
benefits over our previous best, ε-greedy (ε = 0.2). This is
expected, as giving more distractions allows us to gain more
information, taking advantage of not just whether the player agent
interacts with a distraction, but also which of the
distractions it chooses to interact with. Previously an effect was
found where the distance between the agent’s internal
preference distribution and the measured preference
distribution hits a minimum value and then starts to increase. This
is attributed to the MAB gaining enough information on the
preferences that it started to give the highest valued action
type almost exclusively, though this was often only seen in less
realistic scenarios, such as when the agent shifts to an action
type that is already well represented by the environment.</p>
        <p>In our tests we see that this effect is also present even
in the Environment to Missing scenario for the Exploration
focused agent for everything but the ε-greedy manager. This
suggests that for this agent the strategies that exhibit this
effect are capable of identifying the new preference within
20 turns of the manager activating, likely because
one of the distractions given is almost always the preferred
distraction. Other agents do not exhibit this effect, which
we take to mean that it is significantly more difficult
to recover the player’s preferences for them. For both the Goal
and Novelty focused agents this is likely because they will
default to environment actions when they are not given an
action that they wish to interact with, thus skewing the data
in favor of environment action types.</p>
        <p>For CUCB, both k = 3 and k = 2 with replacement of a
distraction by an environment action are capable of getting
surprisingly close to One-Of-Each for all agents. Replacing a
distraction with an environment action was developed to
reduce the number of distractions that were shown each
turn, which reduced the number of distractions from 2 to an
average of 1.93, though this value is around 1.3 for the first
couple turns. Later on the manager has enough information
about actions in the environment that it does not need to
play them as often so replacement only occurs rarely. We
expected that this would result in a hit to this strategy’s
ability to recover the new preferences but found that instead
it increased its ability. Without replacement when the agent
interacts with the environment no reward is given to any
of the actions in the environment. In replacing a distraction
with an environment action we allow any object that the
agent interacts with to count as a reward for the CMAB
algorithm, which allows it to watch for more actions than before
and naturally fill in action types that are underrepresented
due to the biasing of the environment.</p>
        <table-wrap id="tbl1" position="float">
          <label>Table 1</label>
          <caption>
            <p>A sample of the distractions for each action type and their resolutions.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Distraction type</th>
                <th>Distraction</th>
                <th>Resolution</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Combat</td>
                <td>Attack Strange Goblin</td>
                <td>As you approach the figure it turns out to be several branches vaguely in the shape of a goblin.</td>
              </tr>
              <tr>
                <td>Combat</td>
                <td>Attack Deer</td>
                <td>You quickly take down the deer. +10xp, +3 rations.</td>
              </tr>
              <tr>
                <td>Crafting</td>
                <td>Take Meteoric Iron</td>
                <td>You have acquired +1 iron.</td>
              </tr>
              <tr>
                <td>Crafting</td>
                <td>Take cotton cloth</td>
                <td>You have acquired +1 cloth.</td>
              </tr>
              <tr>
                <td>Diplomacy</td>
                <td>Help Hungry Beggar</td>
                <td>You give some food to the beggar, who eats it gratefully and blesses you for your kindness. +1xp.</td>
              </tr>
              <tr>
                <td>Diplomacy</td>
                <td>Help Trapped Frog</td>
                <td>You carefully free the frog from the crevice, and it hops away with a thankful croak. +2xp.</td>
              </tr>
              <tr>
                <td>Magic</td>
                <td>Pray at idol statue</td>
                <td>Your heart is warmed by the god’s blessing and you are filled with peace and courage. +2mp.</td>
              </tr>
              <tr>
                <td>Magic</td>
                <td>Touch mystical vine</td>
                <td>The vine’s light pulses stronger, and you feel a rush of vitality. +1hp.</td>
              </tr>
              <tr>
                <td>Stealth</td>
                <td>Steal coin purse</td>
                <td>You decide to take the whole thing +10gp, though you discard the coin purse itself after you extract the contents.</td>
              </tr>
              <tr>
                <td>Stealth</td>
                <td>Pickpocket snoozing man</td>
                <td>The man reeks of alcohol and is absolutely conked out, but unfortunately you do not find anything of use on him.</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>In our results, we find a pattern that is similar
to our agents. Our agents (Figure 2) measure the JS distance
against their own internal preference distribution, and the
analogous curve in our human data (Figure 3) would
correspond to Vs. Crafting, as that is what we instructed the
participants to prefer after the switch. We expect that as
soon as distractions are available (turn 25) we should see
a sharp drop in the distance of the measured player
model from a primarily Crafting model, while the distance
to other models steadily rises starting from when the
preference shift occurred (turn 20). This trend exists when
measuring from the beginning, but we additionally see a
small immediate drop. This is due to some players
attempting to take crafting actions immediately even though the
environment does not allow for it. We also expect the trend
of the measured distance to eventually flatten out, which is
observed in all three plots.</p>
        <p>That users can be counted as attempting actions that are
not possible is due to how we have to manually classify
what type of action a human command represents. Instead of directly
looking at the type of action as reported by Inform like
we can for our automated tests, we instead classify actions
based on the keyword that was used in the command. This
allows us to classify actions that are not actually possible
in the game, so it still counts as a crafting action when a
participant attempts to "craft" something. We found in
preliminary tests that many participants struggled to navigate
the interface early in the experiment though we still wanted
to capture the intent of the commands entered.</p>
        <p>This confusion on how to navigate did not affect all users,
but in response, we clarified the instructions and guided
users to use the "help" command if they needed it, but we
also found that the issue may have been more fundamental
to the distractions. An early version of the distractions had
a larger variety of interactions with each type of
distraction, and we had not yet considered the importance of the
distraction’s action type being easily recognizable by the
player. Take the Combat distractions as an example. Early
formulations of these had either equipment you could pick
up or monsters you could attack. These could be
confusing to players, as picking up items to disassemble them is
recognizably a Crafting action; the only distinguishable
difference between the two would be the outcome after the
player has already interacted with it. The current version
of Combat distractions simplifies this, always presenting
animals to be hunted or threatening monsters, and always
interacting by attacking. An ideal implementation of the
system would allow for distractions to be independent of the
action, allowing for a generic distraction with multiple types
of interaction. In the future we plan on investigating how
these sorts of combo-distractions can be used, especially
if they can themselves be formed as a super arm, thereby
allowing for using fewer distractions at a time while still
gaining the same amount of information on the player’s
preferences.</p>
        <p>We find that the distance measurement is sensitive to
when we start measuring. When we start measuring at the
beginning, 20 turns before the preference shift, we find that
the extra Diplomacy moves make it more difficult to find
what the preference is immediately after the shift. Starting at later
points makes the trend easier to see, but may throw
out relevant data. Only starting from when the manager
detects a preference shift may not perform as well as having
a good estimate for when the preference shift occurred.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>Experience managed games help guide players through a
more interesting and tailored experience, but this process
of customizing the experience does not take into account
the unpredictable nature of people. Player experience may
degrade if this unpredictability is not taken into account and
accommodated. Previously it has been shown that
recovering a player model after a preference shift is possible, but
only in automated agents. In this paper we further improve
on this process by modeling it as a combinatorial
multi-armed bandit and making use of the existing environment
to supplement distractions. In addition we demonstrate that
humans behave similarly to how the artificial agents behave.
In the future we plan on expanding this system to combine
detection of preference shifts and player model recovery
after a shift has been detected.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ontanón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Strong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ram</surname>
          </string-name>
          ,
          <article-title>Towards player preference modeling for drama management in interactive stories</article-title>
          .,
          <source>in: FLAIRS</source>
          ,
          <year>2007</year>
          , pp.
          <fpage>571</fpage>
          -
          <lpage>576</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Thue</surname>
          </string-name>
          , Generalized experience management (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Riedl</surname>
          </string-name>
          ,
          <article-title>Data-driven personalized drama management</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          , volume
          <volume>9</volume>
          ,
          <year>2013</year>
          , pp.
          <fpage>191</fpage>
          -
          <lpage>197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Valls-Vargas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ontanón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>Exploring player trace segmentation for dynamic play style prediction</article-title>
          ,
          <source>in: Proceedings of the AAAI conference on artificial intelligence and interactive digital entertainment</source>
          , volume
          <volume>11</volume>
          ,
          <year>2015</year>
          , pp.
          <fpage>93</fpage>
          -
          <lpage>99</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vinogradov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Harrison</surname>
          </string-name>
          ,
          <article-title>Detecting player preference shifts in an experience managed environment</article-title>
          ,
          <source>in: International Conference on Interactive Digital Storytelling</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>517</fpage>
          -
          <lpage>531</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vinogradov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Harrison</surname>
          </string-name>
          ,
          <article-title>Using multi-armed bandits to dynamically update player models in an experience managed environment</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          , volume
          <volume>18</volume>
          ,
          <year>2022</year>
          , pp.
          <fpage>207</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Combinatorial multi-armed bandit with general reward functions</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>29</volume>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Riedl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Alderman</surname>
          </string-name>
          ,
          <article-title>Dynamic experience management in virtual worlds for entertainment, education, and training</article-title>
          ,
          <source>International Transactions on Systems Science and Applications</source>
          ,
          <source>Special Issue on Agent Based Systems for Human Learning</source>
          <volume>4</volume>
          (
          <year>2008</year>
          )
          <fpage>23</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mateas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stern</surname>
          </string-name>
          ,
          <article-title>Integrating plot, character and natural language processing in the interactive drama façade</article-title>
          ,
          <source>in: Proceedings of the 1st International Conference on Technologies for Interactive Digital Storytelling and Entertainment (TIDSE-03)</source>
          , volume
          <volume>2</volume>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lamstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mateas</surname>
          </string-name>
          ,
          <article-title>Search-based drama management</article-title>
          ,
          <source>in: Proceedings of the AAAI-04 Workshop on Challenges in Game AI</source>
          ,
          <year>2004</year>
          , pp.
          <fpage>103</fpage>
          -
          <lpage>107</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Nelson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mateas</surname>
          </string-name>
          ,
          <article-title>Another look at search-based drama management</article-title>
          .,
          <source>in: AAMAS (3)</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>1293</fpage>
          -
          <lpage>1298</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ontañón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ram</surname>
          </string-name>
          ,
          <article-title>Drama management and player modeling for interactive fiction games</article-title>
          ,
          <source>Computational Intelligence</source>
          <volume>26</volume>
          (
          <year>2010</year>
          )
          <fpage>183</fpage>
          -
          <lpage>211</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ontañón</surname>
          </string-name>
          ,
          <article-title>Multiplayer modeling via multi-armed bandits</article-title>
          ,
          <source>in: 2021 IEEE Conference on Games (CoG)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>01</fpage>
          -
          <lpage>08</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ontañón</surname>
          </string-name>
          ,
          <article-title>Experience management in multiplayer games</article-title>
          ,
          <source>in: 2019 IEEE Conference on Games (CoG)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>Khoshkangini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ontanón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marconi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>Dynamically extracting play style in educational games, EUROSIS proceedings</article-title>
          ,
          <source>GameOn</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Arigo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Forman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ontañón</surname>
          </string-name>
          ,
          <article-title>Player modeling via multi-armed bandits</article-title>
          ,
          <source>in: Proceedings of the 15th international conference on the foundations of digital games</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>K. Y.</given-names>
            <surname>Kristen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Guzdial</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. R.</given-names>
            <surname>Sturtevant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cselinacz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Corfe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. H.</given-names>
            <surname>Lyall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>Adventures of ai directors early in the development of nightingale</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          , volume
          <volume>18</volume>
          ,
          <year>2022</year>
          , pp.
          <fpage>70</fpage>
          -
          <lpage>77</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Slivkins</surname>
          </string-name>
          , et al.,
          <article-title>Introduction to multi-armed bandits</article-title>
          ,
          <source>Foundations and Trends® in Machine Learning</source>
          <volume>12</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>286</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <article-title>Combinatorial multi-armed bandit: General framework and applications</article-title>
          , in: International conference on machine learning,
          <source>PMLR</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>151</fpage>
          -
          <lpage>159</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Audibert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bubeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lugosi</surname>
          </string-name>
          ,
          <article-title>Minimax policies for combinatorial prediction games</article-title>
          ,
          <source>in: Proceedings of the 24th Annual Conference on Learning Theory, JMLR Workshop and Conference Proceedings</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>107</fpage>
          -
          <lpage>132</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bartle</surname>
          </string-name>
          ,
          <article-title>Hearts, clubs, diamonds, spades: Players who suit MUDs</article-title>
          ,
          <source>Journal of MUD research 1</source>
          (
          <year>1996</year>
          )
          <fpage>19</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>H. L.</given-names>
            <surname>O'Brien</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. G.</given-names>
            <surname>Toms</surname>
          </string-name>
          ,
          <article-title>The development and evaluation of a survey to measure user engagement</article-title>
          ,
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>61</volume>
          (
          <year>2010</year>
          )
          <fpage>50</fpage>
          -
          <lpage>69</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>