<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Generalizing a Numeric Personality Metric for Narrative Planners</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elinor Rubin-McGregor</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Brent Harrison</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Kentucky</institution>
          ,
          <addr-line>Davis Marksbury Building, 329 Rose Street, Lexington, KY 40506-0633</addr-line>
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In the field of narrative planning, there are many different approaches to personality modeling, so many that overarching study of personality models themselves is beginning to form. But a subject as complex as personality demands complex modeling, which in turn makes it difficult to compare implementations or to test sub-features of personality systems intended to be globalized. By generalizing an existing five-number personality model, we hope to provide an adaptable resource that can be used for enhancement, comparison, or simply as a foundation for other personality models.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The consideration of personality is a major step forward
in the field of narrative generation. Narrative generators
have a multitude of applications, from training models to
video games and even organizational and strategic purposes.
Incorporating personality into narrative models is a subject
that has vexed researchers for years, as personality is
such a complex and varied concept. Yet it is critical, for if
we do not model personality, our narratives cannot consider
the ways in which behavior differs between individuals.
For narrative purposes alone, stories become more engaging
if the audience can identify with the characters and see them
as reflections of real people. Without personality considered,
it is far more difficult to display narrative elements known
to entice audiences, such as character depth. Two people in
the same situation will make different choices depending
on who they are, and attempting to capture that concept of
"who they are" has been the pursuit of many.</p>
      <p>
        Currently, there is a wide variety of
personality models proposed for this purpose, with varying
advantages and disadvantages. Many of these models,
however, require a great deal of effort to implement because they
rely on information that is difficult for narrative planners to
collect. For systems where personality is the central focus,
or where personality is an important element, this may be an
acceptable cost to pay. But what about when the program
is not focused on developing a specific personality system,
but instead on features related to multiple personality
adaptation systems? Or perhaps, when personality is required
or beneficial but not the primary focus [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]? What about
simply having a baseline personality model to compare a
more complex model to [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]? A small-scale, easily
implementable personality model would benefit
other researchers in this field.
      </p>
      <p>There is an existing personality model that does not
require a great deal of effort to implement, as it runs on data that
many narrative planners can already collect easily. This is
the OCEAN-based personality model produced by Shirvani
and Ware in 2019. For ease of understanding and brevity, we
refer to this model as Shirvani19. While this model does
utilize data that is generally available to narrative planners,
it does have limitations. For example, this
model is not entirely open to all domains and has some
features that cannot be calculated by a computer during story
generation.</p>
      <p>Specifically, the Shirvani19 model uses metrics that
demand an understanding of not only the current story plan,
but all or a large number of hypothetical alternative story
plans. One such metric describes "creative thinking." This
metric is calculated by checking how many times the
specific actions of a given character occur in a larger, preferably
all-alternative-plan-encompassing, set of alternative stories.
In addition, the paper uses the concept of "conflict" in its
metrics for determining both agreeableness and intellect
but defines its measurement of conflict as any time a
character can observe any way in which their plans can fail.
This feature also requires knowledge that cannot easily be
generated during story creation, as evaluating it requires
essentially finishing the story in multiple ways before the
story is even concluded. In short, there are features of the
Shirvani19 model that can only be used to evaluate personality
after several stories have already been generated, which
in turn makes the model difficult to apply
during story generation.</p>
      <p>To this end, we are proposing to modify the Shirvani19
model such that it can be applied to a wider variety of
narrative planners. We are also trying to simplify the overhead
required to make the personality model work. Specifically,
we propose to calculate a metric that describes "creative
thinking" by comparing the diversity of actions only along
the specific plan, so that characters who utilize a broader
range of actions are considered to have a higher Openness
score than characters who repeatedly use the same actions.
Likewise, conflict is redefined for both of its uses. Where it
is applied to measuring a character’s affability, we instead
simply count the number of ways a character’s actions could
directly harm other characters. Where conflict is applied to
intellect, we translate the chance of success to the chance
that other characters will oppose the actions of the given
character.</p>
      <p>In order to ensure that our proposed methods are usable,
we performed a user study wherein subjects evaluated the
stories produced by our modified model. In the end we
found that while our Agreeableness work seems to be very
applicable, our re-definition of Openness will need some
refinement in later work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        There is a large body of work on representing personality
in digital narratives, including work that focuses on the Big Five
OCEAN framework, but few of these models are very modular
[
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. Shirvani came out with a follow-up to Shirvani19
that addressed the issues discussed here, but at the cost of
increasing the size of the model [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        The well-known Versu drama manager is very good at
telling complex stories with consistent character personality,
but it requires a great deal of overhead work to run [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The
model needs files representing the world, social practices,
and the characters. Not only that but it needs parser
programs for all of these features, then initialization functions,
then a database to hold it all, and multiple levels of
instantiators before it can make a decision. Likewise, the Comme il
Faut project handles complex emotional environments very
well, but it requires a great deal of information provided to
the data manager for any story domain to work [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
Information on cultural knowledge, social facts, social states, social
exchanges and even more must be documented in order for
them to be applied.
      </p>
      <p>
        There are of course works that focus less on societal
impacts as a whole, and more on the individual characters.
Bahamon and Young introduced other OCEAN-based
systems, but they have not produced a way to directly evaluate
the OCEAN traits during runtime without extensive
preparation. Their earlier work in 2012 provides a way to remove
actions deemed out-of-character during story generation,
but does not provide a mechanism to determine whether
behavior is out-of-character or not. It is a model against which we would
like to test our own work in the future [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Their
later work further discusses evaluating personality
consistency in narrative models, but still does not introduce a
personality model to use [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        The drama manager from Why Are We Like This works
well with the player’s actions and models character
personality from player actions, but because of this it only works
for the specific high degree of player interaction used in
the project [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. It also uses abstract personality modeling,
rather than a personality system that can work as soon as it
is applied. There has also been work by Soares that models
the personality of the player for narrative decisions, but it
does not model the characters of the narrative in the same
way [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Shirvani and Ware developed a very impressive
emotion-based personality model that solves many of the
same problems this paper seeks to correct [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This model
relies upon its emotional system heavily in modeling
personality, which in turn requires a larger amount of overhead
and thus isn’t as modular as this paper seeks to be. Its
reliance on its emotional system also prevents it from being
used with various other emotion-focused models [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. Background on the Shirvani19 Model</title>
      <p>
        Shirvani and Ware proposed a personality model [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] for
characters in a computational narrative that was based on
the OCEAN model of personality, using Sabre as the basis
of its planning model [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The OCEAN personality model,
or "the Big Five model," utilizes five key attributes to
collectively describe personality: Openness, Conscientiousness,
Extraversion, Agreeableness, and Neuroticism [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. These
are commonly accepted attributes of personality, and are
defined as follows:
        <list list-type="bullet">
          <list-item><p>Openness means "openness to experience" and
describes how much a person is willing to explore
outside of their comfort zone. This feature is also
considered an aspect of curiosity, and therefore is
often tied to creativity as well.</p></list-item>
          <list-item><p>Conscientiousness is how organized and effective a
person is. Someone who acts carelessly or struggles
to complete tasks is considered to have low
conscientiousness.</p></list-item>
          <list-item><p>Extraversion is the degree to which a person wants
to engage and interact with other people. Notably,
a highly extraverted person can also be very
malicious, as this category does not differentiate between
positive and negative engagement with others, only
frequency.</p></list-item>
          <list-item><p>Agreeableness is reflective of compassion and
empathy, and is used to measure how much a person
considers other people. As with Extraversion, this trait is
independent of how a person engages with others: someone
can be very shy and still have high agreeableness.</p></list-item>
          <list-item><p>Neuroticism is a more internal emotional feature, as
it describes essentially how nervous and insecure a
person is. Highly neurotic people will often struggle
with self-esteem, and emotional instability is often
linked to high levels of neuroticism.</p></list-item>
        </list>
      </p>
      <p>The Shirvani19 model is primarily focused on scoring
the actions of a character according to how those actions
relate to these attributes. That is to say, it estimates what
personality traits are being displayed in a given character’s
actions, and to what degree each action displays those traits.</p>
      <p>They do this by calculating twelve metrics, each of which
contributes to the score for one OCEAN
attribute. The full list of these metrics and how they relate
to each OCEAN attribute is given in Table 1. Of note, any
metric marked with an (R) is used to reduce
the overall score, as it describes a facet that makes an action
fit less well into the given personality attribute.</p>
      <p>Of these scores, two are not as easy to compute as the others.
Agreeableness and Openness utilize metrics that are difficult
to obtain during story generation. We will discuss these
metrics in greater detail below.</p>
      <sec id="sec-4-1">
        <title>3.1. Agreeableness</title>
        <p>As can be seen in Table 1, the Agreeableness OCEAN quality
has four metrics associated with it. One of these metrics
(11 in Table 1) requires the planner to be able to calculate
the number of conflicts created for other characters. The
Shirvani19 definition of conflict can be problematic for
efficient calculation. Shirvani19 defines character conflict
as occurring when a character can foresee any way their
plan can go wrong and fail to reach their goal. This element
is extremely difficult to evaluate in many systems, as it
requires simulating all possible alternative actions or events
that could happen, not simply the actions the character intends to
have happen. This would take a large amount of
time and resources to run, and would also require limits or ways
to determine when to stop simulating additional possible
future plans.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Openness</title>
        <p>The Openness attribute of OCEAN is defined by Shirvani
with two metrics. We refer to the first metric as “creative
thinking” (referred to as the openness facet in Table 1) and
the second as intellect (1 and 2 in Table 1, respectively).
Creative thinking is a variance value, as it is used to reward
using a diverse set of actions. The equation for creative
thinking is as follows:</p>
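        <p><disp-formula><tex-math>\text{Creative Thinking} = 1 - \frac{\sum_{i=1}^{n} \text{Occurrences}(a, p_i)}{\sum_{i=1}^{n} \text{Length}(p_i)}</tex-math></disp-formula></p>
        <p>In this function, we assume the agent is considering <italic>n</italic>
possible different plans to take. The set of these plans is
<inline-formula><tex-math>[p_1 \ldots p_n]</tex-math></inline-formula>, so <inline-formula><tex-math>p_i</tex-math></inline-formula> is the i-th plan being considered. The value <italic>a</italic>
is a given action in one or more of these plans. Thus we can
think of the plans as sets of actions, <inline-formula><tex-math>p = [a_1 \ldots a_m]</tex-math></inline-formula>. The value
<italic>m</italic> is the total number of actions that are possible for the
character to take. As for the larger values, <inline-formula><tex-math>\text{Occurrences}(a, p_i)</tex-math></inline-formula>
is the number of times action <italic>a</italic> occurs in plan
<inline-formula><tex-math>p_i</tex-math></inline-formula>, while <inline-formula><tex-math>\text{Length}(p_i)</tex-math></inline-formula> is the number of steps in plan <inline-formula><tex-math>p_i</tex-math></inline-formula>.</p>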
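        <p>As a minimal sketch of the calculation described above, the following Python computes the likelihood of one action across a set of alternative plans and subtracts it from 1. The function names and the example plans are illustrative assumptions, not part of the Shirvani19 implementation, and plans are assumed to be plain lists of action names.</p>
        <preformat>
```python
from typing import Sequence


def action_likelihood(action: str, plans: Sequence[Sequence[str]]) -> float:
    """Share of all steps, across every plan, that are `action`."""
    total_steps = sum(len(plan) for plan in plans)
    if total_steps == 0:
        raise ValueError("at least one non-empty plan is required")
    occurrences = sum(list(plan).count(action) for plan in plans)
    return occurrences / total_steps


def creative_thinking(action: str, plans: Sequence[Sequence[str]]) -> float:
    """1 minus the action likelihood, so rarer actions score closer to 1."""
    return 1.0 - action_likelihood(action, plans)


# Hypothetical alternative plans for a single character.
plans = [
    ["walk", "buy", "walk"],
    ["walk", "rob", "walk"],
    ["walk", "buy", "walk"],
]
common = creative_thinking("walk", plans)  # 6 of the 9 steps are "walk"
rare = creative_thinking("rob", plans)     # 1 of the 9 steps is "rob"
```
        </preformat>
        <p>Note that every call needs the full collection of alternative plans as input, which is the cost that makes this metric hard to use during generation.</p>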
        <p>The second metric that contributes to Openness is
Intellect. The Shirvani19 model defines this metric as the
probability that a plan succeeds. The probability of success of a
plan is defined as the likelihood of a plan succeeding based
on the number of conflicts created with other characters.
This was defined as such:</p>
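        <p><disp-formula><tex-math>\text{Probability of Success} = 1 - \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} \text{Conflict}_{c_i}(a_j)}{n \cdot m}</tex-math></disp-formula></p>
        <p>In this, the values of <italic>n</italic> and <italic>m</italic> represent the total number
of characters and the total number of possible actions
respectively. The value <inline-formula><tex-math>c_i</tex-math></inline-formula> represents character <italic>i</italic> out of the set
of all characters in the domain, and <inline-formula><tex-math>a_j</tex-math></inline-formula> represents a given
action <italic>j</italic> in the set of all <italic>m</italic> potential actions. <inline-formula><tex-math>\text{Conflict}_{c_i}(a_j)</tex-math></inline-formula>
is therefore a value that is 1 if action <inline-formula><tex-math>a_j</tex-math></inline-formula> causes a conflict
for character <inline-formula><tex-math>c_i</tex-math></inline-formula>, and 0 if it does not. In short, the probability
of success is defined by how many other characters would
agree with the given character’s action plan.</p>
        <table-wrap>
          <label>Table 1</label>
          <table>
            <thead>
              <tr><th>#</th><th>Description</th></tr>
            </thead>
            <tbody>
              <tr><td>1</td><td>The minimum action likelihood in a plan (R)</td></tr>
              <tr><td>2</td><td>Probability of success of a plan</td></tr>
              <tr><td>3</td><td># of actions in a plan (R)</td></tr>
              <tr><td>4</td><td># of times the agent changes their mind (R)</td></tr>
              <tr><td>5</td><td># of actions with self as the consenting character</td></tr>
              <tr><td>6</td><td># of actions including others with their consent</td></tr>
              <tr><td>7</td><td># of actions including others without their consent</td></tr>
              <tr><td>8</td><td># of actions including others with their consent</td></tr>
              <tr><td>9</td><td># of goals achieved for other characters</td></tr>
              <tr><td>10</td><td># of actions including others without their consent (R)</td></tr>
              <tr><td>11</td><td># of conflicts created for other characters (R)</td></tr>
              <tr><td>12</td><td># of times the agent changes their mind</td></tr>
            </tbody>
          </table>
        </table-wrap>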
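        <p>The probability-of-success calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the conflict predicate is passed in as a function because Shirvani19 does not give a cheap way to compute it, and the character names, action names, and conflict table below are hypothetical.</p>
        <preformat>
```python
from typing import Callable, Sequence


def probability_of_success(
    characters: Sequence[str],
    actions: Sequence[str],
    causes_conflict: Callable[[str, str], bool],
) -> float:
    """1 minus the fraction of (character, action) pairs that conflict."""
    pairs = len(characters) * len(actions)
    if pairs == 0:
        raise ValueError("need at least one character and one action")
    conflicts = sum(
        1
        for character in characters
        for action in actions
        if causes_conflict(character, action)
    )
    return 1.0 - conflicts / pairs


# Hypothetical conflict table: robbing conflicts with Tom and the Merchant.
CONFLICTS = {("Tom", "rob"), ("Merchant", "rob")}

score = probability_of_success(
    ["Tom", "Merchant", "Guard"],
    ["walk", "rob"],
    lambda character, action: (character, action) in CONFLICTS,
)
```
        </preformat>
        <p>Here two of the six character and action pairs conflict, so the score is two thirds. The expensive part is hidden inside the predicate, which is the issue taken up in the discussion of conflict.</p>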
        <p>Both the Creative Thinking metric and the Intellect
metric share some issues in how they are calculated. In both
cases, the set of <italic>n</italic> plans <inline-formula><tex-math>[p_1 \ldots p_n]</tex-math></inline-formula> demands that the program
collect a large number of potential actions for every single
character’s potential plans. This works on the assumption
that personality is applied after the
planner has generated multiple plans, and that
personality is simply used to select the best possible plan. Such
a system is not feasible if the planner is intended to be used
for real-time story generation, or if the planner is working
with a human agent. Not only does it demand that a large portion of
work be repeated for every character on
every step, it also needs all, or at least a large number, of the
solutions to be generated before the metric can be computed.</p>
        <p>The Intellect metric is also problematic in that the
probability of success calculation relies on being able to calculate
whether an action would generate a conflict with another
character. We have already discussed the potential issues
with calculating conflict information in the previous section.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Methods</title>
      <p>To make the personality model more flexible, we replaced
the problematic aspects of the Openness and Agreeableness
OCEAN metrics with values that could be collected more
easily. For Agreeableness we only needed to re-evaluate the
concept of conflict, but for Openness we propose alternative
calculations for both Creative Thinking and Intellect. We
will discuss each of these in greater detail below.</p>
      <sec id="sec-5-1">
        <title>4.1. Conflict</title>
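        <p>In the Shirvani19 model, conflicts are calculated by
determining any point at which a character’s plan could fail. While
this is a rigorous way to determine conflict, we propose
a metric that relaxes the idea of conflict in the interest of
making it easier to calculate. Instead of calculating conflict
so directly, we propose defining character conflict by the
character’s goals or other motivating factors. Rather than
simulating an entire world change for potential issues, we
argue that simply comparing two states is
enough. Specifically, our metric compares one existing state
and one hypothetical state. The "true" state, <inline-formula><tex-math>s_0</tex-math></inline-formula>, is the state at
the moment when the character is considering a plan, before
taking or deciding on an action, and is thus "true" because
it has come to pass outside of the character’s plans. The
hypothetical state, <inline-formula><tex-math>s_n</tex-math></inline-formula>, is the predicted end state that will come
to pass if the character’s entire plan is executed without
fail. Here we consider <inline-formula><tex-math>a_1</tex-math></inline-formula> to be the first action in the
plan the character is considering, with the considered plan
having a total of <italic>n</italic> steps in it.</p>
        <p>Thus, our changed definition of <inline-formula><tex-math>\text{Conflict}_{c_i}(a_j)</tex-math></inline-formula> will need
to specify that <inline-formula><tex-math>a_j</tex-math></inline-formula> would result in the world state <inline-formula><tex-math>s_n</tex-math></inline-formula> if
executed. With this, <inline-formula><tex-math>\text{Conflict}_{c_i}(a_j)</tex-math></inline-formula> is 1 if character <inline-formula><tex-math>c_i</tex-math></inline-formula> has a
higher goal metric at <inline-formula><tex-math>s_0</tex-math></inline-formula> than at <inline-formula><tex-math>s_n</tex-math></inline-formula>, and is 0 otherwise. In
other words, as long as action <inline-formula><tex-math>a_j</tex-math></inline-formula> moves the character <inline-formula><tex-math>c_i</tex-math></inline-formula>
further from its goal, <inline-formula><tex-math>\text{Conflict}_{c_i}(a_j)</tex-math></inline-formula> will evaluate to
1.</p>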
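        <p>The two-state comparison can be illustrated with a short Python sketch. The state representation (a dictionary of counters), the goal metric, and all names below are hypothetical choices made for the example; the only part taken from the text is the rule that a conflict is recorded when the character’s goal metric is higher in the true state than in the hypothetical state.</p>
        <preformat>
```python
from typing import Callable, Dict

State = Dict[str, int]  # assumed, simplified world state


def conflict(
    goal_metric: Callable[[State], float],
    true_state: State,
    hypothetical_state: State,
) -> int:
    """Return 1 if the character is further from its goal in the hypothetical state."""
    return 1 if goal_metric(true_state) > goal_metric(hypothetical_state) else 0


def merchant_goal_metric(state: State) -> float:
    # Hypothetical goal metric: the Merchant wants to hold coins.
    return float(state["merchant_coins"])


true_state = {"merchant_coins": 1}
after_robbery = {"merchant_coins": 0}  # predicted end state of a plan that robs the Merchant
after_sale = {"merchant_coins": 2}     # predicted end state of a plan that buys from the Merchant

robbed = conflict(merchant_goal_metric, true_state, after_robbery)
sold = conflict(merchant_goal_metric, true_state, after_sale)
```
        </preformat>
        <p>Only two evaluations of the goal metric are needed per character, rather than a search over every way a plan might fail.</p>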
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Creative Thinking</title>
        <p>Recall that to calculate Creative Thinking, the Shirvani19
model needs to calculate the variance associated with a
plan by calculating the minimum action likelihood in a plan
across many different plans being considered. To make
creative thinking easier to calculate, we propose to simply
examine the diversity of the actions considered in the plan.
We define the plan history as the combined sequence of all actions
that have occurred up to the point at which an action is
being considered, followed by the actions in the
most likely future plan. We then define the action counts as
a set of size <italic>m</italic>, with <italic>m</italic> being the total number of possible
actions in the domain. The action counts hold the
number of times each action in the domain is
executed throughout the entirety of the plan history, and thus
can be calculated by going over the plan history just once.
In other words, the action counts are the eventual count of
how many times every potential action would occur if the
given plan occurs without any interference or changes. From these
counts we calculate a variance-based metric that scales from 0 to 1,
using an existing, commonly used variance algorithm. In other words, we
measure openness to new experiences as the variance between
the different kinds of action the character shows.</p>
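        <p>The text does not specify which variance algorithm or normalization is used, so the following Python is only one possible reading. It counts each domain action in a single pass over the combined past and planned actions (called <monospace>plan_history</monospace> here), computes the population variance of those counts, and divides by the largest variance possible for that many steps. The assumption that an even spread of actions should score 1 and a single repeated action should score 0, like all names in the sketch, is ours.</p>
        <preformat>
```python
from collections import Counter
from typing import List, Sequence


def action_counts(plan_history: Sequence[str], domain_actions: Sequence[str]) -> List[int]:
    """Count, in one pass, how often each domain action appears in the history."""
    seen = Counter(plan_history)
    return [seen[action] for action in domain_actions]


def creative_thinking(plan_history: Sequence[str], domain_actions: Sequence[str]) -> float:
    """Assumed normalization: 1 for an even spread of actions, 0 for one repeated action."""
    counts = action_counts(plan_history, domain_actions)
    m = len(counts)
    total = sum(counts)
    if total == 0 or m == 1:
        return 0.0  # nothing to compare; treated as no diversity in this sketch
    mean = total / m
    variance = sum((count - mean) ** 2 for count in counts) / m
    # Largest possible variance: every step uses the same single action.
    max_variance = total ** 2 * (m - 1) / m ** 2
    return 1.0 - variance / max_variance


DOMAIN = ["walk", "buy", "rob", "loot"]  # hypothetical action set
repetitive = creative_thinking(["walk", "walk", "walk", "walk"], DOMAIN)
varied = creative_thinking(["walk", "buy", "rob", "loot"], DOMAIN)
mixed = creative_thinking(["walk", "walk", "buy", "rob"], DOMAIN)
```
        </preformat>
        <p>Under this reading the score depends only on the single plan being considered, so no alternative plans have to be generated.</p>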
      </sec>
      <sec id="sec-5-3">
        <title>4.3. Intellect</title>
        <p>Originally, Intellect was calculated as the probability of
success in a plan in terms of the number of conflicts that
it could create. We decided the easiest way to simplify the
problem is to cut out a unique metric entirely, and instead
use the “politeness” metric (10 in Table 1) for two values
and two purposes.</p>
        <p>Politeness is a metric that is calculated by determining
the number of actions that include other characters without
them consenting to the action. The first way we utilize the
Politeness metric is in its originally intended way. That is
to say, it is used to help calculate the Agreeableness of an
agent where the smaller the number of actions taken that
include nonconsenting characters, the larger the Politeness
metric.</p>
        <p>The new way we propose to use the Politeness metric is to
apply it to the concept of "Opposing Forces," in the sense that
it estimates how much opposition the character would need
to overcome to ensure the plan operates smoothly. In other
words, the fewer agents that would oppose the plan by not
consenting to take part in certain actions, the more likely
we consider the plan to succeed, and therefore the more
intelligent we consider it to be.</p>
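        <p>A small Python sketch of this double use of Politeness follows. The <monospace>Action</monospace> record, its fields, and the normalization of the opposing-forces score by plan length are assumptions made for illustration; the text only states that the count of actions involving nonconsenting characters drives both values.</p>
        <preformat>
```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class Action:
    """Hypothetical plan step: who else is involved and who agreed to it."""
    name: str
    others: FrozenSet[str] = frozenset()
    consenting: FrozenSet[str] = frozenset()


def nonconsenting_actions(plan: Sequence[Action]) -> int:
    """Count steps that involve at least one character who did not consent."""
    return sum(1 for action in plan if action.others - action.consenting)


def opposing_forces_score(plan: Sequence[Action]) -> float:
    """Assumed normalization: 1.0 means no step is opposed by anyone."""
    if not plan:
        return 1.0
    return 1.0 - nonconsenting_actions(plan) / len(plan)


plan = [
    Action("walk to Town"),
    Action("buy herbs", others=frozenset({"Merchant"}), consenting=frozenset({"Merchant"})),
    Action("rob coin", others=frozenset({"Bandit"})),
]
opposition = nonconsenting_actions(plan)
score = opposing_forces_score(plan)
```
        </preformat>
        <p>The same count can feed the Agreeableness score and the Intellect score, which is what removes the need for a separate conflict-based probability.</p>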
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Experiments</title>
      <p>
        To evaluate the quality of our proposed metrics, we ran a
human subjects experiment. We attempted to run our
experiments as closely as possible to the experiments performed
in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Our experiments therefore focused on whether or not
an audience reading our generated stories observed the
intended personality traits assigned to a given character. Since
our metrics only affect two of the OCEAN traits, we limit our
experiments to the following four basic treatments
involving the OCEAN traits Openness and Agreeableness: High
Openness (HO), Low Openness (LO), High Agreeableness
(HA), and Low Agreeableness (LA). The experiment was
set up as a between-subjects experiment where participants
were randomly sorted into one of these four groups.
      </p>
      <sec id="sec-6-1">
        <title>5.1. Story Domain and Story Generation</title>
        <p>Unfortunately, Shirvani and Ware did not keep track of their
original program, so we were unable to use the exact same
domain as they did. For our story experiments, we emulated
the domain used in the 2019 work as closely as possible.
Thus, we generated stories about a boy named Tom, whose
goal is to gain some herbs for his sick grandmother. The
herbs are in the possession of a Merchant, whose goal is to
gain a coin, which Tom happens to have. The Merchant is
in the Town, while Tom is in the Forest. But there is also a
Bandit in the Forest, who also wants the coin. Any character
can walk from one location to another, any character can buy
an item from another by spending a coin, and any character
that holds a weapon can rob another character of any item
in their inventory. A character with a weapon can also kill
another character, and any living character can loot the
corpse of a dead character for any items they hold. There is
also a Bandit’s Camp, where there is a chest with a second
coin. Finally, there is a Guard in the Town, whose goal is to
arrest the Bandit and who alone has the action to do so.
The three characters that hold weapons at the start
of the story are the Bandit, the Merchant, and the Guard.</p>
        <p>For each of the treatments mentioned above we needed to
generate a total of four stories. One of these stories was the
“true” story. In these stories, Tom would take actions that
either ranked very highly in the Agreeableness or Openness
metrics as described above, or very lowly in those metrics,
depending on the category.</p>
        <p>Another story involved Tom displaying the opposite
personality to the one being tested. Thus, if the treatment
group was associated with Low Agreeableness, then this
story would involve Tom performing High Agreeableness
actions. This story is meant to be a bad fit for the character’s
"true" personality.</p>
        <p>The final two stories consisted of a story that had a
medium score in the given attribute being tested
according to our metrics (and, thus, did not display Tom as strongly
exhibiting or not exhibiting the attribute in question) and a
story chosen at random.</p>
      </sec>
      <sec id="sec-6-2">
        <title>5.2. Experimental Methodology</title>
        <p>We had two hypotheses that we wanted to evaluate for our
experiments. The first was that when shown a story that
our model generated and claimed displays one of our target
personality traits in the main character, the audience will
identify the main character as someone who holds these
traits. The second hypothesis is that when shown a set of
stories that includes one tale that our model claimed also
displays the same target trait in a similar quantity in the
main character, the audience will identify that particular
story as the one they consider most realistic.</p>
        <p>For these experiments, subjects were first given a brief
description of the domain, introducing the people, the places,
and the goals of the characters. They were then shown the
four generated stories described above and told that all were
possible ways that the story could proceed. One of these
stories would be the "true" story wherein the target
character’s behavior closely matched the personality the subject’s
category was testing for. The domains for these stories were
identical. In total, there were eight stories shown to
subjects testing Openness and eight different stories shown to
subjects testing Agreeableness.</p>
        <p>After reading these stories, subjects were told that one of the
stories was the "true" story and were then asked to rate
statements about the target character using a 5-point Likert
scale. We tried to use the same statements as Shirvani and
Ware; however, preliminary testing showed that the framing
device for the story was interfering with the results. To be
specific, the domain in which the story takes place features
the target character of Tom, trying to get herbs for his sick
grandmother. We attempted to mention in the explanation
that Tom’s grandmother provides for him, thus implying a
potential selfish motive for Tom’s behavior, but the majority
of results in our initial testing showed high Agreeableness
regardless of Tom’s behavior in the story. Thus, we modified
the statements slightly so that the participants would give
answers based only on the parts of the story that our model
had generated rather than the backstory. The statements
we used are shown in Table 2. It should be noted that there
is no "grandmother" character included in any story domain
used, as the character is a plot device and cannot take any
actions during the story.</p>
        <p>While most statements presented to the user were
related to the specific OCEAN category we were testing, we
included a few statements related to different OCEAN
qualities as well. This was done to avoid having the subjects
fixate too heavily on the general theme behind the questions
and to encourage them to think about the entire story in
their responses. These statements were not used for
analyzing the target metric of the category. The statements
tested for both HO and LO were the same, as were the
statements tested for HA and LA. Table 2 contains the statements
presented to the subjects for Agreeableness tests, and
Table 3 contains the statements presented to the subjects for
Openness tests. It should be noted that some statements were
meant to reflect a low score in the given metric, not a high
one. Those marked with an (R) for "Reverse" were expected
to be agreed with if the "true" story ranked the character as
having a low value in the given metric.</p>
        <p>After rating these statements, subjects were then shown
four more stories and asked which one they thought most
likely to occur based on the target character’s personality.
In order to increase variability in stories, the second set
of stories were generated from two domains that were
almost entirely identical to the original domain. These two
domains had exactly one additional feature each; one added
the location of a Bandit’s Camp where a coin could be found,
and the other included an additional Guard character whose
goal was to arrest the Bandit. The subjects were told that
these additional four stories included the bandit camp/the
guard, so they would still understand the limitations of the
world. No other changes were made to the domains. Similar
to the stories shown previously, one of these four stories
shown to the audience was ranked by the model as
portraying a personality close to the "true" personality of the target
character, one was ranked lowly, one was ranked medium,
and the final one was a randomly chosen story. Subjects
were then asked which story they thought most closely fit
the given character’s personality.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6. Results</title>
      <p>In this section we will review the results of our experiments
on identifying the protagonist personality traits and
choosing stories that align with the protagonist’s personality type.
For these experiments, we collected results for 176 subjects
using Prolific, with each subject randomly assigned to one of
the four treatment categories. The category with the
smallest number of subjects was Low Openness, which had 35
subjects. The category with the highest number of subjects
was High Openness, with 48 subjects.</p>
      <sec id="sec-7-2">
        <title>6.1. Identifying Protagonist Personality Traits</title>
        <p>Recall that our first hypothesis was that participants should
be able to identify whether Tom exhibits high or low
Agreeableness, or high or low Openness, depending on the
treatment group. To test this, we evaluated each user’s responses
to the statements related to their treatment group. We labeled each
statement related to the aspect of personality we were
analyzing as having either High-attribute implications or
Low-attribute, i.e. "Reverse," implications.
For subjects in the high categories,
we considered it a success if the subjects rated non-Reverse
statements with "Strongly Agree" or "Agree," and Reverse
statements with "Disagree" or "Strongly Disagree." Likewise,
for the low categories, success was determined if the subjects
rated non-Reverse statements as "Disagree" or "Strongly
Disagree" and rated Reverse statements as "Agree" or
"Strongly Agree."</p>
        <p>To determine whether there was a significant effect, we used
a binomial exact test, comparing the distribution of observed
successes and failures against a null hypothesis of users
providing random responses to each statement. The results
of this analysis are summarized in Tables 4 and 6 under the
heading “5-Pt Likert Scale.” Table 4 contains information on
each individual treatment, and Table 6 contains results when
treatments are aggregated by either Agreeableness
or Openness.</p>
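        <p>The success rule described in this subsection can be written as a small predicate and tally. The following is a minimal sketch in Python, not the code used in the study. The function names, the (label, reverse) pair format, and the "Neutral" label for the scale midpoint are assumptions made for illustration, and the sketch assumes each response is scored individually.</p>
        <preformat>
```python
# Labels on the 5-point Likert scale that count as agreement or disagreement.
AGREE = {"Strongly Agree", "Agree"}
DISAGREE = {"Disagree", "Strongly Disagree"}


def is_success(response, reverse, high_group):
    """Return True if one Likert response counts as a success.

    response   -- the label the subject chose
    reverse    -- True if the statement is a "Reverse" (Low-attribute) item
    high_group -- True for a high-attribute treatment, False for a low one
    """
    # A subject should agree when the direction of the statement matches
    # the direction of the treatment, and disagree otherwise.
    should_agree = (high_group != reverse)
    return response in (AGREE if should_agree else DISAGREE)


def count_successes(responses, high_group):
    """Tally (successes, failures) over a list of (label, reverse) pairs."""
    successes = sum(is_success(label, rev, high_group) for label, rev in responses)
    return successes, len(responses) - successes


# Hypothetical responses from one subject in a high-attribute group
# (illustrative only, not data from the study).
example = [("Agree", False), ("Strongly Disagree", True), ("Neutral", False)]
print(count_successes(example, high_group=True))
```
        </preformat>
        <p>Under this rule a midpoint response never counts as a success, so the resulting success and failure counts are the inputs to the binomial exact test.</p>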
        <p>The binomial tests indicate that people are able to
correctly identify when the protagonist of the story exhibits
high agreeableness (p = 3.941e-13) and low agreeableness
(p = 5.218e-41). We did not observe overwhelming evidence
that participants could identify when Tom exhibited either
high or low openness. When taken in aggregate, however,
we did find significant differences in how users
responded in both the Openness and Agreeableness categories.
These results mostly agree with the results obtained by
Shirvani and Ware.</p>
      </sec>
      <sec id="sec-7-3">
        <title>6.2. Choosing Stories According to Personality Type</title>
        <p>Our second hypothesis is that when
subjects choose the story that they feel best fits the character’s
personality, they will choose the one that our model ranks
closest in personality to the original "true" story.</p>
        <p>As before, we use a binomial exact test to analyze whether
participants select the correct story more frequently than
expected under the null hypothesis of random story selection. Binomial
tests on our story selection experiments indicate that
participants are able to identify stories where Tom exhibits high
openness (p = 4.412e-08), high agreeableness (p = 2.584e-3),
and low agreeableness (p = 1.752e-09).</p>
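        <p>For readers who want to reproduce this kind of analysis, an exact binomial test can be computed directly from the binomial distribution. The sketch below is illustrative and is not the analysis code used in the study. It assumes a one-sided (upper-tail) test and a chance rate of 1/4 for picking the model-preferred story out of four; the function names and the example counts are hypothetical.</p>
        <preformat>
```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_exact_upper(successes, n, p0):
    """One-sided exact p-value: P(X >= successes) for X ~ Binomial(n, p0)."""
    return sum(binom_pmf(k, n, p0) for k in range(successes, n + 1))


# Assumed chance rate: one model-preferred story among four options.
P0_STORY = 1 / 4

# Hypothetical counts for illustration only (not data from the study).
p_value = binom_exact_upper(successes=25, n=45, p0=P0_STORY)
print(f"p = {p_value:.3e}")
```
        </preformat>
        <p>The same function could be applied to the Likert analysis by changing the chance rate; if random responses are assumed to be uniform over the five options, two of which count as a success, that rate would be 2/5.</p>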
      </sec>
    </sec>
    <sec id="sec-8">
      <title>7. Discussion</title>
      <p>By simplifying and reconstructing the metrics used in the
Shirvani19 model, we could provide a personality
implementation framework that is easily applied to a wide variety
of projects. A small-scale framework for
personality in narrative planning could be used to enhance other
projects, or serve as a personality framework with which to test
personality-adjacent features. Checking that a supposedly
multi-personality feature relating to, say, memory or
character beliefs actually works with multiple personality systems
requires having access to other systems of personality to
use.</p>
      <p>While our work has managed to adapt the Agreeableness
metrics to an acceptable degree, we had much less success
with Openness. One possible cause is that calculating
intellect from the opposition to the character’s plans
overlaps too closely with Agreeableness. It is also possible that
our variance metric for openness punishes plans where the
character happens to take the same kind of action,
regardless of whether that action is the smartest thing to do. Another
possibility is that in stories where one character takes few
actions compared to other agents, the variance score for
openness treats this as showing more variety in the
character’s actions simply because the character may not have
repeated the same type of action, even if the story shows
the character as non-proactive. Alternative approaches to
Openness might prove more successful in the future, or
alternative methods of collecting data might enable calculating
Shirvani19’s openness metric without issue.</p>
      <p>It should also be noted that, unlike in the original experiment
set, our story selection for Agreeableness did
much worse than our Likert scale tests, which is the opposite
of what Shirvani and Ware found. One explanation could
be that we did not account for any personality metrics apart
from the target score, and thus there were other factors that
the audience considered more pertinent than we did. Our
attempts to make them consider the story from multiple
perspectives may have increased the effect if that is the
case.</p>
      <p>If we were to test this hypothesis in the future, we would
need to expand our generation of alternative stories to check
all personality values, not just the targeted ones, and select
our "nearest-fit" stories to be ones where the model claims
the target character shows a moderate personality in all
aspects except for the aspect being tested.</p>
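      <p>The selection rule proposed above could be sketched as a simple filter. This is an illustration of the idea only: it assumes each candidate story carries model scores for all five OCEAN traits normalized to the range 0 to 1, and that "moderate" means lying within a fixed band around 0.5. The data layout, the band width, and the function names are all hypothetical.</p>
      <preformat>
```python
TRAITS = ("O", "C", "E", "A", "N")


def is_moderate_elsewhere(scores, target, band=0.2, midpoint=0.5):
    """True if every trait other than `target` lies within `band` of `midpoint`."""
    return all(band >= abs(scores[t] - midpoint) for t in TRAITS if t != target)


def nearest_fit(stories, target, target_value, band=0.2):
    """Among stories that are moderate on all non-target traits, return the one
    whose target-trait score is closest to `target_value` (None if no story
    passes the filter)."""
    candidates = [s for s in stories if is_moderate_elsewhere(s["scores"], target, band)]
    if not candidates:
        return None
    return min(candidates, key=lambda s: abs(s["scores"][target] - target_value))


# Hypothetical candidate stories with assumed scores in [0, 1].
STORIES = [
    {"id": "a", "scores": {"O": 0.5, "C": 0.5, "E": 0.5, "A": 0.90, "N": 0.5}},
    {"id": "b", "scores": {"O": 0.1, "C": 0.5, "E": 0.5, "A": 0.95, "N": 0.5}},
    {"id": "c", "scores": {"O": 0.6, "C": 0.4, "E": 0.5, "A": 0.60, "N": 0.5}},
]

best = nearest_fit(STORIES, target="A", target_value=1.0)
print(best["id"])
```
      </preformat>
      <p>In this toy data, story "b" scores highest on Agreeableness but is filtered out because its Openness score is far from moderate, which is the kind of confound the proposed procedure is meant to remove.</p>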
      <p>Another possible cause is that the subjects might have
selected the most likely stories based on the actions of
characters other than our target character, instead of
focusing on Tom’s behavior alone. Although the original set of four
stories shows the characters other than
Tom acting in a variety of ways, some readers might
still have carried those characters’ personalities over into their selection of
most-likely stories. This could be corrected in future studies
by simply replacing the names of these characters in the
second set of stories, so that the audience views them as
different people. Alternatively, a story domain with a single
character could be used for future testing.</p>
    </sec>
    <sec id="sec-9">
      <title>8. Conclusion</title>
      <p>In our attempts to refine the 2019 OCEAN-based personality
model into a format that can be applied to story generation
tasks as well as story evaluation tasks, while still remaining a
small-scale, easily implemented personality model, we have
had some successes and some failures. Our results indicate
that of the two OCEAN attribute metrics we sought to refine,
only Agreeableness has been properly adjusted into a format
that audiences will recognize. Our work on Openness needs
to be redefined, and one of the testing metrics we used
should likely be refined as well before it is used again.</p>
      <p>One problem is that we focused too intensely on
translating the original metrics into runtime-calculable
forms, and as such did not reevaluate whether alternative solutions
might work better. For a first attempt this is still a crucial
step to reach, but clear problems remain. For example,
take the original concept of evaluating "intelligence" by
treating the likelihood of other characters opposing the
plan as a probability of success. Our adaptation simply
used another measurement of the number of characters
likely to oppose the plan, but this punishes cases
where intelligent characters act coldly or manipulate
others.</p>
      <p>Still, our work has helped progress towards a
personality model that may not be the most refined nor even the
most accurate, but could be applied easily and quickly to
any given narrative planner for character enhancement or
comparative study with other personality models. Testing a
baseline model for comparison is a practice seen in
countless scientific fields, and providing a model that can serve
as one for personality modeling would benefit many future
researchers.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Gervás</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lönneker-Rodman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Meister</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Peinado</surname>
          </string-name>
          , Narrative models:
          <source>Narratology meets artificial intelligence</source>
          ,
          <year>2006</year>
          . URL: https://api.semanticscholar.org/CorpusID:89613631.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Imabuchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ogata</surname>
          </string-name>
          ,
          <article-title>Story generation system based on propp theory as a mechanism in narrative generation system</article-title>
          ,
          <source>in: 2012 IEEE Fourth International Conference On Digital Game And Intelligent Toy Enhanced Learning</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>165</fpage>
          -
          <lpage>167</lpage>
          . doi:10.1109/DIGITEL.2012.47.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Peinado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gervás</surname>
          </string-name>
          ,
          <article-title>Creativity issues in plot generation</article-title>
          (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Shirvani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. G.</given-names>
            <surname>Ware</surname>
          </string-name>
          ,
          <article-title>A plan-based personality model for story characters</article-title>
          ,
          <source>in: AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>188</fpage>
          -
          <lpage>194</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E. S.</given-names>
            <surname>de Lima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Feijó</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Furtado</surname>
          </string-name>
          ,
          <article-title>Adaptive storytelling based on personality and preference modeling</article-title>
          ,
          <source>Entertainment Computing</source>
          <volume>34</volume>
          (
          <year>2020</year>
          )
          100342. URL: https://www.sciencedirect.com/science/article/pii/S187595211930076X. doi:10.1016/j.entcom.2020.100342.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Tambwekar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dhuliawala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. J.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Harrison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Riedl</surname>
          </string-name>
          ,
          <article-title>Controllable neural story plot generation via reward shaping</article-title>
          ,
          <source>in: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-2019, International Joint Conferences on Artificial Intelligence Organization</source>
          ,
          <year>2019</year>
          . URL: http://dx.doi.org/10.24963/ijcai.2019/829. doi:10.24963/ijcai.2019/829.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Shirvani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. G.</given-names>
            <surname>Ware</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. J.</given-names>
            <surname>Baker</surname>
          </string-name>
          ,
          <article-title>Personality and emotion in strong-story narrative planning</article-title>
          ,
          <source>IEEE Transactions on Games</source>
          <volume>15</volume>
          (
          <year>2023</year>
          )
          <fpage>669</fpage>
          -
          <lpage>682</lpage>
          . doi:10.1109/TG.2022.3227220.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Short</surname>
          </string-name>
          ,
          <article-title>Versu-a simulationist storytelling system</article-title>
          ,
          <source>IEEE Transactions on Computational Intelligence and AI in Games</source>
          <volume>6</volume>
          (
          <year>2014</year>
          )
          <fpage>113</fpage>
          -
          <lpage>130</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>McCoy</surname>
          </string-name>
          , et al.,
          <article-title>Social story worlds with Comme il Faut</article-title>
          . (
          <year>2014</year>
          )
          <fpage>97</fpage>
          -
          <lpage>112</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Bahamón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Young</surname>
          </string-name>
          ,
          <article-title>A choice-based model of character personality in narrative</article-title>
          ,
          <source>in: Workshop on Computational Models of Narrative</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>164</fpage>
          -
          <lpage>168</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Bahamón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Young</surname>
          </string-name>
          ,
          <article-title>An empirical evaluation of a generative method for the expression of personality traits through action choice</article-title>
          ,
          <source>in: Proceedings of the Thirteenth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AIIDE'17</source>
          , AAAI Press,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kreminski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dickinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mateas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Wardrip-Fruin</surname>
          </string-name>
          ,
          <article-title>Why are we like this?: Exploring writing mechanics for an AI-augmented storytelling game</article-title>
          ,
          <source>in: International Conference on the Foundations of Digital Games (FDG '20)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>Rashkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bosselut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Knight</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <article-title>Modeling naive psychology of characters in simple commonsense stories</article-title>
          ,
          <year>2018</year>
          . URL: https://arxiv.org/abs/1805.06533. arXiv:1805.06533.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Sander</surname>
          </string-name>
          ,
          <article-title>Models of emotion: the affective neuroscience approach</article-title>
          ,
          <source>The Cambridge Handbook of Human Affective Neuroscience</source>
          (
          <year>2013</year>
          )
          <fpage>5</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S. G.</given-names>
            <surname>Ware</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Siler</surname>
          </string-name>
          ,
          <article-title>Sabre: A narrative planner supporting intention and deep theory of mind</article-title>
          ,
          <source>in: AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          , volume
          <volume>17</volume>
          ,
          <year>2021</year>
          , pp.
          <fpage>99</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>C.</given-names>
            <surname>DeYoung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Quilty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Peterson</surname>
          </string-name>
          ,
          <article-title>Between facets and domains: 10 aspects of the big five</article-title>
          ,
          <source>Journal of Personality and Social Psychology</source>
          <volume>93</volume>
          (
          <year>2007</year>
          )
          <fpage>880</fpage>
          -
          <lpage>96</lpage>
          . doi:10.1037/0022-3514.93.5.880.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>