Generalizing a Numeric Personality Metric for Narrative Planners
Elinor Rubin-McGregor, Brent Harrison
Department of Computer Science, University of Kentucky, Davis Marksbury Building, 329 Rose Street, Lexington, KY 40506-0633, USA


Abstract
In the field of narrative planning, there are many different approaches to personality modeling, so many that overarching study of the personality models themselves is beginning to form. But a subject as complex as personality demands complex modeling, which in turn makes it difficult to compare implementations or to test sub-features of personality systems intended to be globalized. By generalizing an existing five-number personality model, we hope to provide an adaptable resource that can be used for enhancement, for comparison, or simply as a foundation for other personality models.


AIIDE Workshop on Intelligent Narrative Technologies, November 18, 2024, University of Kentucky, Lexington, KY, USA
Email: erru227@uky.edu (E. Rubin-McGregor); bha286@g.uky.edu (B. Harrison)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

The consideration of personality is a major step forward in the field of narrative generation. Narrative generators have a multitude of applications, from training models to video games and even organizational and strategic purposes. Incorporating personality into narrative models is a subject that has vexed many researchers for years, as personality is such a complex and varied concept. Yet it is critical, for if we do not model personality, our narratives cannot consider ways in which behavior differs between individuals. For narrative purposes alone, stories become more engaging if the audience can identify with the characters and see them as reflections of real people. Without personality considered, it is far more difficult to display narrative elements known to entice audiences, such as character depth. Two people in the same situation will make different choices depending on who they are, and attempting to capture that concept of "who they are" has been the pursuit of many.

Currently, there is a wide variety of personality models proposed for this purpose, with varying advantages and disadvantages. Many of these models, however, require a great deal of effort to implement because they rely on information that is difficult for narrative planners to collect. For systems where personality is the central focus, or where personality is an important element, this may be an acceptable cost to pay. But what about when the program is not focused on developing a specific personality system, but instead on features related to multiple personality adaptation systems? Or perhaps, when personality is required or beneficial but not the primary focus [1, 2]? What about simply having a baseline personality model to compare a more complex model to [3]? Having a small-scale, easily implementable personality model would be beneficial for other researchers in this field.

There is an existing personality model whose inputs do not require a great deal of effort to collect, as it runs on data that many narrative planners can easily collect already. This is the OCEAN-based personality model produced by Shirvani and Ware in 2019. For ease of understanding and brevity, we refer to this model as Shirvani19. While this model does utilize data that is generally available to narrative planners, it does have limitations associated with it. For example, this model is not entirely open to all domains and has some features that cannot be calculated by a computer during story generation.

Specifically, the Shirvani19 model uses metrics that demand an understanding of not only the current story plan, but all or a large number of hypothetical alternative story plans. One such metric describes "creative thinking." This metric is calculated by checking how many times the specific actions of a given character occur in a larger, preferably all-alternative-plan-encompassing, set of alternative stories. In addition, the paper uses the concept of "conflict" in its metrics for determining both agreeableness and intellect, but defines its measurement of conflict as any time a character can observe any way in which their plans can fail. This feature also requires knowledge that cannot easily be generated during story creation, as evaluating it requires essentially finishing the story in multiple ways before the story is even concluded. In short, there are features of the Shirvani19 model that can only be used to evaluate personality after several stories have already been generated, which in turn makes the model difficult to use if we want to apply it during story generation.

To this end, we are proposing to modify the Shirvani19 model such that it can be applied to a wider variety of narrative planners. We are also trying to simplify the overhead required to make the personality model work. Specifically, we propose to calculate a metric that describes "creative thinking" by comparing the diversity of actions only along the specific plan, so that characters who utilize a broader range of actions are considered to have a higher Openness score than characters who repeatedly use the same actions. Likewise, conflict is redefined for both of its uses. Where it is applied for measuring a character’s affability, we instead check simply the number of ways a character’s actions could directly harm other characters. Where conflict is applied to intellect, we translate the chance of success to the chance that other characters will oppose the actions of the given character.

In order to ensure that our proposed methods are usable, we performed a user study wherein subjects evaluated the stories produced by our modified model. In the end we found that while our Agreeableness work seems to be very applicable, our re-definition of Openness will need some refinement in later work.

2. Related Work

There is a large amount of work on representing personality in digital narratives, even work that focuses on the Big Five OCEAN framework, but few of these approaches are very modular [5, 6]. Shirvani came out with a follow-up to Shirvani19 that addressed the issues discussed here, but at the cost of increasing the size of the model [7].

The well-known Versu drama manager is very good at
telling complex stories with consistent character personality, but it requires a great deal of overhead work to run [8]. The model needs files representing the world, social practices, and the characters. Not only that, but it needs parser programs for all of these features, then initialization functions, then a database to hold it all, and multiple levels of instantiators before it can make a decision. Likewise, the Comme il Faut project handles complex emotional environments very well, but it requires a great deal of information provided to the data manager for any story domain to work [9]. Information on cultural knowledge, social facts, social states, social exchanges and even more must be documented in order for them to be applied.

There are of course works that focus less on societal impacts as a whole, and more on the individual characters. Bahamon and Young introduced other OCEAN-based systems, but they have not produced a way to directly evaluate the OCEAN traits during runtime without extensive preparation. Their earlier work in 2012 provides a way to remove actions deemed out-of-character during story generation, but does not provide a mechanism to determine whether behavior is out-of-character or not. It is a model we would like to use to test our own work on in the future [10]. Their later work further discusses evaluating personality consistency in narrative models, but still does not introduce a personality model to use [11].

The drama manager from Why Are We Like This works well with the player’s actions and models character personality from player actions, but because of this it only works for the specific high degree of player interaction used in the project [12]. It also uses abstract personality modeling, rather than a personality system that can work as soon as it is applied. There has also been work by Soares that models the personality of the player for narrative decisions, but it does not model the characters of the narrative in the same way [5]. Shirvani and Ware developed a very impressive emotion-based personality model that solves many of the same problems this paper seeks to correct [7]. This model relies upon its emotional system heavily in modeling personality, which in turn requires a larger amount of overhead and thus isn’t as modular as this paper seeks to be. Its reliance on its emotional system also prevents it from being used with various other emotion-focused models [13, 14].

3. Background on the Shirvani19 Model

Shirvani and Ware proposed a personality model [4] for characters in a computational narrative that was based on the OCEAN model of personality, using Sabre as the basis of its planning model [15]. The OCEAN personality model, or "the Big Five model," utilizes five key attributes to collectively describe personality: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism [16]. These are commonly accepted attributes of personality, and are defined as follows:

• Openness means "openness to experience" and describes how much a person is willing to explore outside of their comfort zone. This feature is also considered an aspect of curiosity, and therefore is often tied to creativity as well.
• Conscientiousness is how organized and effective a person is. Someone who acts carelessly or struggles to complete tasks is considered to have low conscientiousness.
• Extraversion is the degree to which a person wants to engage and interact with other people. Notably, a highly extraverted person can also be very malicious, as this category does not differentiate between positive or negative engagement with others, only frequency.
• Agreeableness is reflective of compassion and empathy, and is used to measure how much a person considers other people. As with Extraversion, the trait stands on its own: someone can be very shy and still have high agreeableness.
• Neuroticism is a more internal emotional feature, as it describes essentially how nervous and insecure a person is. Highly neurotic people will often struggle with self-esteem, and emotional instability is often linked to high levels of neuroticism.

The Shirvani19 model is primarily focused on scoring the actions of a character according to how those actions relate to these attributes. That is to say, it estimates what personality traits are being displayed in a given character’s actions, and to what degree each action displays those traits.

They do this by calculating twelve variables that are each used to contribute to a score describing a different OCEAN attribute. A full table of these metrics and how they relate to each OCEAN attribute is given in Table 1. Of note, any value marked with (R) is used to reduce the overall score, as it defines a facet that makes an action fit less into the given personality attribute.

Of these scores, two are not as easy to formulate as others. Agreeableness and Openness utilize metrics that are difficult to obtain during story generation. We will discuss these metrics in greater detail below.

3.1. Agreeableness

As can be seen in Table 1, the Agreeableness OCEAN quality has four metrics associated with it. One of these metrics (11 in Table 1) requires the planner to be able to calculate the number of conflicts created for other characters. The Shirvani19 definition for conflict can be problematic for efficient calculation. Shirvani19 defines character conflict as occurring when a character can foresee any way their plan can go wrong and fail to reach their goal. This element is extremely difficult to evaluate in many systems, as it requires simulating all possible alternative actions or events that could happen, not simply the actions they intend to have happen. This would take a large amount of operational time and resources to run, as well as require limits or ways to determine when to stop simulating additional possible future plans.

3.2. Openness

The Openness attribute of OCEAN is defined by Shirvani with two metrics. We refer to the first metric as “creative thinking” (referred to as the openness facet in Table 1) and the second as intellect (1 and 2 in Table 1, respectively). Creative thinking is a variance value, as it is used to reward using a diverse set of actions. The equation for creative thinking is as follows:

\[ \text{Creative Thinking} = 1 - \min_{i = 1 \ldots m} \sum_{j=1}^{n} \frac{\mathit{Occurences}(a_i, p_j)}{\mathit{Length}(p_i)} \]
OCEAN Quality      Facet                            Description
Openness           Openness                         1. The minimum action likelihood in a plan (R)
                   Intellect                        2. Probability of success of a plan
Conscientiousness  Industriousness and Orderliness  3. # of actions in a plan (R)
                                                    4. # of times the agent changes their mind (R)
                                                    5. # of actions with self as the consenting character
Extraversion       Enthusiasm                       6. # of actions including others with their consent
                   Assertiveness                    7. # of actions including others without their consent
Agreeableness      Compassion                       8. # of actions including others with their consent
                                                    9. # of goals achieved for other characters
                   Politeness                       10. # of actions including others without their consent (R)
                                                    11. # of conflicts created for other characters (R)
Neuroticism        Withdrawal and Volatility        12. # of times the agent changes their mind

Table 1
Shirvani19’s Metrics of Personality for the OCEAN personality model [4].

In this function, we assume the agent is considering $n$ possible different plans to take. The set of these plans is $[p_1 \ldots p_n]$, so $p_i$ is the i-th plan being considered. The value $a_i$ is a given action in one or more of these plans. Thus we can think of the plans as sets of actions, $p_i = [a_1 \ldots a_m]$. The value $m$ is the total number of actions that are possible for the character to take. As for the remaining terms, $\mathit{Occurences}(a_i, p_j)$ is the number of times action $a_i$ occurs in plan $p_j$, while $\mathit{Length}(p_i)$ is the number of steps in plan $p_i$.

The second metric that contributes to Openness is Intellect. The Shirvani19 model defines this metric as the probability that a plan succeeds. The probability of success of a plan is defined as the likelihood of a plan succeeding based on the number of conflicts created with other characters. This was defined as follows:

\[ \text{Probability of Success} = 1 - \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{\mathit{Conflict}_{a_j}(c_i)}{n \cdot m} \]

In this, the values of $n$ and $m$ represent the total number of characters and the total number of possible actions respectively. The value $c_i$ represents character i out of the set of all characters in the domain, and $a_j$ represents a given action in the set of all $m$ potential actions. $\mathit{Conflict}_{a_j}(c_i)$ is therefore a value that is 1 if action $a_j$ causes a conflict for character $c_i$, and 0 if it does not. In short, the probability of success is defined by how many other characters would agree with the given character’s action plan.

Both the Creative Thinking metric and the Intellect metric share some issues in how they are calculated. In both cases, the set of $n$ plans $[p_1 \ldots p_n]$ demands that the program collect a large collection of potential actions for every single character’s potential plans. This works on the assumption that the implementation of personality is done after the planner has generated multiple plans, and assumes that personality is simply used to collect the best possible plan. Such a system is not feasible if the planner is intended to be used for real-time story generation, or if the planner is working with a human agent. Not only does it demand that a large portion of work be completed multiple times for every character on every step, it also needs all or a large number of solutions to be generated before the metric can be collected.

The Intellect metric is also problematic in that the probability of success calculation relies on being able to calculate whether an action would generate a conflict with another character. We have already discussed the potential issues with calculating conflict information in the previous section.

4. Methods

To make the personality model more flexible, we replaced the problematic aspects of the Openness and Agreeableness OCEAN metrics with values that could be collected more easily. For Agreeableness we only needed to re-evaluate the concept of conflict, but for Openness we propose alternative calculations for both Creative Thinking and Intellect. We will discuss each of these in greater detail below.

4.1. Conflict

In the Shirvani19 model, conflicts are calculated by determining any point at which a character’s plan could fail. While this is a rigorous way to determine conflict, we propose a metric that relaxes the idea of conflict in the interest of making it easier to calculate. Instead of calculating conflict so directly, we propose defining character conflict by the character’s goals or other motivating factors. Rather than simulating an entire world change for potential issues, we argue that simply checking two states for comparison is enough. Specifically, our metric compares one existing state and one hypothetical state. The "true" state, $t_0$, is the state at the moment when the character is considering a plan, before taking or deciding on an action, and is thus "true" because it has come to pass outside of the character’s plans. The hypothetical state is the predicted end state that will come to pass if the character’s entire plan is executed without fail, $t_n$. In this we consider $t_1$ to be the state resulting from the first action in the plan the character is considering, with the considered plan having a total of $n$ steps in it.

Thus, our changed definition of $\mathit{Conflict}_{a_j}(c_i)$ will need to specify that $a_j$ would result in the world state $t_j$ if executed. With this, $\mathit{Conflict}_{a_j}(c_i)$ is 1 if character $c_i$ has a higher goal metric at $t_0$ than at $t_j$, and is 0 otherwise. In other words, as long as action $a_j$ moves the character $c_i$ further from its goal, $\mathit{Conflict}_{a_j}(c_i)$ will evaluate to 1.
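The relaxed conflict test of Section 4.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the dictionary-based world states, the `lookup` goal metric, the `conflicts_for_others` aggregation over steps and characters, and the toy numbers are all hypothetical and introduced here only for demonstration. The one rule taken from the text is that a conflict is flagged when a character's goal metric is higher at the true state t_0 than at the predicted state t_j.

```python
from typing import Callable, Dict, List

# Hypothetical world state: maps each character to a numeric goal metric
# (higher means closer to that character's goal). This encoding is an assumption.
State = Dict[str, float]


def conflict(goal_metric: Callable[[str, State], float],
             character: str, t0: State, tj: State) -> int:
    """Return 1 if `character` has a higher goal metric at t0 than at tj, else 0."""
    return 1 if goal_metric(character, t0) > goal_metric(character, tj) else 0


def conflicts_for_others(goal_metric: Callable[[str, State], float],
                         actor: str, characters: List[str],
                         t0: State, step_states: List[State]) -> int:
    """Count (other character, step) pairs whose conflict indicator is 1.

    step_states[j - 1] is the predicted state t_j after the j-th planned action.
    Summing over steps and characters is an illustrative aggregation choice.
    """
    return sum(conflict(goal_metric, c, t0, tj)
               for c in characters if c != actor
               for tj in step_states)


def lookup(character: str, state: State) -> float:
    """Toy goal metric: read the stored value for the character."""
    return state[character]


# Toy example: the Bandit robs Tom (t1), then walks away (t2).
t0 = {"Tom": 0.5, "Merchant": 0.5, "Bandit": 0.2}
t1 = {"Tom": 0.0, "Merchant": 0.5, "Bandit": 1.0}
t2 = {"Tom": 0.0, "Merchant": 0.5, "Bandit": 1.0}
cast = ["Tom", "Merchant", "Bandit"]
print(conflicts_for_others(lookup, "Bandit", cast, t0, [t1, t2]))  # prints 2
```

Only two states are compared per check, so no alternative plans need to be simulated; the cost grows with the number of characters and plan steps rather than with the number of possible futures.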
4.2. Creative Thinking

Recall that to calculate Creative Thinking, the Shirvani19 model needs to calculate the variance associated with a plan by calculating the minimum action likelihood in a plan across many different plans being considered. To make creative thinking easier to calculate, we propose to simply examine the diversity of the actions considered in the plan. We define $\mathit{plannedSet}$ as the set of all actions that have occurred up to the point at which an action is being considered, combined with the set of actions in the most likely future plan. We then define $\mathit{actionCounter}$ as a set of size $m$, with $m$ being the total number of possible actions in the domain. The set $\mathit{actionCounter}$ holds the number of times every given action in the domain is executed throughout the entirety of the $\mathit{plannedSet}$, and thus can be calculated by going over the $\mathit{plannedSet}$ just once. In other words, $\mathit{actionCounter}$ is the eventual count of how many times every potential action would occur if the given plan occurs without any interference or changes. Using an existing, commonly used variance algorithm, we calculated a variance-based metric that scales from 0 to 1. In other words, we measured openness to new experiences as the variance between the different kinds of action the character showed.

set up as a between-subjects experiment where participants were randomly sorted into one of these four groups.

5.1. Story Domain and Story Generation

Shirvani and Ware unfortunately did not keep track of their original program, thus we were unable to use the exact same domain as they did. For our story experiments, we emulated the domain used in the 2019 work as closely as possible. Thus, we generated stories about a boy named Tom, whose goal is to gain some herbs for his sick grandmother. The herbs are in the possession of a Merchant, whose goal is to gain a coin, which Tom happens to have. The Merchant is in the Town, while Tom is in the Forest. But there is also a Bandit in the Forest, who also wants the coin. Any character can walk from one location to another, any character can buy an item from another by spending a coin, and any character that holds a weapon can rob another character for any item in their inventory. A character with a weapon can also kill another character, and any living character can loot the corpse of a dead character for any items they hold. There is also a bandit camp, where there is a chest with a secondary coin. Finally, there is a guard in Town, who has the unique action to arrest the bandit, and whose goal is to arrest the bandit. The three characters that hold weapons at the start of the story are the bandit, the merchant, and the guard.
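The diversity-based Creative Thinking metric of Section 4.2 can be illustrated with the action types of this story domain. The sketch below is hedged: the text only says that a commonly used variance algorithm yields a value between 0 and 1, so the specific normalization here (one minus the population variance of action shares, divided by the largest variance possible for m actions) is an assumption chosen so that a broader range of actions scores higher. The names `DOMAIN_ACTIONS`, `action_counter`, and `creative_thinking` are hypothetical.

```python
from collections import Counter
from statistics import pvariance
from typing import Iterable, List

# Action types named in the story domain description.
DOMAIN_ACTIONS = ["walk", "buy", "rob", "kill", "loot", "arrest"]


def action_counter(planned_set: Iterable[str], domain_actions: List[str]) -> List[int]:
    """Count, in one pass, how often each domain action appears in the planned set."""
    counts = Counter(planned_set)
    return [counts[action] for action in domain_actions]


def creative_thinking(planned_set: Iterable[str], domain_actions: List[str]) -> float:
    """Score in [0, 1]: 0 when one action is repeated, 1 when all are used equally.

    Assumed normalization: 1 - var(shares) / max_var, where max_var = (m - 1) / m**2
    is the population variance when a single action takes every step.
    """
    counts = action_counter(planned_set, domain_actions)
    total, m = sum(counts), len(domain_actions)
    if total == 0 or m < 2:
        return 0.0
    shares = [count / total for count in counts]
    max_var = (m - 1) / m ** 2
    score = 1.0 - pvariance(shares) / max_var
    return min(1.0, max(0.0, score))  # guard against floating-point drift


# Past actions plus the most likely future plan, as a flat list of action names.
repetitive = ["walk", "walk", "walk", "walk"]
varied = ["walk", "buy", "walk", "loot"]
print(creative_thinking(repetitive, DOMAIN_ACTIONS))
print(creative_thinking(varied, DOMAIN_ACTIONS))
```

Because the counter is built from a single plan, the score can be updated during generation without enumerating alternative plans; whether low variance should map to high Openness, as assumed here, is a design choice the text leaves open.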
4.3. Intellect

Originally, Intellect was calculated as the probability of success in a plan in terms of the number of conflicts that it could create. We decided the easiest way to simplify the problem is to cut out a unique metric entirely, and instead use the "politeness" metric (10 in Table 1) for two values and two purposes.

Politeness is a metric that is calculated by determining the number of actions that include other characters without them consenting to the action. The first way we utilize the Politeness metric is in its originally intended way. That is to say, it is used to help calculate the Agreeableness of an agent, where the smaller the number of actions taken that include nonconsenting characters, the larger the Politeness metric.

The new way we propose to use the Politeness metric is to apply it to the concept of "Opposing Forces" in the sense that it estimates how much opposition the character would need to overcome to ensure the plan operates smoothly. In other words, we consider the plan more likely to succeed- and

For each of the treatments mentioned above we needed to generate a total of four stories. One of these stories was the "true" story. In these stories, Tom would take actions that ranked either very high or very low in the Agreeableness or Openness metrics as described above, depending on the category.

Another story involved Tom displaying the opposite personality to the one being tested. Thus, if the treatment group was associated with Low Agreeableness, then this story would involve Tom performing High Agreeableness actions. This story is meant to be a bad fit for the character’s "true" personality.

The final two stories consisted of a story that had a medium score in the given attribute being tested according to our metrics (and, thus, did not display Tom as strongly exhibiting or not exhibiting the attribute in question) and a story chosen at random.

5.2. Experimental Methodology

We had two hypotheses that we wanted to evaluate for our
therefore more intelligent- based on the number of agents        experiments. The first was that when shown a story that
that would oppose the plan by not consenting to take part        our model generated and claimed displays one of our target
in certain actions.                                              personality traits in the main character, the audience will
                                                                 identify the main character as someone who holds these
5. Experiments                                                   traits. The second hypothesis is that when shown a set of
                                                                 stories that includes one tale that our model claimed also
To evaluate the quality of our proposed metrics, we ran a        displays the same target trait in a similar quantity in the
human subjects experiment. We attempted to run our exper-        main character, the audience will identify that particular
iments as closely as possible to the experiments performed       story as the one they consider most realistic.
in [4], thus our experiments were focused on whether or not         For these experiments, subjects were first given a brief de-
an audience reading our generated stories observed the in-       scription of the domain, introducing the people, the places,
tended personality traits assigned to a given character. Since   and the goals of the characters. They were then shown the
our metrics only affect 2 of the OCEAN traits, we limit our      four generated stories described above and told that all were
experiments to the following four basic treatments involv-       possible ways that the story could proceed. One of these
ing the OCEAN traits Openness and Agreeableness: High            stories would be the "true" story wherein the target charac-
Openness (HO), Low Openness (LO), High Agreeableness             ter’s behavior closely matched the personality the subject’s
(HA), and Low Agreeableness (LA). The experiment was             category was testing for. The domains for these stories were
identical. In total, there were eight stories shown to subjects testing Openness and eight different stories shown to subjects testing Agreeableness.

After reading these stories, subjects were told that one of the stories was the "true" story and were then asked to rate statements about the target character using a 5-point Likert scale. We tried to use the same statements as Shirvani and Ware; however, preliminary testing showed that the framing device for the story was interfering with the results. To be specific, the domain in which the story takes place features the target character, Tom, trying to get herbs for his sick grandmother. We attempted to mention in the explanation that Tom’s grandmother provides for him, thus implying a potential selfish motive for Tom’s behavior, but the majority of results in our initial testing showed high Agreeableness regardless of Tom’s behavior in the story. Thus, we modified the statements slightly so that the participants would give answers based only on the parts of the story that our model had generated rather than the backstory. The statements we used are shown in Table 2. It should be noted that there is no "grandmother" character included in any story domain used, as the character is a plot device and cannot take any actions during the story.

While most statements presented to the user were related to the specific OCEAN category we were testing, we included a few statements related to different OCEAN qualities as well. This was done to avoid having the subjects fixate too heavily on the general theme behind the questions and to encourage them to think about the entire story in their responses. These statements were not used for analyzing the target metric of the category. The statements tested for both HO and LO were the same, as were the statements tested for HA and LA. Table 2 contains the statements presented to the subjects for Agreeableness tests, and Table 3 contains the statements presented to the subjects for Openness tests. It should be noted that some statements were meant to reflect a low score in the given metric, not a high one. Those marked with an (R) for "Reverse" were expected to be agreed with if the "true" story ranked the character as having a low value in the given metric.

  OCEAN Quality        Question
  Agreeableness        Tom avoids conflict.
  Agreeableness (R)    Tom takes advantage of others.
  Agreeableness (R)    Tom is out for his own personal gain, with his grandmother as the only exception.
  Agreeableness        Tom likes to do things for others as well as his grandmother.
  Agreeableness (R)    Tom can’t be bothered with others’ needs (unless they are his grandmother).
  Extraversion         Tom feels comfortable around people.
  Neuroticism          Tom does things that he later regrets.
  Conscientiousness    Tom makes plans and sticks to them.

Table 2
Statements Evaluated for testing Agreeableness

  OCEAN Quality        Question
  Openness             Tom finds creative solutions to problems.
  Openness             Tom tends to analyze possible outcomes of his plans.
  Openness (R)         Tom has difficulty coming up with excellent plans.
  Openness             Tom has excellent ideas.
  Openness (R)         Tom’s ideas are ordinary and hardly unique.
  Extraversion         Tom finds it difficult to approach others.
  Conscientiousness    Tom gets things done quickly.
  Neuroticism          Tom changes his mood a lot.

Table 3
Statements Evaluated for testing Openness

After rating these statements, subjects were then shown four more stories and asked which one they thought most likely to occur based on the target character’s personality. In order to increase variability in stories, the second set of stories was generated from two domains that were almost entirely identical to the original domain. These two domains had exactly one additional feature each: one added the location of a Bandit’s Camp where a coin could be found, and the other included an additional Guard character whose goal was to arrest the Bandit. The subjects were told that these additional four stories included the bandit camp or the guard, so they would still understand the limitations of the world. No other changes were made to the domains. Similar to the stories shown previously, one of these four stories shown to the audience was ranked by the model as portraying a personality close to the "true" personality of the target character, one was ranked lowly, one was ranked medium, and the final one was a randomly chosen story. Subjects were then asked which story they thought most closely fit the given character’s personality.

6. Results

In this section we review the results of our experiments on identifying the protagonist’s personality traits and choosing stories that align with the protagonist’s personality type. For these experiments, we collected results for 176 subjects using Prolific, with each subject randomly assigned to one of the four treatment categories. The category with the smallest number of subjects was Low Openness, which had 35 subjects. The category with the highest number of subjects was High Openness, with 48 subjects.

6.1. Identifying Protagonist Personality Traits

Recall that our first hypothesis was that participants should be able to identify if Tom exhibits either high or low Agreeableness or high or low Openness depending on the treatment group. To test this, we evaluated each user’s responses to the statements related to their treatment group. For each statement related to the aspect of personality we were analyzing, we labeled the statement as having either High-attribute implications or Low-attribute (i.e., "Reverse") implications. For subjects in the high categories, we considered it a success if the subjects ranked non-Reverse statements with "Strongly Agree" or "Agree," and Reverse statements with "Disagree" or "Strongly Disagree." Likewise, for the low categories, success was determined if the subjects ranked non-Reverse statements as "Disagree" or "Strongly
Disagree" and ranked Reverse statements as "Agree" or "Strongly Agree."

To determine if there was a significant effect, we used a binomial exact test, testing the distribution of observed successes and failures against a null hypothesis of users providing random responses to each statement. The results of this analysis are summarized in Tables 4 and 5 under the heading "5-Pt Likert Scale." Table 4 contains information on each individual treatment, and Table 5 contains results when treatments are aggregated by either Agreeableness or Openness.

           5-Pt Likert Scale             Story Selection
           p-value      Effect Size      p-value      Effect Size
   HO      0.869        0.367            4.412e-08    0.625
   LO      0.998        0.303            0.999        0.057
   HA      3.941e-13    0.634            2.584e-3     0.447
   LA      5.218e-41    0.830            1.752e-09    0.674

Table 4
Experiment Results Individually

           5-Pt Likert Scale             Story Selection
           p-value      Effect Size      p-value      Effect Size
   O       0.988        0.347            4.395e-3     0.385
   A       1.818e-47    0.731            2.138e-10    0.559

Table 5
Experiment Results Combined

The binomial tests indicate that people are able to correctly identify when the protagonist of the story exhibits high agreeableness (p = 3.941e-13) and low agreeableness (p = 5.218e-41). We did not observe overwhelming evidence that participants could identify when Tom exhibited either high or low openness. When taken in aggregate, however, we did find significant differences between how users would respond in both the Openness and Agreeableness categories. These results mostly agree with the results obtained by Shirvani and Ware.

           5-Pt Likert Scale             Story Selection
           p-value      Effect Size      p-value      Effect Size
   O       0.072        1.160            0.026        1.60
   C       0.016        1.160            0.001        1.73
   E       0.024        1.167            0.014        1.61
   A       0.048        1.167            <0.001       2.80
   N       0.063        1.128            0.002        2.04

Table 6
Shirvani’s Results

6.2. Choosing Stories According to Personality Type

The second hypothesis we test is that when subjects choose a story that they feel best fits the character’s personality, they will choose the one that our model claims is closest to the original "true" story in personality.

As before, we use a binomial exact test to analyze whether participants select the correct story more frequently than the null hypothesis of random story selection. Binomial tests on our story selection experiments indicate that participants are able to identify stories where Tom exhibits high openness (p = 4.412e-08), high agreeableness (p = 2.584e-3), and low agreeableness (p = 1.752e-09).

7. Discussion

By simplifying and reconstructing the metrics used in the Shirvani19 model, we could provide a personality implementation framework that is easily applied to a wide variety of projects. A small-scale framework for personality in narrative planning could be used to enhance other projects, or as a personality framework with which to test personality-adjacent features. Checking that a supposedly multi-personality feature relating to, say, memory or character beliefs actually works with multiple personality systems requires having access to other systems of personality to use.

While our work has managed to adapt the Agreeableness metrics to an acceptable degree, we had much less success with Openness. One possible cause is that calculating intellect from the opposition to the character’s plans tracks Agreeableness too closely. It is also possible that our variance metric for openness punishes plans where the character happens to take the same kind of action, regardless of whether the action is the smartest thing to do. It is also possible that in stories where one character takes few actions compared to other agents, the variance score for openness sees this as showing more variety in the character’s actions simply because the character may not have repeated the same type of action, even if this story shows the character as non-proactive. Alternative approaches to Openness might have more success in the future, or alternative data collection might enable calculating Shirvani19’s openness metric without issue.

It should also be noted that, unlike the original experiment set, in the case of Agreeableness our story selection did much worse than our Likert scale tests, which is the opposite of what Shirvani and Ware found. One explanation could be that we did not account for any personality metrics apart from the target score, and thus there were other factors that the audience considered more pertinent than we did. Our attempts to make them consider the story from multiple perspectives may have increased the effect if that is the case.

If we were to test this hypothesis in the future, we would need to expand our generation of alternative stories to check all personality values, not just the targeted ones, and select our "nearest-fit" stories to be ones where the model claims the target character shows a moderate personality in all aspects except for the aspect being tested.

Another possible cause is that the subjects might have selected the most likely stories based on the actions of characters other than our target character, instead of focusing on Tom’s behavior alone. Although the original set of four stories portrays the characters apart from Tom acting in various different ways, some readers might still have factored those characters’ personalities into their selection of most-likely stories. This could be corrected in future studies by simply replacing the names of these characters in the second set of stories, so that the audience views them as different. Alternatively, another story domain with a single character present could be used for future testing.
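To make the analysis from Section 6 easier to reproduce or vary in future studies, the following is a minimal Python sketch of the success rule for Likert ratings and of a binomial exact test. It is not the analysis code used in this paper. The function names, the "Neutral" label, the one-sided alternative, and the null success probabilities (0.4 for a uniformly random rating among five options, 0.25 for a random pick among four stories) are all assumptions made for illustration, and the ratings in the demo are invented.

```python
from math import comb

# Likert labels counted as agreement or disagreement. The middle option
# (assumed here to be labeled "Neutral") never counts as a success.
AGREE = {"Strongly Agree", "Agree"}
DISAGREE = {"Disagree", "Strongly Disagree"}


def is_success(response: str, reverse: bool, high_treatment: bool) -> bool:
    """Success rule from Section 6.1 for a single rating.

    High treatments expect agreement with non-Reverse statements and
    disagreement with Reverse statements; low treatments expect the opposite.
    """
    expect_agree = high_treatment != reverse
    return response in (AGREE if expect_agree else DISAGREE)


def binomial_exact_p(successes: int, trials: int, p_null: float) -> float:
    """One-sided exact p-value P(X >= successes), X ~ Binomial(trials, p_null).

    The one-sided alternative is an assumption of this sketch.
    """
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Invented ratings for a hypothetical High Agreeableness subject:
# (response, statement is marked Reverse).
ratings = [
    ("Agree", False),
    ("Strongly Disagree", True),
    ("Neutral", False),
    ("Agree", True),
]
outcomes = [is_success(resp, rev, high_treatment=True) for resp, rev in ratings]

# Assumed null for Likert ratings: 2 of the 5 options count as a success.
likert_p = binomial_exact_p(sum(outcomes), len(outcomes), p_null=0.4)

# Assumed null for story selection: 1 correct story out of 4 shown.
# The counts (20 correct picks out of 40 subjects) are invented.
selection_p = binomial_exact_p(20, 40, p_null=0.25)

print(f"Likert successes: {sum(outcomes)}/{len(outcomes)}, p = {likert_p:.4f}")
print(f"Story selection p = {selection_p:.6f}")
```

Keeping the null probability as an explicit parameter makes it straightforward to rerun the same test under a different assumption about what a random response looks like.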
8. Conclusion

In our attempts to refine the 2019 OCEAN-based personality model into a format that can be applied to story generation tasks as well as story evaluation tasks, while still remaining a small-scale, easily implemented personality model, we have had some successes and some failures. Our results indicate that of the two OCEAN attribute metrics we sought to refine, only Agreeableness has been properly adjusted into a format that audiences will recognize. Our Openness metric needs to be redefined, and one of the testing metrics we have used should likely be refined as well before using it again.

One problem is that we were focused too intensely on translating the original metrics into runtime-calculable forms, and as such did not reevaluate whether alternative solutions might work better. For a first attempt this is still a crucial step to reach, but there are still clear problems. For example, take the original concept of evaluating "intelligence" by way of evaluating the likelihood of other characters opposing the plan as a probability of success. Our adaptation simply used another measurement of the number of characters likely to oppose the plan, but this results in punishing cases where intelligent characters act coldly toward, or manipulate, others.

Still, our work has helped progress towards a personality model that may not be the most refined nor even the most accurate, but could be applied easily and quickly to any given narrative planner for character enhancement or comparative study with other personality models. Testing a baseline model for comparison is a practice seen in countless scientific fields, and providing a model that can serve as one for personality modeling would benefit many future researchers.

References

 [1] P. Gervás, B. Lönneker-Rodman, J. C. Meister, F. Peinado, Narrative models: Narratology meets artificial intelligence, 2006. URL: https://api.semanticscholar.org/CorpusID:89613631.
 [2] S. Imabuchi, T. Ogata, Story generation system based on propp theory as a mechanism in narrative generation system, in: 2012 IEEE Fourth International Conference On Digital Game And Intelligent Toy Enhanced Learning, 2012, pp. 165–167. doi:10.1109/DIGITEL.2012.47.
 [3] F. Peinado, P. Gervás, Creativity issues in plot generation (2005).
 [4] A. Shirvani, S. G. Ware, A plan-based personality model for story characters, in: AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 2019, pp. 188–194.
 [5] E. S. de Lima, B. Feijó, A. L. Furtado, Adaptive storytelling based on personality and preference modeling, Entertainment Computing 34 (2020) 100342. URL: https://www.sciencedirect.com/science/article/pii/S187595211930076X. doi:10.1016/j.entcom.2020.100342.
 [6] P. Tambwekar, M. Dhuliawala, L. J. Martin, A. Mehta, B. Harrison, M. O. Riedl, Controllable neural story plot generation via reward shaping, in: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-2019, International Joint Conferences on Artificial Intelligence Organization, 2019. URL: http://dx.doi.org/10.24963/ijcai.2019/829. doi:10.24963/ijcai.2019/829.
 [7] A. Shirvani, S. G. Ware, L. J. Baker, Personality and emotion in strong-story narrative planning, IEEE Transactions on Games 15 (2023) 669–682. doi:10.1109/TG.2022.3227220.
 [8] R. Evans, E. Short, Versu—a simulationist storytelling system, IEEE Transactions on Computational Intelligence and AI in Games 6 (2014) 113–130.
 [9] J. McCoy, et al., Social story worlds with comme il faut (2014) 97–112.
[10] J. C. Bahamón, R. M. Young, A choice-based model of character personality in narrative, in: Workshop on Computational Models of Narrative, 2012, pp. 164–168.
[11] J. C. Bahamón, R. M. Young, An empirical evaluation of a generative method for the expression of personality traits through action choice, in: Proceedings of the Thirteenth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AIIDE’17, AAAI Press, 2017.
[12] M. Kreminski, M. Dickinson, M. Mateas, N. Wardrip-Fruin, Why are we like this?: Exploring writing mechanics for an AI-augmented storytelling game, in: International Conference on the Foundations of Digital Games (FDG ’20), 2020.
[13] H. Rashkin, A. Bosselut, M. Sap, K. Knight, Y. Choi, Modeling naive psychology of characters in simple commonsense stories, 2018. URL: https://arxiv.org/abs/1805.06533. arXiv:1805.06533.
[14] D. Sander, Models of emotion: the affective neuroscience approach, The Cambridge Handbook of Human Affective Neuroscience (2013) 5–53.
[15] S. G. Ware, C. Siler, Sabre: A narrative planner supporting intention and deep theory of mind, in: AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 17, 2021, pp. 99–106.
[16] C. DeYoung, L. Quilty, J. Peterson, Between facets and domains: 10 aspects of the big five, Journal of Personality and Social Psychology 93 (2007) 880–96. doi:10.1037/0022-3514.93.5.880.