=Paper=
{{Paper
|id=Vol-3847/paper1
|storemode=property
|title=Generalizing a Numeric Personality Metric for Narrative Planners
|pdfUrl=https://ceur-ws.org/Vol-3847/paper1.pdf
|volume=Vol-3847
|authors=Elinor Rubin-McGregor,Brent Harrison
|dblpUrl=https://dblp.org/rec/conf/int-ws/Rubin-McGregorH24
}}
==Generalizing a Numeric Personality Metric for Narrative Planners==
Elinor Rubin-McGregor, Brent Harrison
Department of Computer Science University of Kentucky, Davis Marksbury Building, 329 Rose Street, Lexington, KY 40506-0633 USA
Abstract
In the field of narrative planning, there are many different approaches to personality modeling; so many, in fact, that an overarching
study of personality models themselves is beginning to form. But a subject as complex as personality demands complex modeling, which
in turn makes it difficult to compare implementations or to test sub-features of personality systems that are intended to apply across
models. By generalizing an existing five-number personality model, we hope to provide an adaptable resource that can be used for
enhancement, for comparison, or simply as a foundation for other personality models.
AIIDE Workshop on Intelligent Narrative Technologies, November 18, 2024, University of Kentucky, Lexington, KY, USA
erru227@uky.edu (E. Rubin-McGregor); bha286@g.uky.edu (B. Harrison)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

The consideration of personality is a major step forward in the field of narrative generation. Narrative generators have a multitude of applications, from training models to video games and even organizational and strategic purposes. Incorporating personality into narrative models is a subject that has vexed many researchers for years, as personality is such a complex and varied concept. Yet it is critical, for if we do not model personality, our narratives cannot consider ways in which behavior differs between different individuals. For narrative purposes alone, stories become more engaging if the audience can identify with the characters and see them as reflections of real people. Without personality considered, it is far more difficult to display narrative elements known to entice audiences, such as character depth. Two people in the same situation will make different choices depending on who they are, and attempting to capture that concept of "who they are" has been the pursuit of many.

Currently, there is a wide variety of personality models proposed for this purpose, with varying advantages and disadvantages. Many of these models, however, require a great deal of effort to implement because they rely on information that is difficult for narrative planners to collect. For systems where personality is the central focus, or where personality is an important element, this may be an acceptable cost to pay. But what about when the program is not focused on developing a specific personality system, but instead on features related to multiple personality adaptation systems? Or perhaps, when personality is required or beneficial but not the primary focus [1, 2]? What about simply having a baseline personality model to compare a more complex model to [3]? Having a small-scale, easily implementable personality model would be beneficial for other researchers in this field.

There is an existing personality model whose inputs do not require a great deal of effort to collect, as it runs on data that many narrative planners can easily collect already. This is the OCEAN-based personality model produced by Shirvani and Ware in 2019. For ease of understanding and brevity, we refer to this model as Shirvani19. While this model does utilize data that is generally available to narrative planners, it does have limitations associated with it. For example, this model is not entirely open to all domains and has some features that cannot be calculated by a computer during story generation. Specifically, the Shirvani19 model uses metrics that demand an understanding of not only the current story plan, but all or a large number of hypothetical alternative story plans. One such metric describes "creative thinking." This metric is calculated by checking how many times the specific actions of a given character occur in a larger, preferably all-alternative-plan-encompassing, set of alternative stories. In addition, the paper uses the concept of "conflict" in its metrics for determining both agreeableness and intellect, but defines its measurement of conflict as any time a character can observe any way in which their plans can fail. This feature also requires knowledge that cannot easily be generated during story creation, as evaluating it requires essentially finishing the story in multiple ways before the story is even concluded. In short, there are features of the Shirvani19 model that can only be used to evaluate personality after several stories have already been generated, which in turn makes the model difficult to use if we want to apply it during story generation.

To this end, we are proposing to modify the Shirvani19 model such that it can be applied to a wider variety of narrative planners. We are also trying to simplify the overhead required to make the personality model work. Specifically, we propose to calculate a metric that describes "creative thinking" by comparing the diversity of actions only along the specific plan, so that characters who utilize a broader range of actions are considered to have a higher Openness score than characters who repeatedly use the same actions. Likewise, conflict is redefined for both of its uses. Where it is applied for measuring a character’s affability, we instead check simply the number of ways a character’s actions could directly harm other characters. Where conflict is applied to intellect, we translate the chance of success to the chance that other characters will oppose the actions of the given character.

In order to ensure that our proposed methods are usable, we performed a user study wherein subjects evaluated the stories produced by our modified model. In the end we found that while our Agreeableness work seems to be very applicable, our re-definition of Openness will need some refinement in later work.

2. Related Work

There is a large amount of work on representing personality in digital narratives, even work that focuses on the Big Five OCEAN framework, but little of it is very modular [5, 6]. Shirvani came out with a follow-up to Shirvani19 that addressed the issues discussed here, but at the cost of increasing the size of the model [7].

The well-known Versu drama manager is very good at
telling complex stories with consistent character personality, but it requires a great deal of overhead work to run [8]. The model needs files representing the world, social practices, and the characters. Not only that, but it needs parser programs for all of these features, then initialization functions, then a database to hold it all, and multiple levels of instantiators before it can make a decision. Likewise, the Comme il Faut project handles complex emotional environments very well, but it requires a great deal of information provided to the data manager for any story domain to work [9]. Information on cultural knowledge, social facts, social states, social exchanges and even more must be documented in order for them to be applied.

There are of course works that focus less on societal impacts as a whole, and more on the individual characters. Bahamon and Young introduced other OCEAN-based systems, but they have not produced a way to directly evaluate the OCEAN traits during runtime without extensive preparation. Their earlier work in 2012 provides a way to remove actions deemed out-of-character during story generation, but does not provide a mechanism to determine whether behavior is out-of-character or not. It is a model we would like to test our own work on in the future [10]. Their later work further discusses evaluating personality consistency in narrative models, but still does not introduce a personality model to use [11].

The drama manager from Why Are We Like This works well with the player’s actions and models character personality from player actions, but because of this it only works for the specific high degree of player interaction used in the project [12]. It also uses abstract personality modeling, rather than a personality system that can work as soon as it is applied. There has also been work by Soares that models the personality of the player for narrative decisions, but it does not model the characters of the narrative in the same way [5]. Shirvani and Ware developed a very impressive emotion-based personality model that solves many of the same problems this paper seeks to correct [7]. This model relies upon its emotional system heavily in modeling personality, which in turn requires a larger amount of overhead and thus isn’t as modular as this paper seeks to be. Its reliance on its emotional system also prevents it from being used with various other emotion-focused models [13, 14].

3. Background on the Shirvani19 Model

Shirvani and Ware proposed a personality model [4] for characters in a computational narrative that was based on the OCEAN model of personality, using Sabre as the basis of its planning model [15]. The OCEAN personality model, or "the Big Five model," utilizes five key attributes to collectively describe personality: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism [16]. These are commonly accepted attributes of personality, and are defined as such:

• Openness means "openness to experience" and describes how much a person is willing to explore outside of their comfort zone. This feature is also considered an aspect of curiosity, and therefore is often tied to creativity as well.
• Conscientiousness is how organized and effective a person is. Someone who acts carelessly or struggles to complete tasks is considered to have low conscientiousness.
• Extraversion is the degree to which a person wants to engage and interact with other people. Notably, a highly extraverted person can also be very malicious, as this category does not differentiate between positive or negative engagement with others, only frequency.
• Agreeableness is reflective of compassion and empathy, and is used to measure how much a person considers other people. As with Extraversion, this trait is independent of the others: someone can be very shy and still have high agreeableness.
• Neuroticism is a more internal emotional feature, as it describes essentially how nervous and insecure a person is. Highly neurotic people will often struggle with self-esteem, and emotional instability is often linked to high levels of neuroticism.

The Shirvani19 model is primarily focused on scoring the actions of a character according to how those actions relate to these attributes. That is to say, it estimates what personality traits are being displayed in a given character’s actions, and to what degree each action displays those traits. It does this by calculating twelve variables, each of which contributes to a score describing a particular OCEAN attribute. A full list of these metrics and how they relate to each OCEAN attribute is given in Table 1. Of note, any value marked with an (R) is used to reduce the overall score, as it defines a facet that makes an action fit less into the given personality attribute.

Of these scores, two are not as easy to formulate as others. Agreeableness and Openness utilize metrics that are difficult to obtain during story generation. We will discuss these metrics in greater detail below.

3.1. Agreeableness

As can be seen in Table 1, the Agreeableness OCEAN quality has four metrics associated with it. One of these metrics (11 in Table 1) requires the planner to be able to calculate the number of conflicts created for other characters. The Shirvani19 definition for conflict can be problematic for efficient calculation. Shirvani19 defines character conflict as occurring when a character can foresee any way their plan can go wrong and fail to reach their goal. This element is extremely difficult to evaluate in many systems, as it requires simulating all possible alternative actions or events that could happen, not simply the actions they intend to have happen. This would take a large amount of operational time and resources to run, as well as require limits or ways to determine when to stop simulating additional possible future plans.

3.2. Openness

The Openness attribute of OCEAN is defined by Shirvani with two metrics. We refer to the first metric as "creative thinking" (referred to as the openness facet in Table 1) and the second as intellect (1 and 2 in Table 1, respectively). Creative thinking is a variance value, as it is used to reward using a diverse set of actions. The equation for creative thinking is as follows:

<math>\text{Creative Thinking} = 1 - \min_{i = 1 \ldots m} \sum_{j=1}^{n} \frac{\mathrm{Occurrences}(a_i, p_j)}{\mathrm{Length}(p_i)}</math>
In this function, we assume the agent is considering <math>n</math> possible different plans to take. The set of these plans is <math>[p_1 \ldots p_n]</math>, so <math>p_i</math> is the i-th plan being considered. The value <math>a_i</math> is a given action in one or more of these plans. Thus we can think of the plans as sets of actions, <math>p_i = [a_1 \ldots a_m]</math>. The value <math>m</math> is the total number of actions that are possible for the character to take. As for the larger values, <math>\mathrm{Occurrences}(a_i, p_j)</math> is the number of times action <math>a_i</math> occurs in plan <math>p_j</math>, while <math>\mathrm{Length}(p_i)</math> is the number of steps in plan <math>p_i</math>.

The second metric that contributes to Openness is Intellect. The Shirvani19 model defines this metric as the probability that a plan succeeds. The probability of success of a plan is defined as the likelihood of a plan succeeding based on the number of conflicts created with other characters. This was defined as such:

<math>\text{Probability of Success} = 1 - \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{\mathrm{Conflict}_{a_j}(c_i)}{n \cdot m}</math>

In this, the values of <math>n</math> and <math>m</math> represent the total number of characters and the total number of possible actions respectively. The value <math>c_i</math> represents character i out of the set of all characters in the domain, and <math>a_j</math> represents a given action in the set of all <math>m</math> potential actions. <math>\mathrm{Conflict}_{a_j}(c_i)</math> is therefore a value that is 1 if action <math>a_j</math> causes a conflict for character <math>c_i</math>, and 0 if it does not. In short, the probability of success is defined by how many other characters would agree with the given character’s action plan.

Both the Creative Thinking metric and the Intellect metric share some issues in how they are calculated. In both cases, the set of <math>n</math> plans <math>[p_1 \ldots p_n]</math> demands that the program gather a large collection of potential actions for every one of a character’s potential plans. This works on the assumption that the implementation of personality is done after the planner has generated multiple plans, and assumes that personality is simply used to select the best possible plan. Such a system is not feasible if the planner is intended to be used for real-time story generation, or if the planner is working with a human agent. Not only does it demand that a large portion of work be repeated for every character at every step, it also needs all, or a large number, of the solutions to be generated before the metric can be collected.

The Intellect metric is also problematic in that the probability of success calculation relies on being able to calculate whether an action would generate a conflict with another character. We have already discussed the potential issues with calculating conflict information in the previous section.

{| class="wikitable"
|+ Table 1: Shirvani19’s Metrics of Personality for the OCEAN personality model [4].
! OCEAN Quality !! Facet !! Description
|-
| rowspan="2" | Openness || Openness || 1. The minimum action likelihood in a plan (R)
|-
| Intellect || 2. Probability of success of a plan
|-
| rowspan="3" | Conscientiousness || rowspan="3" | Industriousness and Orderliness || 3. # of actions in a plan (R)
|-
| 4. # of times the agent changes their mind (R)
|-
| 5. # of actions with self as the consenting character
|-
| rowspan="2" | Extraversion || Enthusiasm || 6. # of actions including others with their consent
|-
| Assertiveness || 7. # of actions including others without their consent
|-
| rowspan="4" | Agreeableness || rowspan="2" | Compassion || 8. # of actions including others with their consent
|-
| 9. # of goals achieved for other characters
|-
| rowspan="2" | Politeness || 10. # of actions including others without their consent (R)
|-
| 11. # of conflicts created for other characters (R)
|-
| Neuroticism || Withdrawal and Volatility || 12. # of times the agent changes their mind
|}

4. Methods

To make the personality model more flexible, we replaced the problematic aspects of the Openness and Agreeableness OCEAN metrics with values that could be collected more easily. For Agreeableness we only needed to re-evaluate the concept of conflict, but for Openness we propose alternative calculations for both Creative Thinking and Intellect. We will discuss each of these in greater detail below.

4.1. Conflict

In the Shirvani19 model, conflicts are calculated by determining any point at which a character’s plan could fail. While this is a rigorous way to determine conflict, we propose a metric that relaxes the idea of conflict in the interest of making it easier to calculate. Instead of calculating conflict so directly, we propose defining character conflict by the character’s goals or other motivating factors. Rather than simulating an entire world change for potential issues, we argue that simply checking two states for comparison is enough. Specifically, our metric compares one existing state and one hypothetical state. The "true" state, <math>t_0</math>, is the state at the moment when the character is considering a plan, before taking or deciding on an action, and is thus "true" because it has come to pass outside of the character’s plans. The hypothetical state is the predicted end state that will come to pass if the character’s entire plan is executed without fail, <math>t_n</math>. In this we consider <math>t_1</math> to be the state resulting from the first action in the plan the character is considering, with the considered plan having a total of <math>n</math> steps in it.

Thus, our changed definition of <math>\mathrm{Conflict}_{a_j}(c_i)</math> will need to specify that <math>a_j</math> would result in the world state <math>t_j</math> if executed. With this, <math>\mathrm{Conflict}_{a_j}(c_i)</math> is 1 if character <math>c_i</math> has a higher goal metric at <math>t_0</math> than at <math>t_j</math>, and is 0 otherwise. In other words, as long as action <math>a_j</math> moves the character <math>c_i</math> further from its goal, then <math>\mathrm{Conflict}_{a_j}(c_i)</math> will evaluate to 1.
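The relaxed conflict check described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the world-state encoding (a dictionary mapping each item to its holder), the toy goal metric `holds_goal_item`, the `GOALS` table, and the helper `conflicts_created` (one plausible way to count conflicts created for other characters, metric 11 in Table 1) are all assumptions introduced here for illustration. The only part taken from the text is the rule itself: a conflict is 1 when a character's goal metric is higher in the "true" state than in the state predicted after the action.

```python
from typing import Callable, Dict, Iterable

# Hypothetical world-state encoding: each item maps to the character holding it.
State = Dict[str, str]
# Hypothetical goal metric: a higher value means the character is closer to its goal.
GoalMetric = Callable[[str, State], float]


def conflict(character: str, true_state: State, predicted_state: State,
             goal_metric: GoalMetric) -> int:
    """Return 1 if the action leaves `character` further from its goal, else 0."""
    before = goal_metric(character, true_state)      # goal metric at t_0
    after = goal_metric(character, predicted_state)  # goal metric at t_j
    return 1 if before > after else 0


def conflicts_created(actor: str, characters: Iterable[str], true_state: State,
                      predicted_state: State, goal_metric: GoalMetric) -> int:
    """Count the characters other than `actor` for whom the action is a conflict."""
    return sum(conflict(c, true_state, predicted_state, goal_metric)
               for c in characters if c != actor)


# Toy goals, loosely modeled on the story domain used in the experiments.
GOALS = {"Tom": "herbs", "Merchant": "coin", "Bandit": "coin"}


def holds_goal_item(character: str, state: State) -> float:
    """Toy goal metric: 1.0 if the character holds the item it wants, else 0.0."""
    return 1.0 if state.get(GOALS[character]) == character else 0.0


# t_0: Tom has already bought the herbs. t_j: the Bandit has robbed Tom of them.
t0 = {"herbs": "Tom", "coin": "Merchant"}
tj = {"herbs": "Bandit", "coin": "Merchant"}

print(conflict("Tom", t0, tj, holds_goal_item))                     # prints 1
print(conflicts_created("Bandit", GOALS, t0, tj, holds_goal_item))  # prints 1
```

Only two states are compared per action, so no alternative futures need to be simulated. A planner with a graded goal metric (for example, distance to goal) could be passed in place of the binary toy metric without changing `conflict`.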
4.2. Creative Thinking

Recall that to calculate Creative Thinking, the Shirvani19 model needs to calculate the variance associated with a plan by calculating the minimum action likelihood in a plan across many different plans being considered. To make creative thinking easier to calculate, we propose to simply examine the diversity of the actions considered in the plan. We define <math>\mathit{plannedSet}</math> as the set of all actions that have occurred up to the point at which an action is being considered, combined with the set of actions in the most likely future plan. We then define <math>\mathit{actionCounter}</math> as a set of size <math>m</math>, with <math>m</math> being the total number of possible actions in the domain. The set <math>\mathit{actionCounter}</math> holds the number of times every given action in the domain is executed throughout the entirety of the <math>\mathit{plannedSet}</math>, and thus can be calculated by going over the <math>\mathit{plannedSet}</math> just once. In other words, <math>\mathit{actionCounter}</math> is the eventual count of how many times every potential action would occur if the given plan occurs without any interference or changes. From these counts we calculated a variance-based metric that scales from 0 to 1, using an existing, commonly used variance algorithm. In other words, we measured openness to new experiences as the variance between the different kinds of action the character showed.

4.3. Intellect

Originally, Intellect was calculated as the probability of success in a plan in terms of the number of conflicts that it could create. We decided the easiest way to simplify the problem is to cut out a unique metric entirely, and instead use the "politeness" metric (10 in Table 1) for two values and two purposes.

Politeness is a metric that is calculated by determining the number of actions that include other characters without them consenting to the action. The first way we utilize the Politeness metric is in its originally intended way. That is to say, it is used to help calculate the Agreeableness of an agent, where the smaller the number of actions taken that include nonconsenting characters, the larger the Politeness metric.

The new way we propose to use the Politeness metric is to apply it to the concept of "Opposing Forces," in the sense that it estimates how much opposition the character would need to overcome to ensure the plan operates smoothly. In other words, we consider the plan more likely to succeed, and therefore more intelligent, the fewer agents would oppose the plan by not consenting to take part in certain actions.

5. Experiments

To evaluate the quality of our proposed metrics, we ran a human subjects experiment. We attempted to run our experiments as closely as possible to the experiments performed in [4], so our experiments were focused on whether or not an audience reading our generated stories observed the intended personality traits assigned to a given character. Since our metrics only affect two of the OCEAN traits, we limit our experiments to the following four basic treatments involving the OCEAN traits Openness and Agreeableness: High Openness (HO), Low Openness (LO), High Agreeableness (HA), and Low Agreeableness (LA). The experiment was set up as a between-subjects experiment where participants were randomly sorted into one of these four groups.

5.1. Story Domain and Story Generation

Shirvani and Ware unfortunately did not keep track of their original program, so we were unable to use the exact same domain as they did. For our story experiments, we emulated the domain used in the 2019 work as closely as possible. Thus, we generated stories about a boy named Tom, whose goal is to gain some herbs for his sick grandmother. The herbs are in the possession of a Merchant, whose goal is to gain a coin, which Tom happens to have. The Merchant is in the Town, while Tom is in the Forest. But there is also a Bandit in the Forest, who also wants the coin. Any character can walk from one location to another, any character can buy an item from another by spending a coin, and any character that holds a weapon can rob another character for any item in their inventory. A character with a weapon can also kill another character, and any living character can loot the corpse of a dead character for any items they hold. There is also a bandit camp, where there is a chest with a second coin. Finally, there is a guard in Town, who has the unique action to arrest the bandit, and whose goal is to arrest the bandit. The three characters that hold weapons at the start of the story are the bandit, the merchant, and the guard.

For each of the treatments mentioned above we needed to generate a total of four stories. One of these stories was the "true" story. In these stories, Tom would take actions that ranked either very high or very low in the Agreeableness or Openness metrics as described above, depending on the category.

Another story involved Tom displaying the opposite personality to the one being tested. Thus, if the treatment group was associated with Low Agreeableness, then this story would involve Tom performing High Agreeableness actions. This story is meant to be a bad fit for the character’s "true" personality.

The final two stories consisted of a story that had a medium score in the given attribute being tested according to our metrics (and, thus, did not display Tom as strongly exhibiting or not exhibiting the attribute in question) and a story chosen at random.

5.2. Experimental Methodology

We had two hypotheses that we wanted to evaluate for our experiments. The first was that when shown a story that our model generated and claimed displays one of our target personality traits in the main character, the audience will identify the main character as someone who holds these traits. The second was that when shown a set of stories that includes one tale that our model claimed also displays the same target trait in a similar quantity in the main character, the audience will identify that particular story as the one they consider most realistic.

For these experiments, subjects were first given a brief description of the domain, introducing the people, the places, and the goals of the characters. They were then shown the four generated stories described above and told that all were possible ways that the story could proceed. One of these stories would be the "true" story wherein the target character’s behavior closely matched the personality the subject’s category was testing for. The domains for these stories were
identical. In total, there were eight stories shown to subjects testing Openness and eight different stories shown to subjects testing Agreeableness.

After reading this, they were then told that one of the stories was the "true" story and were then asked to rate statements about the target character using a 5-point Likert scale. We tried to use the same statements as Shirvani and Ware; however, preliminary testing showed that the framing device for the story was interfering with the results. To be specific, the domain in which the story takes place features the target character of Tom, trying to get herbs for his sick grandmother. We attempted to mention in the explanation that Tom’s grandmother provides for him, thus implying a potential selfish motive for Tom’s behavior, but the majority of results in our initial testing showed high Agreeableness regardless of Tom’s behavior in the story. Thus, we modified the statements slightly so that the participants would give answers based only on the parts of the story that our model had generated rather than the backstory. The statements we used are shown in Table 2. It should be noted that there is no "grandmother" character included in any story domain used, as the character is a plot device and cannot take any actions during the story.

While most statements presented to the user were related to the specific OCEAN category we were testing, we included a few statements related to different OCEAN qualities as well. This was done to avoid having the subjects fixate too heavily on the general theme behind the questions and to encourage them to think about the entire story in their responses. These statements were not used for analyzing the target metric of the category. The statements tested for both HO and LO were the same, as were the statements tested for HA and LA. Table 2 contains the statements presented to the subjects for Agreeableness tests, and Table 3 contains the statements presented to the subjects for Openness tests. It should be noted that some questions were meant to reflect a low score in the given metric, not a high one. Ones marked with an (R) for "Reverse" were expected to be agreed with if the "true" story ranked the character as having a low value in the given metric.

{| class="wikitable"
|+ Table 2: Statements Evaluated for testing Agreeableness
! OCEAN Quality !! Question
|-
| Agreeableness || Tom avoids conflict.
|-
| Agreeableness (R) || Tom takes advantage of others.
|-
| Agreeableness (R) || Tom is out for his own personal gain, with his grandmother as the only exception.
|-
| Agreeableness || Tom likes to do things for others as well as his grandmother.
|-
| Agreeableness (R) || Tom can’t be bothered with other’s needs (unless they are his grandmother).
|-
| Extraversion || Tom feels comfortable around people.
|-
| Neuroticism || Tom does things that he later regrets.
|-
| Conscientiousness || Tom makes plans and sticks to them.
|}

{| class="wikitable"
|+ Table 3: Statements Evaluated for testing Openness
! OCEAN Quality !! Question
|-
| Openness || Tom finds creative solutions to problems.
|-
| Openness || Tom tends to analyze possible outcomes of his plans.
|-
| Openness (R) || Tom has difficulty coming up with excellent plans.
|-
| Openness || Tom has excellent ideas.
|-
| Openness (R) || Tom’s ideas are ordinary and hardly unique.
|-
| Extraversion || Tom finds it difficult to approach others.
|-
| Conscientiousness || Tom gets things done quickly.
|-
| Neuroticism || Tom changes his mood a lot.
|}

After rating these statements, subjects were then shown four more stories and asked which one they thought most likely to occur based on the target character’s personality. In order to increase variability in stories, the second set of stories was generated from two domains that were almost entirely identical to the original domain. These two domains had exactly one additional feature each; one added the location of a Bandit’s Camp where a coin could be found, and the other included an additional Guard character whose goal was to arrest the Bandit. The subjects were told that these additional four stories included the bandit camp/the guard, so they would still understand the limitations of the world. No other changes were made to the domains. Similar to the stories shown previously, one of these four stories shown to the audience was ranked by the model as portraying a personality close to the "true" personality of the target character, one was ranked low, one was ranked medium, and the final one was a randomly chosen story. Subjects were then asked which story they thought most closely fit the given character’s personality.

6. Results

In this section we will review the results of our experiments on identifying the protagonist personality traits and choosing stories that align with the protagonist’s personality type. For these experiments, we collected results for 176 subjects using Prolific, with each subject randomly assigned to one of the four treatment categories. The category with the smallest number of subjects was Low Openness, which had 35 subjects. The category with the highest number of subjects was High Openness, with 48 subjects.

6.1. Identifying Protagonist Personality Traits

Recall that our first hypothesis was that participants should be able to identify if Tom exhibits either high or low Agreeableness or high or low Openness depending on the treatment group. To do this, we evaluated each user’s responses to the statements related to their treatment group. Each statement related to the aspect of personality we were analyzing was classified as having either High-attribute implications or Low-attribute (i.e., "Reverse") implications. For the subjects that fell into a high category, we considered it a success if the subjects ranked non-Reverse statements with "Strongly Agree" or "Agree," and Reverse statements with "Disagree" or "Strongly Disagree." Likewise, for the low categories success was determined if the subjects ranked non-Reverse statements as "Disagree" or "Strongly
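{| class="wikitable"
|+ Table 4: Experiment Results Individually
! rowspan="2" | Treatment
! colspan="2" | 5-Pt Likert Scale
! colspan="2" | Story Selection
|-
! p-value !! Effect Size !! p-value !! Effect Size
|-
| HO || 0.869 || 0.367 || 4.412e-08 || 0.625
|-
| LO || 0.998 || 0.303 || 0.999 || 0.057
|-
| HA || 3.941e-13 || 0.634 || 2.584e-3 || 0.447
|-
| LA || 5.218e-41 || 0.830 || 1.752e-09 || 0.674
|}

{| class="wikitable"
|+ Table 5: Experiment Results Combined
! rowspan="2" | Trait
! colspan="2" | 5-Pt Likert Scale
! colspan="2" | Story Selection
|-
! p-value !! Effect Size !! p-value !! Effect Size
|-
| O || 0.988 || 0.347 || 4.395e-3 || 0.385
|-
| A || 1.818e-47 || 0.731 || 2.138e-10 || 0.559
|}

{| class="wikitable"
|+ Table 6: Shirvani’s Results
! rowspan="2" | Trait
! colspan="2" | 5-Pt Likert Scale
! colspan="2" | Story Selection
|-
! p-value !! Effect Size !! p-value !! Effect Size
|-
| O || 0.072 || 1.160 || 0.026 || 1.60
|-
| C || 0.016 || 1.160 || 0.001 || 1.73
|-
| E || 0.024 || 1.167 || 0.014 || 1.61
|-
| A || 0.048 || 1.167 || <0.001 || 2.80
|-
| N || 0.063 || 1.128 || 0.002 || 2.04
|}

projects, or as a personality framework with which to test personality-adjacent features. Checking that a supposedly multi-personality feature relating to, say, memory or character beliefs actually works with multiple personality systems requires having access to other systems of personality to use.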
While our work has managed to adapt the Agreeableness
metrics to an acceptable degree, we had much less success
Disagree" and ranked Reverse statements as "Agree" or with Openness. One possible cause is that calculating intel-
"Strongly Agree." lect by calculating the opposition to the character’s plans
To determine if there was a significant effect, we used weighs too closely to Agreeableness. It is also possible that
a binomial exact test, testing the distribution of observed our variance metric for openness punishes plans where the
successes and failures against a null hypothesis of users character happens to take the same kind of action regard-
providing random responses to each statement. The results less of whether the action is the smartest thing to do. It’s
of this analysis are summarized in Tables 4 and 6 under the also possible that in stories where one character takes few
heading “5-Pt Likert Scale.” Table 4 contains information on actions compared to other agents, the variance score for
each individual treatment, and Table 6 contains results if openness sees this as showing more variety in the charac-
treatments were aggregated based on either Agreeableness ter’s actions simply because the character may not have
or Openness. repeated the same type of action, even if this story shows
The binomial tests indicate that people are able to cor- the character as non-proactive. Alternative approaches to
rectly identify when the protagonist of the story exhibits Openness might find more luck in the future, or alterna-
high agreeableness (p = 3.941e-13) and low agreeableness tive data-collecting information might enable calculating
(p=5.218e-41). We did not observe overwhelming evidence Shirvani19’s openness metric without issue.
that participants could identify when Tom exhibited either It should also be noted that unlike the original experiment
high or low openness. When taken in aggregate, however, set, in the case of Agreeableness our story selection did
we did find significant differences between how users would much worse than our Likert scale tests, which is the opposite
respond in both the Openness and Agreeableness categories. of what Shirvani and Ware found. One explanation could
These results mostly agree with the results obtained by Shir- be that we didn’t account for any personality metrics apart
vani and Ware. from the target score, and thus there were other factors that
the audience considered more pertinent than we did. Our
6.2. Choosing Stories According to attempts to make them consider the story from multiple
Personality Type perspectives may have increased the effect if that is the
case.
The second hypothesis we test is the idea that when sub- If we were to test this hypothesis in the future, we would
jects choose a story that they feel best fits the character’s need to expand our generation of alternative stories to check
personality, they will choose the one that our model claims all personality values, not just the targeted ones, and select
is closest to the original "true" story in personality. our "nearest-fit" stories to be ones where the model claims
As before, we use a binomial exact test to analyze whether the target character shows a moderate personality in all
participants select the correct story more frequently than aspects except for the aspect being tested.
the null hypothesis of random story selection. Binomial Another possible cause is that the subjects might have
tests on our story selection experiments indicate that partic- selected most likely stories based on the actions of other
ipants are able to identify stories where Tom exhibits high characters outside of our target character, instead of focus-
openness (p=4.412e-08), high agreeableness (p=2.584e-3), ing on Tom’s behavior alone. Although the original four set
and low agreeableness (p=1.752e-09). of stories shown portray the various characters apart from
Tom acting in various different ways, some readers might
have still attributed their personalities in their selection of
7. Discussion most-likely stories. This could be corrected in future studies
by simply replacing the names of these characters in the
By simplifying and reconstructing the metrics used in the
second set of stories, so that the audience views them as
Shirvani19 model, we could provide a personality imple-
different. Alternatively, another story domain with a single
mentation framework that is easily applied to a wide variety
character present could be used for future testing.
of projects. Having a small-scale framework for personal-
ity in narrative planning could be used to enhance other
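To make the analysis from Section 6 concrete, the success coding and the binomial exact test can be sketched as below. This is a minimal illustration, not the analysis script used in the study. The function names, the "Neutral" midpoint label, the one-sided form of the test, and the null probabilities (2 of 5 Likert options counting as a success, 1 of 4 stories being correct) are all assumptions made for the example.

```python
from math import comb

AGREE = {"Strongly Agree", "Agree"}
DISAGREE = {"Disagree", "Strongly Disagree"}

# Assumed null probabilities for "random responses" (not stated in the paper):
LIKERT_P0 = 2 / 5   # two of five Likert options count as a success
STORY_P0 = 1 / 4    # one of four candidate stories is the "correct" one


def is_success(response: str, reverse: bool, high_group: bool) -> bool:
    """Success rule described in Section 6.1.

    Agreement is expected for non-Reverse statements in a high group and for
    Reverse statements in a low group; disagreement is expected otherwise.
    Any other response (e.g. a hypothetical "Neutral") counts as a failure.
    """
    expect_agree = high_group != reverse
    return response in (AGREE if expect_agree else DISAGREE)


def binomial_exact_p(successes: int, trials: int, p0: float) -> float:
    """One-sided exact p-value P(X >= successes) for X ~ Binomial(trials, p0)."""
    return sum(
        comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical ratings from a high-Agreeableness group: (response, is_reverse)
    ratings = [
        ("Agree", False),
        ("Strongly Disagree", True),
        ("Neutral", False),
        ("Strongly Agree", False),
    ]
    wins = sum(is_success(r, rev, high_group=True) for r, rev in ratings)
    print(wins, len(ratings), binomial_exact_p(wins, len(ratings), LIKERT_P0))
```

The same `binomial_exact_p` function would apply to the story selection task by counting correct selections and passing `STORY_P0` as the null probability.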
8. Conclusion
In our attempts to refine the 2019 OCEAN-based personality model into a format that can be applied to story generation tasks as well as story evaluation tasks, while still remaining a small-scale, easily implemented personality model, we have had some successes and some failures. Our results indicate that of the two OCEAN attribute metrics we sought to refine, only Agreeableness has been properly adjusted into a format that audiences will recognize. Our approach to Openness needs to be redefined, and one of the testing metrics we have used should likely be refined as well before it is used again.
One problem is that we focused too intensely on translating the original metrics into runtime-calculable forms, and as such did not reevaluate whether alternative solutions might work better. For a first attempt this is still a crucial step to reach, but clear problems remain. For example, take the original concept of evaluating "intelligence" by treating the likelihood of other characters opposing the plan as a probability of success. Our adaptation simply used another measurement of the number of characters likely to oppose the plan, but this penalizes cases where intelligent characters act coldly toward others or manipulate them.
Still, our work has helped progress towards a personality model that may not be the most refined nor even the most accurate, but could be applied easily and quickly to any given narrative planner for character enhancement or comparative study with other personality models. Testing against a baseline model is a practice seen in countless scientific fields, and providing a model that can serve as one for personality modeling would benefit many future researchers.

References
[1] P. Gervás, B. Lönneker-Rodman, J. C. Meister, F. Peinado, Narrative models : Narratology meets artificial intelligence, 2006. URL: https://api.semanticscholar.org/CorpusID:89613631.
[2] S. Imabuchi, T. Ogata, Story generation system based on propp theory as a mechanism in narrative generation system, in: 2012 IEEE Fourth International Conference On Digital Game And Intelligent Toy Enhanced Learning, 2012, pp. 165–167. doi:10.1109/DIGITEL.2012.47.
[3] F. Peinado, P. Gervás, Creativity issues in plot generation (2005).
[4] A. Shirvani, S. G. Ware, A plan-based personality model for story characters, in: AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 2019, pp. 188–194.
[5] E. S. de Lima, B. Feijó, A. L. Furtado, Adaptive storytelling based on personality and preference modeling, Entertainment Computing 34 (2020) 100342. URL: https://www.sciencedirect.com/science/article/pii/S187595211930076X. doi:https://doi.org/10.1016/j.entcom.2020.100342.
[6] P. Tambwekar, M. Dhuliawala, L. J. Martin, A. Mehta, B. Harrison, M. O. Riedl, Controllable neural story plot generation via reward shaping, in: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-2019, International Joint Conferences on Artificial Intelligence Organization, 2019. URL: http://dx.doi.org/10.24963/ijcai.2019/829. doi:10.24963/ijcai.2019/829.
[7] A. Shirvani, S. G. Ware, L. J. Baker, Personality and emotion in strong-story narrative planning, IEEE Transactions on Games 15 (2023) 669–682. doi:10.1109/TG.2022.3227220.
[8] R. Evans, E. Short, Versu—a simulationist storytelling system, IEEE Transactions on Computational Intelligence and AI in Games 6 (2014) 113–130.
[9] e. a. McCoy, Joshua, Social story worlds with comme il faut. (2014) 97–112.
[10] J. C. Bahamón, R. M. Young, A choice-based model of character personality in narrative, in: Workshop on Computational Models of Narrative, 2012, pp. 164–168.
[11] J. C. Bahamón, R. M. Young, An empirical evaluation of a generative method for the expression of personality traits through action choice, in: Proceedings of the Thirteenth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AIIDE’17, AAAI Press, 2017.
[12] M. Kreminski, M. Dickinson, M. Mateas, N. Wardrip-Fruin, Why are we like this?: Exploring writing mechanics for an ai-augmented storytelling game, in: International Conference on the Foundations of Digital Games (FDG ’20), 2020.
[13] H. Rashkin, A. Bosselut, M. Sap, K. Knight, Y. Choi, Modeling naive psychology of characters in simple commonsense stories, 2018. URL: https://arxiv.org/abs/1805.06533. arXiv:1805.06533.
[14] D. Sander, Models of emotion: the affective neuroscience approach, The Cambridge Handbook of Human Affective Neuroscience (2013) 5–53.
[15] S. G. Ware, C. Siler, Sabre: A narrative planner supporting intention and deep theory of mind, in: AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 17, 2021, pp. 99–106.
[16] C. Deyoung, L. Quilty, J. Peterson, Between facets and domains: 10 aspects of the big five, Journal of personality and social psychology 93 (2007) 880–96. doi:10.1037/0022-3514.93.5.880.