Generalizing a Numeric Personality Metric for Narrative Planners

Elinor Rubin-McGregor, Brent Harrison

Department of Computer Science, University of Kentucky, Davis Marksbury Building, 329 Rose Street, Lexington, KY 40506-0633 USA

Abstract

In the field of narrative planning, there are many different approaches to personality modeling, so many that an overarching study of personality models themselves is beginning to form. But a subject as complex as personality demands complex modeling, which in turn makes it difficult to compare implementations or to test sub-features of personality systems intended to be globalized. By generalizing an existing five-number personality model, we hope to provide an adaptable resource that can be used for enhancement, for comparison, or simply as a foundation for other personality models.

1. Introduction

The consideration of personality is a major step forward in the field of narrative generation. Narrative generators have a multitude of applications, from training models to video games and even organizational and strategic purposes. Incorporating personality into narrative models is a subject that has vexed many researchers for years, as personality is such a complex and varied concept. Yet it is critical, for if we do not model personality, our narratives cannot consider ways in which behavior differs between individuals. For narrative purposes alone, stories become more engaging if the audience can identify with the characters and see them as reflections of real people. Without personality considered, it is far more difficult to display narrative elements known to entice audiences, such as character depth. Two people in the same situation will make different choices depending on who they are, and attempting to capture that concept of "who they are" has been the pursuit of many.

Currently, there is a wide variety of personality models proposed for this purpose, with varying advantages and disadvantages. Many of these models, however, require a great deal of effort to implement because they rely on information that is difficult for narrative planners to collect. For systems where personality is the central focus, or where personality is an important element, this may be an acceptable cost to pay. But what about when the program is not focused on developing a specific personality system, but instead on features related to multiple personality adaptation systems? Or perhaps, when personality is required or beneficial but not the primary focus [1, 2]? What about simply having a baseline personality model to compare a more complex model to [3]? Having a small-scale, easily implementable personality model would be beneficial for other researchers in this field.

There is an existing personality model that does not require a great deal of effort to implement, running on data that many narrative planners can easily collect already. This is the OCEAN-based personality model produced by Shirvani and Ware in 2019. For ease of understanding and brevity, we refer to this model as Shirvani19. While this model does utilize data that is generally available to narrative planners, it does have limitations associated with it. For example, this model is not entirely open to all domains and has some features that cannot be calculated by a computer during story generation.

Specifically, the Shirvani19 model uses metrics that demand an understanding of not only the current story plan, but all or a large number of hypothetical alternative story plans. One such metric describes "creative thinking." This metric is calculated by checking how many times the specific actions of a given character occur in a larger, preferably all-alternative-plan-encompassing, set of alternative stories. In addition, the paper uses the concept of "conflict" in its metrics for determining both agreeableness and intellect but defines its measurement of conflict as any time a character can observe any way in which their plans can fail. This feature also requires knowledge that cannot easily be generated during story creation, as evaluating it requires essentially finishing the story in multiple ways before the story is even concluded. In short, there are features of the Shirvani19 model that can only be used to evaluate personality after several stories have already been generated, which in turn makes the model difficult to use if we want to apply it during story generation.

To this end, we are proposing to modify the Shirvani19 model such that it can be applied to a wider variety of narrative planners. We are also trying to simplify the overhead required to make the personality model work. Specifically, we propose to calculate a metric that describes "creative thinking" by comparing the diversity of actions only along the specific plan, so that characters who utilize a broader range of actions are considered to have a higher Openness score than characters who repeatedly use the same actions. Likewise, conflict is redefined for both of its uses. Where it is applied for measuring a character's affability, we instead check simply the number of ways a character's actions could directly harm other characters. Where conflict is applied to intellect, we translate the chance of success to the chance that other characters will oppose the actions of the given character.

In order to ensure that our proposed methods are usable, we performed a user study wherein subjects evaluated the stories produced by our modified model. In the end we found that while our Agreeableness work seems to be very applicable, our re-definition of Openness will need some refinement in later work.

2. Related Work
There is a large amount of work on representing personality in digital narratives, even work that focuses on the Big Five OCEAN framework, but not many of these approaches are very modular [5, 6]. Shirvani came out with a follow-up to Shirvani19 that addressed the issues discussed here, but at the cost of increasing the size of the model [7].

AIIDE Workshop on Intelligent Narrative Technologies, November 18, 2024, University of Kentucky, Lexington, KY, USA. Contact: erru227@uky.edu (E. Rubin-McGregor); bha286@g.uky.edu (B. Harrison). © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

The well-known Versu drama manager is very good at telling complex stories with consistent character personality, but it requires a great deal of overhead work to run [8]. The model needs files representing the world, social practices, and the characters. Not only that, but it needs parser programs for all of these features, then initialization functions, then a database to hold it all, and multiple levels of instantiators before it can make a decision. Likewise, the Comme il Faut project handles complex emotional environments very well, but it requires a great deal of information to be provided to the data manager for any story domain to work [9]. Information on cultural knowledge, social facts, social states, social exchanges and more must be documented before it can be applied.

There are of course works that focus less on societal impacts as a whole, and more on the individual characters. Bahamon and Young introduced other OCEAN-based systems, but they have not produced a way to directly evaluate the OCEAN traits during runtime without extensive preparation. Their earlier work in 2012 provides a way to remove actions deemed out-of-character during story generation, but does not provide a mechanism to determine whether behavior is out-of-character or not. It is a model we would like to test our own work on in the future [10]. Their later work further discusses evaluating personality consistency in narrative models, but still does not introduce a personality model to use [11].

The drama manager from Why Are We Like This works well with the player's actions and models character personality from player actions, but because of this it only works for the specific high degree of player interaction used in the project [12]. It also uses abstract personality modeling, rather than a personality system that can work as soon as it is applied. There has also been work by Soares that models the personality of the player for narrative decisions, but it does not model the characters of the narrative in the same way [5]. Shirvani and Ware developed a very impressive emotion-based personality model that solves many of the same problems this paper seeks to correct [7]. This model relies heavily upon its emotional system in modeling personality, which in turn requires a larger amount of overhead and thus isn't as modular as this paper seeks to be. Its reliance on its emotional system also prevents it from being used with various other emotion-focused models [13, 14].

3. Background on the Shirvani19 Model

Shirvani and Ware proposed a personality model [4] for characters in a computational narrative that was based on the OCEAN model of personality, using Sabre as the basis of its planning model [15]. The OCEAN personality model, or "the Big Five model," utilizes five key attributes to collectively describe personality: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism [16]. These are commonly accepted attributes of personality, and are defined as such:

• Openness means "openness to experience" and describes how much a person is willing to explore outside of their comfort zone. This feature is also considered an aspect of curiosity, and therefore is often tied to creativity as well.
• Conscientiousness is how organized and effective a person is. Someone who acts carelessly or struggles to complete tasks is considered to have low conscientiousness.
• Extraversion is the degree to which a person wants to engage and interact with other people. Notably, a highly extraverted person can also be very malicious, as this category does not differentiate between positive or negative engagement with others, only frequency.
• Agreeableness is reflective of compassion and empathy, and is used to measure how much a person considers other people. Agreeableness is independent of Extraversion: someone can be very shy and still have high agreeableness.
• Neuroticism is a more internal emotional feature, as it describes essentially how nervous and insecure a person is. Highly neurotic people will often struggle with self-esteem, and emotional instability is often linked to high levels of neuroticism.

The Shirvani19 model is primarily focused on scoring the actions of a character according to how those actions relate to these attributes. That is to say, it estimates what personality traits are being displayed in a given character's actions, and to what degree each action displays those traits. It does this by calculating twelve variables that each contribute to a score describing a different OCEAN attribute. A full table of these metrics and how they relate to each OCEAN attribute is given in Table 1. Of note, any value marked with (R) is used to reduce the overall score, as it defines a facet that makes an action fit less into the given personality attribute.

| OCEAN Quality | Facet | Description |
|---|---|---|
| Openness | Openness | 1. The minimum action likelihood in a plan (R) |
| Openness | Intellect | 2. Probability of success of a plan |
| Conscientiousness | Industriousness and Orderliness | 3. # of actions in a plan (R) |
| Conscientiousness | Industriousness and Orderliness | 4. # of times the agent changes their mind (R) |
| Conscientiousness | Industriousness and Orderliness | 5. # of actions with self as the consenting character |
| Extraversion | Enthusiasm | 6. # of actions including others with their consent |
| Extraversion | Assertiveness | 7. # of actions including others without their consent |
| Agreeableness | Compassion | 8. # of actions including others with their consent |
| Agreeableness | Compassion | 9. # of goals achieved for other characters |
| Agreeableness | Politeness | 10. # of actions including others without their consent (R) |
| Agreeableness | Politeness | 11. # of conflicts created for other characters (R) |
| Neuroticism | Withdrawal and Volatility | 12. # of times the agent changes their mind |

Table 1
Shirvani19's Metrics of Personality for the OCEAN personality model [4].

Of these scores, two are not as easy to formulate as others. Agreeableness and Openness utilize metrics that are difficult to obtain during story generation. We will discuss these metrics in greater detail below.

3.1. Agreeableness

As can be seen in Table 1, the Agreeableness OCEAN quality has 4 metrics associated with it. One of these metrics (11 in Table 1) requires the planner to be able to calculate the number of conflicts created for other characters. The Shirvani19 definition of conflict can be problematic for efficient calculation. Shirvani19 defines character conflict as occurring when a character can foresee any way their plan can go wrong and fail to reach their goal. This element is extremely difficult to evaluate in many systems, as it requires simulating all possible alternative actions or events that could happen, not simply the actions the character intends to have happen. This would take a large amount of operational time and resources to run, and would also require limits or ways to determine when to stop simulating additional possible future plans.

3.2. Openness

The Openness attribute of OCEAN is defined by Shirvani with two metrics. We refer to the first metric as "creative thinking" (referred to as the openness facet in Table 1) and the second as intellect (1 and 2 in Table 1, respectively). Creative thinking is a variance value, as it is used to reward using a diverse set of actions. The equation for creative thinking is as follows:

\[
\text{Creative Thinking} = 1 - \min_{i = 1 \ldots m} \sum_{j=1}^{n} \frac{\mathit{Occurrences}(a_i, p_j)}{\mathit{Length}(p_i)}
\]

In this function, we assume the agent is considering $n$ possible different plans to take. The set of these plans is $[p_1 \ldots p_n]$, so $p_i$ is the $i$-th plan being considered. The value $a_i$ is a given action in one or more of these plans. Thus we can think of the plans as sets of actions, $p_i = [a_1 \ldots a_m]$. The value $m$ is the total number of actions that are possible for the character to take. As for the larger values, $\mathit{Occurrences}(a_i, p_j)$ is the number of times action $a_i$ occurs in plan $p_j$, while $\mathit{Length}(p_i)$ is the number of steps in plan $p_i$.

The second metric that contributes to Openness is Intellect. The Shirvani19 model defines this metric as the probability that a plan succeeds. The probability of success of a plan is defined as the likelihood of a plan succeeding based on the number of conflicts created with other characters. This was defined as such:

\[
\text{Probability of Success} = 1 - \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{\mathit{Conflict}_{a_j}(c_i)}{n \cdot m}
\]

In this, the values of $n$ and $m$ represent the total number of characters and the total number of possible actions respectively. The value $c_i$ represents character $i$ out of the set of all characters in the domain, and $a_j$ represents a given action in the set of all $m$ potential actions. $\mathit{Conflict}_{a_j}(c_i)$ is therefore a value that is 1 if action $a_j$ causes a conflict for character $c_i$, and 0 if it does not. In short, the probability of success is defined by how many other characters would agree with the given character's action plan.

Both the Creative Thinking metric and the Intellect metric share some issues in how they are calculated. In both cases, the set of $n$ plans $[p_1 \ldots p_n]$ demands that the program collect a large collection of potential actions for every single character's potential plans. This works on the assumption that the implementation of personality is done after the planner has generated multiple plans, and assumes that personality is simply used to select the best possible plan. Such a system is not feasible if the planner is intended to be used for real-time story generation, or if the planner is working with a human agent. Not only does it demand that a large portion of work be completed multiple times for every character on every step, it also needs all or a large number of solutions to be generated before the metric can be collected.

The Intellect metric is also problematic in that the probability of success calculation relies on being able to calculate whether an action would generate a conflict with another character. We have already discussed the potential issues with calculating conflict information in the previous section.

4. Methods

To make the personality model more flexible, we replaced the problematic aspects of the Openness and Agreeableness OCEAN metrics with values that could be collected more easily. For Agreeableness we only needed to re-evaluate the concept of conflict, but for Openness we propose alternative calculations for both Creative Thinking and Intellect. We will discuss each of these in greater detail below.

4.1. Conflict

In the Shirvani19 model, conflicts are calculated by determining any point at which a character's plan could fail. While this is a rigorous way to determine conflict, we propose a metric that relaxes the idea of conflict in the interest of making it easier to calculate. Instead of calculating conflict so directly, we propose defining character conflict by the character's goals or other motivating factors. Rather than simulating an entire world change for potential issues, we argue that simply checking two states for comparison is enough. Specifically, our metric compares one existing state and one hypothetical state. The "true" state, $t_0$, is the state at the moment when the character is considering a plan, before taking or deciding on an action, and is thus "true" because it has come to pass outside of the character's plans. The hypothetical state, $t_n$, is the predicted end state that will come to pass if the character's entire plan is executed without fail. In this we consider $t_1$ to be the state resulting from the first action in the plan the character is considering, with the considered plan having a total of $n$ steps in it.

Thus, our changed definition of $\mathit{Conflict}_{a_j}(c_i)$ will need to specify that $a_j$ would result in the world state $t_j$ if executed. With this, $\mathit{Conflict}_{a_j}(c_i)$ is 1 if character $c_i$ has a higher goal metric at $t_0$ than at $t_j$, and is 0 otherwise. In other words, as long as action $a_j$ moves the character $c_i$ further from its goal, $\mathit{Conflict}_{a_j}(c_i)$ will evaluate to 1.

4.2. Creative Thinking

Recall that to calculate Creative Thinking, the Shirvani19 model needs to calculate the variance associated with a plan by calculating the minimum action likelihood in a plan across many different plans being considered. To make creative thinking easier to calculate, we propose to simply examine the diversity of the actions considered in the plan. We define plannedSet as the set of all actions that have occurred up to the point at which an action is being considered, combined with the set of actions in the most likely future plan. We then define actionCounter as a set of size $m$, with $m$ being the total number of possible actions in the domain. The set actionCounter holds the number of times every given action in the domain is executed throughout the entirety of the plannedSet, and thus can be calculated by going over the plannedSet just once. In other words, actionCounter is the eventual count of how many times every potential action would occur if the given plan occurs without any interference or changes. From actionCounter we calculate a variance-based metric, using an existing, commonly used variance algorithm, that scales from 0 to 1. In other words, we measure openness to new experiences as the variance between the different kinds of action the character showed.

4.3. Intellect

Originally, Intellect was calculated as the probability of success of a plan in terms of the number of conflicts that it could create. We decided the easiest way to simplify the problem is to cut out a unique metric entirely, and instead use the "politeness" metric (10 in Table 1) for two values and two purposes.

Politeness is a metric that is calculated by determining the number of actions that include other characters without them consenting to the action. The first way we utilize the Politeness metric is in its originally intended way. That is to say, it is used to help calculate the Agreeableness of an agent, where the smaller the number of actions taken that include nonconsenting characters, the larger the Politeness metric.

The new way we propose to use the Politeness metric is to apply it to the concept of "Opposing Forces," in the sense that it estimates how much opposition the character would need to overcome to ensure the plan operates smoothly. In other words, the fewer agents that would oppose the plan by not consenting to take part in certain actions, the more likely we consider the plan to succeed, and therefore the more intelligent we consider it.

5. Experiments

To evaluate the quality of our proposed metrics, we ran a human subjects experiment. We attempted to run our experiments as closely as possible to the experiments performed in [4], thus our experiments were focused on whether or not an audience reading our generated stories observed the intended personality traits assigned to a given character. Since our metrics only affect 2 of the OCEAN traits, we limit our experiments to the following four basic treatments involving the OCEAN traits Openness and Agreeableness: High Openness (HO), Low Openness (LO), High Agreeableness (HA), and Low Agreeableness (LA). The experiment was set up as a between-subjects experiment where participants were randomly sorted into one of these four groups.

5.1. Story Domain and Story Generation

Shirvani and Ware unfortunately did not keep track of their original program, thus we were unable to use the exact same domain as they did. For our story experiments, we emulated the domain used in the 2019 work as closely as possible. Thus, we generated stories about a boy named Tom, whose goal is to gain some herbs for his sick grandmother. The herbs are in the possession of a Merchant, whose goal is to gain a coin, which Tom happens to have. The Merchant is in the Town, while Tom is in the Forest. But there is also a Bandit in the Forest, who also wants the coin. Any character can walk from one location to another, any character can buy an item from another by spending a coin, and any character that holds a weapon can rob another character for any item in their inventory. A character with a weapon can also kill another character, and any living character can loot the corpse of a dead character for any items they hold. There is also a bandit camp, where there is a chest with a secondary coin. Finally, there is a guard in Town, who has the unique action to arrest the bandit, and whose goal is to arrest the bandit. The three characters that hold weapons at the start of the story are the bandit, the merchant, and the guard.

For each of the treatments mentioned above we needed to generate a total of four stories. One of these stories was the "true" story. In these stories, Tom would take actions that ranked either very highly or very lowly in the Agreeableness or Openness metrics described above, depending on the category.

Another story involved Tom displaying the opposite personality to the one being tested. Thus, if the treatment group was associated with Low Agreeableness, then this story would involve Tom performing High Agreeableness actions. This story is meant to be a bad fit for the character's "true" personality.

The final two stories consisted of a story that had a medium score in the given attribute being tested according to our metrics (and, thus, did not display Tom as strongly exhibiting or not exhibiting the attribute in question) and a story chosen at random.

5.2. Experimental Methodology

We had two hypotheses that we wanted to evaluate for our experiments. The first was that when shown a story that our model generated and claimed displays one of our target personality traits in the main character, the audience will identify the main character as someone who holds these traits. The second hypothesis is that when shown a set of stories that includes one tale that our model claimed also displays the same target trait in a similar quantity in the main character, the audience will identify that particular story as the one they consider most realistic.

For these experiments, subjects were first given a brief description of the domain, introducing the people, the places, and the goals of the characters. They were then shown the four generated stories described above and told that all were possible ways that the story could proceed. One of these stories would be the "true" story wherein the target character's behavior closely matched the personality the subject's category was testing for.
The domains for these stories were identical. In total, there were eight stories shown to subjects testing Openness and eight different stories shown to subjects testing Agreeableness.

After reading this, they were then told that one of the stories was the "true" story and were then asked to rate statements about the target character using a 5-point Likert scale. We tried to use the same statements as Shirvani and Ware; however, preliminary testing showed that the framing device for the story was interfering with the results. To be specific, the domain in which the story takes place features the target character of Tom, trying to get herbs for his sick grandmother. We attempted to mention in the explanation that Tom's grandmother provides for him, thus implying a potential selfish motive for Tom's behavior, but the majority of results in our initial testing showed high Agreeableness regardless of Tom's behavior in the story. Thus, we modified the statements slightly so that the participants would give answers based only on the parts of the story that our model had generated rather than the backstory. The statements we used are shown in Table 2. It should be noted that there is no "grandmother" character included in any story domain used, as the character is a plot device and cannot take any actions during the story.

While most statements presented to the user were related to the specific OCEAN category we were testing, we included a few statements related to different OCEAN qualities as well. This was done to avoid having the subjects fixate too heavily on the general theme behind the questions and to encourage them to think about the entire story in their responses. These statements were not used for analyzing the target metric of the category. The statements tested for both HO and LO were the same, as were the statements tested for HA and LA. Table 2 contains the statements presented to the subjects for Agreeableness tests, and Table 3 contains the statements presented to the subjects for Openness tests. It should be noted that some questions were meant to reflect a low score in the given metric, not a high one. Ones marked with an (R) for "Reverse" were expected to be agreed with if the "true" story ranked the character as having a low value in the given metric.

| OCEAN Quality | Question |
|---|---|
| Agreeableness | Tom avoids conflict. |
| Agreeableness (R) | Tom takes advantage of others. |
| Agreeableness (R) | Tom is out for his own personal gain, with his grandmother as the only exception. |
| Agreeableness | Tom likes to do things for others as well as his grandmother. |
| Agreeableness (R) | Tom can't be bothered with others' needs (unless they are his grandmother). |
| Extraversion | Tom feels comfortable around people. |
| Neuroticism | Tom does things that he later regrets. |
| Conscientiousness | Tom makes plans and sticks to them. |

Table 2
Statements Evaluated for testing Agreeableness

| OCEAN Quality | Question |
|---|---|
| Openness | Tom finds creative solutions to problems. |
| Openness | Tom tends to analyze possible outcomes of his plans. |
| Openness (R) | Tom has difficulty coming up with excellent plans. |
| Openness | Tom has excellent ideas. |
| Openness (R) | Tom's ideas are ordinary and hardly unique. |
| Extraversion | Tom finds it difficult to approach others. |
| Conscientiousness | Tom gets things done quickly. |
| Neuroticism | Tom changes his mood a lot. |

Table 3
Statements Evaluated for testing Openness

After rating these statements, subjects were then shown four more stories and asked which one they thought most likely to occur based on the target character's personality. In order to increase variability in stories, the second set of stories was generated from two domains that were almost entirely identical to the original domain. These two domains had exactly one additional feature each; one added the location of a Bandit's Camp where a coin could be found, and the other included an additional Guard character whose goal was to arrest the Bandit. The subjects were told that these additional four stories included the bandit camp/the guard, so they would still understand the limitations of the world. No other changes were made to the domains. Similar to the stories shown previously, one of these four stories shown to the audience was ranked by the model as portraying a personality close to the "true" personality of the target character, one was ranked lowly, one was ranked medium, and the final one was a randomly chosen story. Subjects were then asked which story they thought most closely fit the given character's personality.

6. Results

In this section we will review the results of our experiments on identifying the protagonist personality traits and choosing stories that align with the protagonist's personality type. For these experiments, we collected results for 176 subjects using Prolific, with each subject randomly assigned to one of the four treatment categories. The category with the smallest number of subjects was Low Openness, which had 35 subjects. The category with the highest number of subjects was High Openness, with 48 subjects.

6.1. Identifying Protagonist Personality Traits

Recall that our first hypothesis was that participants should be able to identify if Tom exhibits either high or low Agreeableness or high or low Openness depending on the treatment group. To do this, we evaluated each user's responses to the statements related to their treatment group. For each statement related to the aspect of personality we were analyzing, we aligned the statement with either High-attribute implications or Low-attribute implications, i.e. "Reverse" implications. For the subjects that fell into the high categories, we considered it a success if the subjects ranked non-Reverse statements with "Strongly Agree" or "Agree," and Reverse statements with "Disagree" or "Strongly Disagree." Likewise, for the low categories, success was determined if the subjects ranked non-Reverse statements as "Disagree" or "Strongly Disagree" and ranked Reverse statements as "Agree" or "Strongly Agree."

To determine if there was a significant effect, we used a binomial exact test, testing the distribution of observed successes and failures against a null hypothesis of users providing random responses to each statement. The results of this analysis are summarized in Tables 4 and 5 under the heading "5-Pt Likert Scale." Table 4 contains information on each individual treatment, and Table 5 contains results if treatments were aggregated based on either Agreeableness or Openness.

| Treatment | 5-Pt Likert Scale p-value | 5-Pt Likert Scale Effect Size | Story Selection p-value | Story Selection Effect Size |
|---|---|---|---|---|
| HO | 0.869 | 0.367 | 4.412e-08 | 0.625 |
| LO | 0.998 | 0.303 | 0.999 | 0.057 |
| HA | 3.941e-13 | 0.634 | 2.584e-3 | 0.447 |
| LA | 5.218e-41 | 0.830 | 1.752e-09 | 0.674 |

Table 4
Experiment Results Individually

| Trait | 5-Pt Likert Scale p-value | 5-Pt Likert Scale Effect Size | Story Selection p-value | Story Selection Effect Size |
|---|---|---|---|---|
| O | 0.988 | 0.347 | 4.395e3 | 0.385 |
| A | 1.818e-47 | 0.731 | 2.138e-10 | 0.559 |

Table 5
Experiment Results Combined

The binomial tests indicate that people are able to correctly identify when the protagonist of the story exhibits high agreeableness (p = 3.941e-13) and low agreeableness (p = 5.218e-41). We did not observe overwhelming evidence that participants could identify when Tom exhibited either high or low openness. When taken in aggregate, however, we did find significant differences between how users would respond in both the Openness and Agreeableness categories. These results mostly agree with the results obtained by Shirvani and Ware.

| Trait | 5-Pt Likert Scale p-value | 5-Pt Likert Scale Effect Size | Story Selection p-value | Story Selection Effect Size |
|---|---|---|---|---|
| O | 0.072 | 1.160 | 0.026 | 1.60 |
| C | 0.016 | 1.160 | 0.001 | 1.73 |
| E | 0.024 | 1.167 | 0.014 | 1.61 |
| A | 0.048 | 1.167 | <0.001 | 2.80 |
| N | 0.063 | 1.128 | 0.002 | 2.04 |

Table 6
Shirvani's Results

6.2. Choosing Stories According to Personality Type

The second hypothesis we test is the idea that when subjects choose a story that they feel best fits the character's personality, they will choose the one that our model claims is closest to the original "true" story in personality.

As before, we use a binomial exact test to analyze whether participants select the correct story more frequently than the null hypothesis of random story selection. Binomial tests on our story selection experiments indicate that participants are able to identify stories where Tom exhibits high openness (p = 4.412e-08), high agreeableness (p = 2.584e-3), and low agreeableness (p = 1.752e-09).

7. Discussion

By simplifying and reconstructing the metrics used in the Shirvani19 model, we could provide a personality implementation framework that is easily applied to a wide variety of projects. A small-scale framework for personality in narrative planning could be used to enhance other projects, or as a personality framework with which to test personality-adjacent features. Checking that a supposedly multi-personality feature relating to, say, memory or character beliefs actually works with multiple personality systems requires having access to other systems of personality to use.

While our work has managed to adapt the Agreeableness metrics to an acceptable degree, we had much less success with Openness. One possible cause is that calculating intellect from the opposition to the character's plans ties it too closely to Agreeableness. It is also possible that our variance metric for openness punishes plans where the character happens to take the same kind of action, regardless of whether the action is the smartest thing to do. It is also possible that in stories where one character takes few actions compared to other agents, the variance score for openness sees this as showing more variety in the character's actions simply because the character may not have repeated the same type of action, even if this story shows the character as non-proactive. Alternative approaches to Openness might find more luck in the future, or alternative ways of collecting data might enable calculating Shirvani19's openness metric without issue.

It should also be noted that unlike the original experiment set, in the case of Agreeableness our story selection did much worse than our Likert scale tests, which is the opposite of what Shirvani and Ware found. One explanation could be that we didn't account for any personality metrics apart from the target score, and thus there were other factors that the audience considered more pertinent than we did. Our attempts to make them consider the story from multiple perspectives may have increased the effect if that is the case. If we were to test this hypothesis in the future, we would need to expand our generation of alternative stories to check all personality values, not just the targeted ones, and select our "nearest-fit" stories to be ones where the model claims the target character shows a moderate personality in all aspects except for the aspect being tested.

Another possible cause is that the subjects might have selected most-likely stories based on the actions of characters other than our target character, instead of focusing on Tom's behavior alone. Although the original set of four stories portrays the various characters apart from Tom acting in different ways, some readers might still have factored those characters' personalities into their selection of most-likely stories. This could be corrected in future studies by simply replacing the names of these characters in the second set of stories, so that the audience views them as different. Alternatively, another story domain with a single character present could be used for future testing.

8.
Conclusion Conferences on Artificial Intelligence Organization, 2019. URL: http://dx.doi.org/10.24963/ijcai.2019/829. In our attempts to refine the 2019 OCEAN-based personality doi:10.24963/ijcai.2019/829. model into a format that can be applied to story generation [7] A. Shirvani, S. G. Ware, L. J. Baker, Personality and tasks as well as story evaluation tasks while still remaining a emotion in strong-story narrative planning, IEEE small-scale easily implemented personality model, we have Transactions on Games 15 (2023) 669–682. doi:10. had some successes and some failures. Our results indicate 1109/TG.2022.3227220. that of the two OCEAN attribute metrics we sought to refine, [8] R. Evans, E. Short, Versu—a simulationist storytelling only Agreeableness has been properly adjusted into a format system, IEEE Transactions on Computational Intelli- that audiences will recognize. Our work on Openness needs gence and AI in Games 6 (2014) 113–130. to be redefined, and one of the testing metrics we have used [9] e. a. McCoy, Joshua, Social story worlds with comme should likely be refined as well before using it again. il faut. (2014) 97–112. One problem is that we were focused too intensely on [10] J. C. Bahamón, R. M. Young, A choice-based model of translating the original metrics into runtime-calculable character personality in narrative, in: Workshop on forms, and as such did not reevaluate if alternative solutions Computational Models of Narrative, 2012, pp. 164–168. might work better. For a first attempt this is still a crucial [11] J. C. Bahamón, R. M. Young, An empirical evaluation step to reach, but there are still clear problems. For example, of a generative method for the expression of person- take the original concept of evaluating "intelligence" by way ality traits through action choice, in: Proceedings of of evaluating the likeliness of other characters opposing the the Thirteenth AAAI Conference on Artificial Intelli- plan as a probability of success. 
Our adaptation was simply gence and Interactive Digital Entertainment, AIIDE’17, using another measurement of the number of characters AAAI Press, 2017. likely to oppose the plan, but this results in punishing cases [12] M. Kreminski, M. Dickinson, M. Mateas, N. Wardrip- where intelligent characters can act coldly, or manipulative Fruin, Why are we like this?: Exploring writing me- of others. chanics for an ai-augmented storytelling game, in: Still, our work has helped progress towards a personal- International Conference on the Foundations of Digi- ity model that may not be the most refined nor even the tal Games (FDG ’20), 2020. most accurate, but could be applied easily and quickly to [13] H. Rashkin, A. Bosselut, M. Sap, K. Knight, Y. Choi, any given narrative planner for character enhancement or Modeling naive psychology of characters in simple comparative study with other personality models. Testing a commonsense stories, 2018. URL: https://arxiv.org/abs/ baseline model for comparison is a practice seen in count- 1805.06533. arXiv:1805.06533. less scientific fields, and providing a model that can serve [14] D. Sander, Models of emotion: the affective neuro- as one for personality modeling would benefit many future science approach, The Cambridge Handbook of Hu- researchers. man Affective Neuroscience (2013) 5–53. [15] S. G. Ware, C. Siler, Sabre: A narrative planner sup- porting intention and deep theory of mind, in: AAAI Conference on Artificial Intelligence and Interactive References Digital Entertainment, volume 17, 2021, pp. 99–106. [16] C. Deyoung, L. Quilty, J. Peterson, Between facets [1] P. Gervás, B. Lönneker-Rodman, J. C. Meister, and domains: 10 aspects of the big five, Journal of F. Peinado, Narrative models : Narratology personality and social psychology 93 (2007) 880–96. meets artificial intelligence, 2006. URL: https://api. doi:10.1037/0022-3514.93.5.880. semanticscholar.org/CorpusID:89613631. [2] S. Imabuchi, T. 
Ogata, Story generation system based on propp theory as a mechanism in narrative gen- eration system, in: 2012 IEEE Fourth International Conference On Digital Game And Intelligent Toy En- hanced Learning, 2012, pp. 165–167. doi:10.1109/ DIGITEL.2012.47. [3] F. Peinado, P. Gervás, Creativity issues in plot genera- tion (2005). [4] A. Shirvani, S. G. Ware, A plan-based personality model for story characters, in: AAAI Conference on Artificial Intelligence and Interactive Digital Entertain- ment, 2019, pp. 188–194. [5] E. S. de Lima, B. Feijó, A. L. Furtado, Adaptive sto- rytelling based on personality and preference mod- eling, Entertainment Computing 34 (2020) 100342. URL: https://www.sciencedirect.com/science/article/ pii/S187595211930076X. doi:https://doi.org/10. 1016/j.entcom.2020.100342. [6] P. Tambwekar, M. Dhuliawala, L. J. Martin, A. Mehta, B. Harrison, M. O. Riedl, Controllable neural story plot generation via reward shaping, in: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-2019, International Joint
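As a rough illustration of the analysis described in Sections 6.1 and 6.2, the success coding of Likert responses and the binomial exact test can be sketched in Python. This is a minimal sketch under stated assumptions, not the analysis code used in the study: the function names are hypothetical, the one-sided alternative is assumed, and the null success probabilities (0.4 for two qualifying options out of five on the Likert scale, 0.25 for one story out of four) are one possible reading of "random responses." The example responses are invented for illustration only.

```python
from math import comb

AGREE = {"Strongly Agree", "Agree"}
DISAGREE = {"Strongly Disagree", "Disagree"}


def is_success(response, reverse, high_group):
    """Code one Likert response as a success or failure (hypothetical helper).

    High-treatment subjects are expected to agree with non-Reverse statements
    and disagree with Reverse ones; low-treatment subjects the opposite.
    Any other answer (e.g. "Neutral") counts as a failure.
    """
    expect_agree = (high_group != reverse)
    return response in (AGREE if expect_agree else DISAGREE)


def binomial_exact_upper(successes, trials, p0):
    """One-sided exact p-value P(X >= successes) for X ~ Binomial(trials, p0).

    The one-sided alternative is an assumption of this sketch.
    """
    return sum(
        comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Invented responses from a high-treatment group: (answer, is_reverse_statement).
responses = [
    ("Agree", False),
    ("Strongly Agree", False),
    ("Disagree", True),
    ("Neutral", False),
    ("Agree", True),
]
n_success = sum(is_success(r, rev, high_group=True) for r, rev in responses)

# Assumed null: a uniformly random answer lands on one of the two
# qualifying options out of five, so p0 = 0.4.
print(n_success, binomial_exact_upper(n_success, len(responses), 0.4))

# Story selection under the assumed null of picking 1 of 4 stories at random.
print(binomial_exact_upper(3, 4, 0.25))
```

The upper-tail sum is written out with `math.comb` so the sketch has no dependencies beyond the standard library; a statistics package with an exact binomial test would serve equally well.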