Tagging Narrative with Propp’s Character Functions Using Large Language Models

Pablo Gervás1,2, Gonzalo Méndez1,2
1 Facultad de Informática, Universidad Complutense de Madrid, Madrid, 28040 Spain
2 Instituto de Tecnología del Conocimiento, Universidad Complutense de Madrid, Madrid, 28223 Spain

Abstract
The character functions proposed by Vladimir Propp as abstractions of the plot structure of Russian folk tales have long been popular as a means of analysing narrative. Several efforts have been carried out to annotate narratives in this way manually, or to apply machine learning techniques over texts previously annotated with syntactic and semantic information. The present paper explores the feasibility of annotating directly from the bare text of synopses of the stories by relying on large language models.

Keywords
automated tagging, Propp’s Morphology of the Folktale, character functions, large language models

1. Introduction
When trying to understand narrative, a very powerful tool used in the past has been the identification of abstractions of the meaning of a story that describe its plot in a way that is more generic than a summary of the story. A major tool to help in that aim was the Morphology of the Folktale proposed by the Russian formalist Vladimir Propp [1]. The set of character functions proposed by Propp to describe the structure of Russian folktales has subsequently been applied to analyse French fairy tales [2], fantasy plays [3] and television science fiction [4]. Research efforts have focused on developing detailed annotation schemes [5, 6] or ontologies [7, 8, 9] that can be used to annotate stories manually. More recent efforts consider the possibility of developing automatic processes of annotation at the level of Proppian functions that operate over a prior layer of syntactic and semantic annotation of the stories [10, 11, 12, 13, 14].
The present paper explores the possibility of relying on the few-shot learning capabilities of large language models to annotate stories with Proppian functions by operating directly over the text of synopses of the stories.

In: R. Campos, A. Jorge, A. Jatowt, S. Bhatia, M. Litvak (eds.): Proceedings of the Text2Story’24 Workshop, Glasgow (United Kingdom), 24-March-2024.
Corresponding author: pgervas@ucm.es (P. Gervás); gmendez@ucm.es (G. Méndez); http://nil.fdi.ucm.es
ORCID: 0000-0003-4906-9837 (P. Gervás); 0000-0001-7659-1482 (G. Méndez)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

Table 1
Character functions proposed by Propp.

Absentation: The villain or a member of the family leaves the home
Interdiction: A prohibition is imposed on the hero
Interdiction violation: The interdiction is violated
Reconnaissance: The villain attempts to obtain information
Delivery: The villain gains information or an object
Trickery: The villain deceives the hero to gain an advantage
Complicity: The victim submits to deception
Villainy or lack: The villain causes harm or injury, or someone lacks something they want
Mediation: Misfortune is made known
Counteraction: The hero reacts to the villain’s actions
Departure: The hero leaves home
1st donor function: The hero is tested by a potential donor
Hero’s reaction: The hero reacts to the test
Receipt of agent: The hero acquires a magical agent
Transfer: The hero is directed or taken to a new location
Struggle: The hero and villain engage in direct conflict
Branding: The hero is marked or identified
Victory: The hero defeats the villain
Liquidation of lack: Villainy or lack resolved
Return: The hero returns
Pursuit: The villain chases the hero
Rescue from pursuit: The hero is saved from pursuit or danger
Unrecognized arrival: The hero arrives unrecognized
Unfounded claims: A false hero claims hero status
Difficult task: The hero is given a difficult task
Solution: The task is accomplished or resolved
Recognition: The hero is recognized or acknowledged
Exposure: The villain’s identity or deception is exposed
Transfiguration: The hero undergoes a transformation
Punishment: The villain receives punishment or consequences
Wedding: The hero marries or is rewarded

2. Previous Work
The work relevant to this paper that needs reviewing is the representation of narrative using Propp’s character functions, existing efforts to annotate narrative with them, and solutions for annotating text with large language models.

2.1. Narrative Structure and Propp’s Character Functions
Based on a detailed study of 100 Russian folk tales from the anthology by Afanasiev, the Russian formalist Vladimir Propp proposed a set of abstractions of plot-relevant functions played by the characters of a tale, which he called character functions [1]. These character functions represent basic contributions to the plot such as going on a journey, having a conflict with another character, being sent on a mission, or being rewarded. The character functions proposed by Propp are shown in Table 1. Certain character functions are associated with particular types of character, such as the hero or the villain.
Propp also postulated seven such types of character: the villain, the donor (provider), the helper, the princess (or sought-for person), the dispatcher, the hero or victim, and the false hero.

Early uptake of Propp’s character functions as a means of annotating a corpus of stories relied on XML [5]. Propp’s character functions have also been re-represented as ontologies defined in Description Logics [7]. The most relevant effort at annotating the very stories for which Propp designed his character functions with character function information was carried out by Finlayson as part of his PhD thesis [12], and later extended by Yarlott and Finlayson [6]. They propose ProppML, “an annotation scheme designed to capture all the components of a Proppian-style morphological analysis of narratives”. Using this scheme, they annotated fifteen Russian folk tales from the original corpus, which amount to 18,862 words. The corpus was annotated separately by two highly trained annotators, and the results were then merged under the supervision of a third expert annotator.

2.2. Automated Annotation of Propp Character Functions
Finlayson’s effort at deeply annotating a corpus of Russian folk tales at various levels (syntax, semantics, and narrative structure) made it possible to develop a machine-learning algorithm capable of extracting culturally-relevant plot patterns from sets of folktales. Such an algorithm takes as input a story annotated as an event timeline and can predict for it abstract narrative structures such as Villainy, Struggle-Victory, and Reward [14]. This is an instance of a computational system learning a theory of narrative structure directly from a corpus of data.

2.3. Text Annotation with Large Language Models
The recent advances demonstrated by applying Large Language Models (LLMs) to various Natural Language Processing (NLP) tasks [15] have opened for exploration a number of avenues that appeared to be impossible just a few years ago.
The main idea is that a model pre-trained on very large amounts of textual data can be queried in very simple ways to carry out specific NLP tasks, and performs surprisingly well either with no additional information (zero-shot learning) or with very simple cues that exemplify how the task should be solved inserted in the prompt that poses the query (few-shot learning). In particular, for the task of annotating texts, tools such as ChatGPT have shown very high performance [16]. ChatGPT is an AI language model developed as part of the GPT (Generative Pre-trained Transformer) series of models and specifically designed for generating human-like text responses in conversational contexts. This type of approach has been applied to diverse tasks such as annotating legal text [17] or propagandistic text [18]. Gemini [19] (previously Bard [20]) is a family of generative AI models that lets developers generate content, handling both text and images as input. These models can generate text, handle multi-turn conversations, and also carry out zero-, one-, and few-shot tasks. They have been trained with data existing up to the beginning of 2023. The Gemini Pro 1.0 version is available for use through an open web interface or through a commercial web API. It allows for queries similar to those presented to ChatGPT [21].

3. Automated Tagging of Narrative Synopses with Proppian Character Functions
We want to explore the relative performance of large language models when used to annotate stories with Propp’s character functions via zero-shot learning and few-shot learning approaches.

3.1. Zero-Shot Tagging Using ChatGPT
We first try to get ChatGPT to annotate the synopses of the tales that Propp includes as examples of annotations in his book. This should allow us to compare the annotation proposed by ChatGPT with the one indicated by Propp himself. As a first step, we ask ChatGPT if it knows about Propp’s functions.
This is the reply:

Vladimir Propp, a Russian folklorist and scholar, identified and analyzed common narrative elements in Russian folktales. One of his most notable contributions is his identification of “morphology of the folktale,” which includes what he termed “character functions”. Propp identified 31 character functions that frequently appear in folktales. Here’s a brief overview of Propp’s character functions:

It then provides a list with 31 items, containing 7 character types and 24 character functions, some of which are Propp’s and others are not; some of Propp’s character functions are also missing:

• Character types: hero, villain, donor, helper, princess/prize, false hero, dispatcher
• Propp’s character functions: interdiction, violation, reconnaissance, delivery, trickery, complicity, mediation, counteraction, departure, first function of the donor, hero’s reaction, receipt of a magical agent, struggle, branding, victory, return, pursuit, rescue, unrecognized arrival
• Other functions: helper’s object, magical agent, villain’s lack, guidance, liquidation (equivalent to victory)
• Missing Propp’s functions: absentation, villainy or lack, transfer, resolution of trigger, unfounded claims, difficult task, solution, recognition, exposure, transfiguration, punishment, wedding

It is interesting to note that ChatGPT, although surprisingly knowledgeable in general, mixes up the types of character identified by Propp with the character functions: it gets the number right, but misses out a number of functions because it includes the types of character in their place.

To test the ability of the system to annotate a story with the character functions, the system is given the following prompt:

Can you annotate this tale using Propp’s character functions?

In response, the system outputs a text that describes its version of the annotation.
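The two prompting strategies used in this paper amount to assembling a query string around the synopsis. A minimal sketch in Python of how such queries could be built programmatically; the function name and structure are our own illustration, since the experiments reported here were run through the public web interfaces:

```python
# Hypothetical sketch of assembling the two kinds of queries used in the
# experiments. The experiments were actually run through the public web
# interfaces; build_query and its structure are our own illustration.

ZERO_SHOT_PROMPT = "Can you annotate this tale using Propp's character functions?"

def build_query(tale, definitions=None):
    """Build a zero-shot query, or an enriched query when a mapping of
    tag names to definitions is supplied (as in Section 3.3)."""
    if definitions is None:
        return f"{ZERO_SHOT_PROMPT}\n\n{tale}"
    listed = "\n".join(f"+ {name}: {gloss}" for name, gloss in definitions.items())
    return (
        "I need to annotate some tales using Propp's character types "
        "and narrative functions. These are the tags and definitions "
        f"I need to use:\n{listed}\n\nAnnotate this tale:\n\n{tale}"
    )
```

In the zero-shot setting only the bare question and the synopsis are sent; the enriched variant prepends the list of tags and definitions shown in Section 3.3.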
For the following example (tale 104 as described in Propp’s book):

A girl called Vasilisa has a miraculous doll. The tsar does not have a wife. Vasilisa goes to town. Vasilisa settles in an old woman’s house. The old woman buys Vasilisa some flax. Vasilisa spins unusually fine yarn from the flax. The doll makes a loom for Vasilisa in one night. Vasilisa weaves unusually fine linen on the loom. The old woman takes the linen to the tsar. The tsar orders that shirts be sewn by the one who has spun and woven the linen. The girl sews the shirts from the linen. The tsar sends for Vasilisa. Vasilisa marries the tsar.

The assignment of character functions that Propp suggests is:

A girl called Vasilisa has a miraculous doll (Acquisition of Magical Agent). The tsar does not have a wife (Lack). Vasilisa goes to town. Vasilisa settles in an old woman’s house (Unrecognised Arrival). The old woman buys Vasilisa some flax. Vasilisa spins unusually fine yarn from the flax. The doll makes a loom for Vasilisa in one night (Acquisition of Magical Agent). Vasilisa weaves unusually fine linen on the loom. The old woman takes the linen to the tsar. The tsar orders that shirts be sewn by the one who has spun and woven the linen (Difficult Task). The girl sews the shirts from the linen (Task Resolved). The tsar sends for Vasilisa. Vasilisa marries the tsar (Wedding).

In contrast, the annotation that ChatGPT suggests as a zero-shot solution is:

Lack: The tsar lacks a wife.
Dispatcher: Vasilisa goes to town.
Helper: Vasilisa settles in the old woman’s house and receives aid from the miraculous doll.
Task: The old woman buys flax for Vasilisa, who spins unusually fine yarn and weaves linen.
Delivery: The old woman takes the linen to the tsar.
Task: The tsar orders shirts to be sewn from the linen spun and woven by the same person.
Performance of task: Vasilisa sews the shirts.
Arrival: The tsar sends for Vasilisa.
Final union: Vasilisa marries the tsar.
Zero-shot learning presents the difficulty of having the system provide the result in a format of its own devising, which may differ significantly from the one being used as reference.

3.2. Zero-Shot Tagging Using Gemini
Similarly to what we did using ChatGPT, we prompted Gemini for what it knew about Propp’s character types and character functions. It provided a list of the seven character types and then another with 31 character functions:1 initial situation, absentation, interdiction, violation, reconnaissance, delivery of information, trickery, complicity, mediation, beginning of the counteraction, hero’s departure, first function of the donor, tests of the hero, acquisition of magic agent, hero’s reaction, spatial translocation, struggle, branding, victory, liquidation of the lack, return, pursuit, rescue, unrecognized arrival, false hero’s claims, recognition, reward, difficult task, exposure of the false hero, transfiguration, wedding, punishment, and resolution.

1 Gemini listed the character functions grouped by sphere of action, following Propp’s description of them.

This response does in essence match Propp’s set of character functions, but it presents an issue that adds complexity to the task: the system sometimes does not use the exact names that Propp assigned to his character functions, but rather paraphrases some of them. This happens, for instance, for the ones shown in Table 2.

Table 2
Mapping between Propp’s character function names and Bard’s paraphrases.

Propp’s name: Bard’s paraphrase
Delivery: Delivery of information
Transfer: Spatial translocation
Exposure: Exposure of false hero
Solution: Resolution

Zero-shot learning approaches to this task will likely suffer from having to identify an appropriate mapping between the set of tags used by the system and the set of tags being used as reference. This presents a particular problem because it makes it very difficult to carry out automated processing of the tagging.
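The mapping in Table 2 suggests that any automated processing of such output would need a normalisation step that folds paraphrases back onto Propp’s names. A minimal sketch, using only the paraphrases observed in Table 2; the lookup strategy itself is our own assumption, not a procedure used in the experiments:

```python
# Minimal sketch of canonicalising model-produced labels before comparison.
# Only the four paraphrases of Table 2 are taken from the experiments;
# the normalisation approach is our own illustration.

PARAPHRASES = {
    "delivery of information": "Delivery",
    "spatial translocation": "Transfer",
    "exposure of false hero": "Exposure",
    "exposure of the false hero": "Exposure",
    "resolution": "Solution",
}

def canonical(label):
    """Map a possibly paraphrased label to Propp's name, when known;
    otherwise return the label unchanged (modulo whitespace)."""
    key = label.strip().lower()
    return PARAPHRASES.get(key, label.strip())
```

Paraphrases outside this table would still require manual matching, which is the approach taken for the quantitative comparison in Section 4.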
To test the ability of the system to annotate a story with the character functions, the system was given the same prompt as in the previous case. For the same example considered above (tale 104 as described in Propp’s book), the annotation that Gemini suggests as a zero-shot solution is:

Characters:
Hero: Vasilisa
Dispatcher: None explicitly identified (opportunity arises naturally)
Donor: Doll (magical helper)
Prize/Princess: Tsar (implicitly)
False Hero: None explicitly identified

Narrative Functions:
Sphere of Action 1: Preparation
Initial Situation: Vasilisa has a miraculous doll and lives without a father figure.
Lack: Tsar lacks a wife. (Note: not directly affecting Vasilisa, but creates initial context)
Sphere of Action 2: Departure
Beginning of the Counteraction: Vasilisa goes to town (implicitly seeking opportunity).
Sphere of Action 3: The Hero’s Actions
Hero’s Departure: Vasilisa leaves her home environment.
First Function of the Donor: Old woman provides flax (indirect aid).
Tests of the Hero: Task of spinning fine yarn (challenge).
Acquisition of Magic Agent: Doll magically creates a loom for Vasilisa.
Hero’s Reaction: Uses the loom to weave fine linen.
Spatial Translocation: Linen reaches the Tsar.
Sphere of Action 4: Return
Return: Tsar acknowledges the linen and initiates contact with Vasilisa.
Return: Vasilisa arrives at the Tsar’s court.
Sphere of Action 5: Recognition and Reward
Unrecognized Arrival: Vasilisa implicitly demonstrates her skills.
Difficult Task: Sewing the shirts (implicit test of skill).
Exposure of the False Hero: N/A
Transfiguration: Vasilisa gains new status as Tsar’s wife.
Wedding: Marriage to the Tsar (reward).
Sphere of Action 6: Resolution
Resolution: Lack of the Tsar is addressed (gains a wife), and Vasilisa finds a new life.
It is interesting to note that, in the zero-shot setting, Gemini decides to provide information about how the content of the story relates to the Spheres of Action that it considers Propp to have defined. It also identifies a number of inferences of implicit content that is not mentioned in the story but is indeed inferable from the text. Examples of this are the fact that Vasilisa has left home (to go to town) or that Vasilisa gains new status as a result of having married the Tsar. Another relevant detail is that Gemini assigns 11 character functions to this story, whereas Propp himself assigned only 7, which moreover do not coincide with those proposed by ChatGPT.

3.3. Few-Shot Tagging Using ChatGPT
Finally, we wanted to test the performance of the system at a few-shot learning task, so we replicated the experiment above with an enriched prompt that explicitly included the list of character functions and their definitions:

I need to annotate some tales using Propp’s character types and narrative functions. These are the tags and definitions I need to use:
For the character types:
+ Hero: The role model of a story.
+ Villain: The negative character who creates struggles for the hero.
+ Donor: The character who provides some magical object to the hero.
+ Helper: The character who helps the hero.
+ Princess: The character who becomes a companion of the hero.
+ Dispatcher: The character who illustrates the need for the hero’s quest and sends the hero off.
+ False Hero: The character who takes credit for the hero’s actions
For the narrative functions:
+ Absentation: The villain or a member of the family leaves the home
+ Branding: The hero is marked or identified
+ Interdiction: A prohibition is imposed on the hero
+ Victory: The hero defeats the villain
+ Interdiction violation: The interdiction is violated
+ Resolution of trigger: Villainy or lack resolved
+ Reconnaissance: The villain attempts to obtain information
+ Return: The hero returns
+ Delivery: The villain gains information or an object
+ Pursuit: The villain chases the hero
+ Trickery: The villain deceives the hero to gain an advantage
+ Rescue from pursuit: The hero is saved from pursuit or danger
+ Complicity: The victim submits to deception
+ Unrecognized arrival: The hero arrives unrecognized
+ Villainy or lack: The villain causes harm or injury or someone lacks something they want
+ Unfounded claims: A false hero claims hero status
+ Mediation: Misfortune is made known
+ Difficult task: The hero is given a difficult task
+ Counteraction: The hero reacts to the villain’s actions
+ Solution: The task is accomplished or resolved
+ Departure: The hero leaves home
+ Recognition: The hero is recognized or acknowledged
+ 1st donor function: The hero is tested by a potential donor
+ Exposure: The villain’s identity or deception is exposed
+ Hero’s reaction: The hero reacts to the test
+ Transfiguration: The hero undergoes a transformation
+ Receipt of agent: The hero acquires a magical agent
+ Punishment: The villain receives punishment or consequences
+ Transfer: The hero is directed or taken to a new location
+ Wedding: The hero marries or is rewarded
+ Struggle: The hero and villain engage in direct conflict
Annotate this tale:

We tested this with ChatGPT, and for the previous example (tale 104 as described in Propp’s book) the annotation produced is:

Character Types:
Hero: Vasilisa
Villain: None explicitly mentioned
Donor: The miraculous doll
Helper: The old woman
Princess: Vasilisa herself, who becomes the tsar’s wife
Dispatcher: The tsar
False Hero: None explicitly mentioned

Narrative Functions:
Dispatcher: The tsar sends for Vasilisa.
1st donor function: The miraculous doll aids Vasilisa in spinning and weaving.
Receipt of agent: The doll makes a loom for Vasilisa.
Delivery: Vasilisa presents the fine linen to the tsar.
Difficult task: The tsar orders shirts to be sewn from the fine linen.
Solution: Vasilisa sews the shirts.
Wedding: Vasilisa marries the tsar.

An important point to note here is that the system response does not actually annotate the text, in the sense of respecting the text and assigning labels to specific segments of it. Instead it declares what labels might be applicable to the text in some sense, and for each one it provides a description of the events in the text that justify the assignment. For instance, the 1st donor function label is justified with the sentence “The miraculous doll aids Vasilisa in spinning and weaving”, which can be understood as a condensed description of the various tasks that Vasilisa carries out in the story, taken in the context that we have been told that the magical doll is helping her. The summary is actually incorrect, or at least not altogether precise, as the doll actually only provides the magical loom with which Vasilisa weaves, and does not help with the spinning. This presents a problem for how this type of response should be evaluated. First, because it is difficult to establish whether the system’s assignment of the label accurately matches Propp’s assignment: the boundaries of the segment of text considered to justify the label are not defined. Second, because the assignment appears to be based on an incorrect interpretation of the facts in the story.
In this particular case, the 1st donor function label is incorrectly assigned, because Propp defines the first function of the donor as the testing of the hero by the donor to decide whether he is worthy of a gift or assistance. The system has also made an incorrect inference here. A similar case can be observed for the inference supporting the Delivery tag: the system justifies it by saying that Vasilisa delivers the linen to the tsar, when the story says the old woman does. A more refined option might be to include in the prompt an example of how the annotations should be presented in combination with the input.

3.4. Few-Shot Tagging Using Gemini
We tested the same prompt, enriched with the explicit list of character functions to use, with Gemini Pro, and for the previous example (tale 104 as described in Propp’s book) the annotation produced is:

Annotating the Tale of Vasilisa using Propp’s Framework

Characters:
Hero: Vasilisa
Donor: The Doll (magical helper)
Dispatcher (indirect): The Tsar’s lack of wife creates an opportunity for Vasilisa.

Narrative Functions:
Absentation: The Tsar lacks a wife (lack).
Hero’s possession of a magical agent: Vasilisa has the miraculous doll.
Departure: Vasilisa goes to town (possibly seeking a new life).
Helper: The old woman provides Vasilisa with flax.
Difficult Task (indirect): Vasilisa spins unusually fine yarn, implying exceptional skill.
Receipt of Magical Agent (indirect): The doll creates a loom for Vasilisa, demonstrating its magical abilities.
Difficult Task: Vasilisa weaves unusually fine linen, again showcasing her skill.
Mediation: The old woman takes the linen to the Tsar, revealing Vasilisa’s talent.
Transfer: The Tsar sends for Vasilisa, bringing her closer to her potential reward.
Marriage: Vasilisa marries the Tsar (fulfills the lack and resolves the story).

Notes:
This tale is a simplified version and omits some of the more complex elements of Propp’s morphology.
The villain and some other functions (e.g., struggle, victory) are absent. The focus is on Vasilisa’s exceptional skills, aided by the magical doll, which ultimately lead to her marriage with the Tsar.

It is interesting to see that Gemini has some difficulty in identifying the difference between character types and narrative functions as described in the prompt. This results in an annotation that mixes both kinds of labels indistinctly.

4. Discussion
However impressive the replies provided by these large language models may appear at first glance, it is very easy for them to be incorrect to a surprising degree. This is especially true for queries that require responses involving detailed recall of definitions. The examples reported above of how the different LLMs recall the actual list of Propp’s character functions show both the strengths and the weaknesses of these models.

The very nature of these systems implies that the text of any response is dynamically constructed on demand from a neural representation in an underlying latent space. For this reason, responses are very likely to be paraphrases rather than verbatim transcriptions. This has the advantage of allowing for a certain leeway in the interpretation of definitions: these systems are more likely to identify instances of a particular character function even when they are phrased in a significantly different way from the original definition. However, it also has the disadvantage of relying on potentially divergent understandings of specific functions. The set of definitions provided for the character functions when prompted suggests that this is the case, at least for ChatGPT. This suggests that the set of tags used by ChatGPT in a zero-shot approach to the task is simply incorrect.

To provide some quantitative measure of the degree of divergence, the set of tags assigned to the sentences in the tale is compared with the assignment that Propp provides for it in his book.
In appendices II and III, Propp provides examples of how the plots of a number of tales can be described in terms of his set of character functions. The tales so described are referred to only by number, and the assignment of character functions is given over a synopsis of the tale. These synopses have been used as input to the system. The tales considered correspond to numbers 93, 104, 123, 127, 131, 133, 139, 155, 198, 244 and 247.

Table 3
Metrics for performance of different models and prompting strategies on assignment of character functions to tale synopses with respect to Propp’s assignment. The metrics for each model + prompting strategy pair are shown in a different column. For each pair, the first two rows show precision and recall against Propp’s assignment; the next two rows show the percentage of labels that show some linguistically plausible relation to the corresponding story content, and the percentage of positions identified by Propp as involving a character function for which the system offers a label; the last two rows show the percentage of incorrect assignments of Propp’s categories, and the percentage of labels assigned that are not equivalent to any Proppian character function.

                          ChatGPT             Gemini
                      zero-shot few-shot zero-shot few-shot
Precision                0.38     0.37     0.41     0.46
Recall                   0.26     0.22     0.31     0.34
Assignment meaningful      70       66       77       78
Response required          33       33       74       72
Incorrect assignments      26       29       16       44
Invented labels            01       03       10       04

The assignment of character functions to segments of the synopses required a detailed analysis. The responses by the various models include some kind of label together with a justification indicating which part of the story the label is assigned to; both labels and descriptions tend to be paraphrases rather than references to specific character function names or specific segments of the text.
This makes simple measures like precision and recall against Propp’s original assignment difficult to compute, and even more difficult to interpret. To circumvent this problem, in all cases where the model suggested a label that could reasonably be considered to refer to the one assigned by Propp, it has been considered valid. In the hope of making the analysis more informative, four additional metrics have been computed. First, the number of times that identifiable references to Propp’s character functions have been assigned to segments of the story that they do not apply to, which measures the correct match between label and text. Second, the number of times that labels have been used that cannot be related in some way to the character functions defined by Propp, which measures familiarity with the set of character functions. Third, the number of times that labels assigned (whether identifiable as Propp’s character functions or not) can be seen to relate to the corresponding story segment by some valid inference, which measures the language understanding ability of the system. Finally, we report a metric that measures the number of times that the system has attempted to assign a label to segments of the text to which Propp had assigned one, which measures the ability of the system to identify plot-relevant events, regardless of whether it can tag them correctly. The results for this set of metrics are shown in Table 3.

The first observation to make is that the first four metrics show relatively low values. This indicates poor performance overall. It is interesting to note that for several of the metrics the zero-shot approach performs better than the few-shot approach. This happens to ChatGPT for all metrics save the identification of plot-relevant segments, and to Gemini at least for the number of incorrect assignments (which rise from 16 to 44). This is consistent with observed behaviour of LLMs, with quality of results dropping as the size of prompts increases.
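Once model labels have been manually matched to Propp’s names as described above, precision and recall against Propp’s assignment can be computed as a simple multiset overlap. A sketch under that assumption; the helper below is our own illustration, not the scoring script actually used for Table 3:

```python
from collections import Counter

def precision_recall(predicted, gold):
    """Multiset precision/recall of predicted function labels against
    Propp's assignment. Labels are assumed to have already been matched
    manually to Propp's names, as described in the text."""
    overlap = sum((Counter(predicted) & Counter(gold)).values())
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(gold) if gold else 0.0
    return precision, recall
```

For example, for a tale where Propp assigns four functions and the model proposes four labels of which three match, both precision and recall come out as 0.75.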
The fact that ChatGPT zero-shot had a very low recall is not surprising, given that it was not really using the full set of character functions. The zero-shot approach using Gemini had both higher precision and higher recall than the other options, which is consistent with the fact that it is not confused about the character functions. Nevertheless, the scores are surprisingly low. The additional metrics reported provide some insight into the reasons for these low values. We notice very poor performance (33%) for ChatGPT at the identification of plot-relevant segments, and significant percentages of incorrect assignments of existing labels (26% for ChatGPT and 16% for Gemini). The number of labels assigned that do not correspond to Propp’s set is surprisingly high (10%) for Gemini zero-shot, given that it supposedly knows the correct set, but it drops significantly (to 4%) when the set of names is included in the prompt. In contrast, ChatGPT reacts badly to the enriched prompt, with the percentages for both incorrect assignments and invented labels increasing.

5. Conclusions
The experiments reported suggest that the use of large language models for the automated tagging of stories with Propp’s character functions may yield very different results depending on the level of familiarity of the particular model employed with the set of character functions in question. They also suggest that, if the model is not sufficiently familiar with the definitions of the character functions, a few-shot approach in which brief versions of the definitions are provided may not be enough to compensate for this. Further experiments are needed, both in terms of the exploration of a broader range of variations in the prompting and in the consideration of different models or versions of the models. Important aspects to consider in this sense are the differences between public web interfaces and publicly available sources for the models, and differences in model size.
Acknowledgments
This paper has been partially supported by the CANTOR project (PID2019-108927RB-I00) funded by the Spanish Ministry of Science and Innovation.

References
[1] V. Propp, Morphology of the Folktale, University of Texas Press, 1968.
[2] E. D. Williams, The Fairy Tales by Madame d’Aulnoy, Ph.D. thesis, Rice University, Houston, Texas, 1982.
[3] M. A. McGowan, An analysis of the fantasy plays of James M. Barrie utilizing Vladimir Propp’s structural model of the fairy tale, Ph.D. thesis, New York University, 1984.
[4] D. J. Latourette, Doctor Who meets Vladimir Propp: A comparative narrative analysis of myth/folktale and the television science fiction genre, Ph.D. thesis, Northwestern University, 1990.
[5] S. A. Malec, Proppian structural analysis and XML modeling, Proc. of Computers, Literature and Philology (CLiP 2001) (2001).
[6] W. V. H. Yarlott, M. A. Finlayson, ProppML: A complete annotation scheme for Proppian morphologies, in: 7th Workshop on Computational Models of Narrative (CMN 2016), Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2016.
[7] F. Peinado, P. Gervás, B. Díaz-Agudo, A description logic ontology for fairy tale generation, in: Procs. of the Workshop on Language Resources for Linguistic Creativity, LREC, volume 4, 2004, pp. 56–61.
[8] T. Declerck, A. Aman, M. Banzer, D. Macháček, L. Schäfer, N. Skachkova, Multilingual ontologies for the representation and processing of folktales, Proceedings of the LT4DHCSEE in conjunction with RANLP (2017) 20–23.
[9] F. Pannach, An ontology-driven information system based on Vladimir Propp’s morphology of the folktale for southern African folktales, Master’s thesis, University of Göttingen, Germany, 2019.
[10] M. A. Finlayson, Deriving narrative morphologies via analogical story merging, New Frontiers in Analogy Research (2009) 127–136.
[11] M. Despontin, L. Sbattella, R.
Tedesco, Natural language processing for storytelling and role playing: a training system based on the Propp model, in: ICERI2010 Proceedings, IATED, 2010, pp. 5036–5045.
[12] M. A. Finlayson, Learning narrative structure from annotated folktales, Ph.D. thesis, Massachusetts Institute of Technology, 2012.
[13] M. A. Finlayson, Inferring Propp’s functions from semantically annotated text, Journal of American Folklore 129 (2016) 55–77.
[14] M. A. Finlayson, ProppLearner: Deeply annotating a corpus of Russian folktales to enable the machine learning of a Russian formalist theory, Digital Scholarship in the Humanities 32 (2017) 284–300.
[15] B. Min, H. Ross, E. Sulem, A. P. B. Veyseh, T. H. Nguyen, O. Sainz, E. Agirre, I. Heintz, D. Roth, Recent advances in natural language processing via large pre-trained language models: A survey, ACM Computing Surveys 56 (2023) 1–40.
[16] M. Alizadeh, M. Kubli, Z. Samei, S. Dehghani, J. D. Bermeo, M. Korobeynikova, F. Gilardi, Open-source large language models outperform crowd workers and approach ChatGPT in text-annotation tasks, arXiv preprint arXiv:2307.02179 (2023).
[17] J. Savelka, K. D. Ashley, The unreasonable effectiveness of large language models in zero-shot semantic annotation of legal texts, Frontiers in Artificial Intelligence 6 (2023).
[18] M. Hasanain, F. Ahmed, F. Alam, Large language models for propaganda span annotation, arXiv preprint arXiv:2311.09812 (2023).
[19] Gemini Team, Gemini: A family of highly capable multimodal models, 2023. arXiv:2312.11805.
[20] J. Manyika, S. Hsiao, An overview of Bard: an early experiment with generative AI, AI. Google Static Documents 2 (2023).
[21] S. Mandvikar, Factors to consider when selecting a large language model: A comparative analysis, International Journal of Intelligent Automation and Computing 6 (2023) 37–40.