Animacy in German Folktales

Julian Häußler1,∗,†, Janis von Keitz1,† and Evelyn Gius1

1 fortext lab, Technical University of Darmstadt, Germany

Abstract
This paper explores the phenomenon of animacy in prose using the example of German folktales. We present a manually annotated corpus of 19 German folktales from the Brothers Grimm collection and train a classifier on these annotations. Building on previous work in animacy detection, we evaluate the classifier’s performance and its application to a larger corpus. The findings highlight the complexity of animacy in literary texts, distinguishing it from named entity recognition and emphasizing the classifier’s potential for enhancing character recognition in narratives.

Keywords
animacy, animacy classification, folktales, Computational Literary Studies

1. Introduction

[A]nd when any one attacked him he would say, “Stick, out of the sack!” and directly out jumped the stick, and dealt a shower of blows on the coat or jerkin, and the back beneath, which quickly ended the affair.
The Table, the Ass, and the Stick, Household Stories by the Brothers Grimm, translated by Lucy Crane [6]

Folktales feature not only humans, but also talking animals as well as living objects. For example, in the folktale The Table, the Ass, and the Stick a speaking donkey sends three brothers out into the world by a trick, and one of the brothers acquires a command-executing stick. The donkey and the stick are animate entities which break with the rules of common world knowledge. These animate entities are positioned between simple objects or animals and human characters, and they contribute to key plot points. Besides its obvious relevance in folktales, animacy is also relevant in contexts such as the Romanticist understanding of nature [1], the depiction of machines [5], or present-day discourse around artificial intelligence. It is thus connected to concepts such as agency, and it is closely connected to characters in fiction.
CHR 2024: Computational Humanities Research Conference, December 4 – 6, 2024, Aarhus, Denmark
∗ Corresponding author.
† These authors contributed equally.
Email: julian.haeussler@tu-darmstadt.de (J. Häußler); janis.von_keitz@tu-darmstadt.de (J. v. Keitz); evelyn.gius@tu-darmstadt.de (E. Gius)
ORCID: 0000-0001-7490-8570 (J. Häußler); 0009-0002-9760-3600 (J. v. Keitz); 0000-0001-8888-8419 (E. Gius)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

The goal of this paper is to scrutinize animacy in German folktales as a phenomenon of interest for Computational Literary Studies (CLS) by showcasing the manual annotation of animacy and presenting a classifier trained on these annotations. The overall approach builds on the work of Karsdorp et al. [9], who developed an approach to animacy detection in Dutch folktales.

In the following, we discuss previous work on animacy (section 2) and present our corpus of German folktales from the Children’s and Household Tales by the Brothers Grimm as well as our understanding of animacy and its manual annotation (section 3). Furthermore, we evaluate its relation to the neighboring concepts of fictional characters and named entity recognition (section 4). We then reproduce Karsdorp et al.’s approach for German folktales, evaluate the results of our classification and apply them to a larger corpus (section 5). We close by summarizing our findings and sketching possible directions for future work (section 6).

2. Animacy in Text-based Research

The concept of animacy is crucial in human perception for distinguishing between living and non-living entities. Animacy perception, an ability that develops in early childhood and might be innate [9], is based on the unpredictability of biological life but also on agency, influenced by movements and mental states [20]. This perception extends to reading texts.
In fiction, the recognition of animacy is influenced by the narrative context, allowing fictional entities to be perceived as animate even if they are not animate in real life [12].

In text-based research, animacy was introduced as a grammatical category by Michael Silverstein in the 1970s. He suggested a hierarchy for describing languages on a general level, ranking grammatical phenomena according to animacy: 1st person > 2nd person > 3rd person/deictics > human NPs > animate NPs > inanimate NPs [4]. This hierarchy influences grammatical structures in many languages, affecting aspects such as inflection and word order. For example, in German as well as in English and other languages, animacy influences the choice of interrogative pronouns and the use of certain verbs (e.g., “schauen”/“to look” requires animacy of the semantic subject). Despite this systematic hierarchy, animacy is not a simple linear scale. It is influenced by additional parameters, including the perception of empathy and sensation. Objects like computers or organizations are sometimes considered animate due to attributed intelligence or agency, complicating the distinction between animate and inanimate [21].

In literary studies, a definitive concept of animacy has yet to be established. Still, it can be explored through stylistic devices like personification and anthropomorphism, as well as through the characterization of fictional characters. Personification assigns human attributes to non-human entities, while anthropomorphism extends this by giving them human-like forms and mental attributes. Additionally, narratological theories examine how characters, including potentially inanimate ones, are constructed and perceived, highlighting the complexity of character portrayal through textual indicators, language use, and readers’ cognitive engagement [8].

In NLP and adjacent fields, the binary classification of entities in texts into animate and inanimate entities is particularly relevant.
Animacy classification aids numerous NLP tasks such as anaphora or coreference resolution, dependency parsing, word sense disambiguation, semantic role labeling, as well as automatic text generation and translation [7, 9]. Determining whether a pronoun refers to an animate or inanimate antecedent significantly simplifies anaphora and coreference resolution in many languages. Since animacy also influences grammatical structures in many languages, it also affects dependency parsing and semantic role labeling. In automatic text generation, taking into account the animacy required by verbs is essential for generating semantically correct sentences.

In Computational Literary Studies, animacy classification can help identify characters in narratives [7, 16]. However, fictional worlds in literature can challenge traditional animacy classification, as objects or plants may act as agents, diverging from real-world knowledge. Rule-based systems with semantic lexicons like WordNet might misclassify such entities. Therefore, animacy classification in narrative texts should build on contextual understanding rather than fixed rules [9]. Hybrid systems combining machine learning with rule-based methods show promise in addressing these challenges. Jahan et al. [7] used a hybrid system combining a support vector machine classifier with a rule-based classification system and achieved an F1 of 0.88 for classifying animacy. Karsdorp et al. [9] used what they call a linguistically uninformed model with word embeddings and achieved an F1 of 0.91 for the animate class.

3. Data

3.1. Corpus

Our approach is based on a corpus of 19 German folktales (see Appendix A). These were selected from the Brothers Grimm’s Children’s and Household Tales (Kinder- und Hausmärchen), a collection of folktales published from 1812 onwards [2, 3]. The texts were collected from Wikisource, where all editions of the collection are available digitally.
To select the texts, we reviewed all 201 tales and 10 children’s legends for entities that are depicted as animate but cannot be categorized as humans or animals in everyday terms. We excluded cases that differ significantly in meaning and function from inanimate entities that are depicted as animate and have a tangible counterpart in the real world. That is, texts containing supernatural phenomena, such as the anthropomorphization of divine beings, the personification of events like death, or metaphorical descriptions in which animacy is used as a stylistic device, were not included. Moreover, humans or animals transformed into inanimate entities within the fictional world were not considered if they exclusively displayed inanimate qualities. Magical entities were examined case by case, as these often represent borderline cases of animacy depiction. Since the depiction of animacy is strongly related to independent action, texts where this action is explicitly described for magical entities were included.¹

¹ Our approach therefore differs from the principles in the Aarne–Thompson–Uther Index [19] and the Motif-Index of Folk-Literature [18]. The ATU classifies animal tales but disregards animacy, and the Motif-Index bases the classification of animals with human traits only on speech or role, not on agency.

3.2. Manual Animacy Annotations

We annotated the 19 folktales in our corpus with regard to animacy. The full annotation guideline is available in Appendix B. Our animacy concept is connected to coreference annotation, as we not only annotate (proper) nouns but also mentions of animacy. However, our approach differs from the one by [7] in that they use pre-annotated coreference chains, annotating animacy in nouns, gendered pronouns and adjectives. It also differs from [9], as their animacy concept is based on the rationality and intentionality of an entity, whereas we base our understanding of animacy on agency and speech. However, like [9] we use untagged data.
We consider an entity animate if at least one of the following three conditions is met:
1. The entity performs an action independently and fulfills the agent role of a verb.
2. The entity makes independent verbal utterances.
3. The entity is described by a lexeme that refers to a living being, irrespective of its role or actions in the sentence, unless an additional description explicitly excludes animacy (e.g., a dead relative).

In order to have an overview of the entities in the text and to be able to refer to each entity, one mention of every animate entity was annotated as its recognizable mention (rm) in the first iteration of annotation. In the second iteration, all other expressions referring to animate entities were marked as animate. The referring expressions include proper names, descriptions by attributes (such as profession, gender, appearance, or social status), and pronouns, and can be single-token or multi-token occurrences.

A second annotator annotated KHM 6 and 10, resulting in an average Cohen’s kappa of 0.87.² Disagreement stems mostly from the second annotator tending to overlook several articles as well as possessive and reflexive pronouns, while also tending to annotate shorter spans (e.g. only “goldsmiths” instead of “the goldsmiths of the empire”). However, the first annotator also overlooked several personal pronouns; we therefore decided to make the annotation of all relevant pronouns more explicit in the guidelines.

4. Animacy and Related Concepts

4.1. Animacy and Literary Characters

In order to investigate the relationship between animate entities and characters, we performed an additional annotation of characters (cf. the third iteration in the guidelines in Appendix B). Additionally, we further categorized the entities according to their degree of animacy, ranging from human and animal to inanimate and supernatural (cf. Table 1).
A closer look at the data shows that animated entities appear more frequently as characters in fairy tales, with humans making up more than half of the characters and often serving as protagonists, even in our selection of folktales, which is skewed towards non-human animate entities. Animals are portrayed as characters when they are humanized, transformed into humans, or perform certain functions, while animals that are not characters are often tamed, play a secondary role in animal stories, or serve a single function. While inanimate objects appear less frequently, they often become characters when animated for narrative purposes, emphasizing the intentional use of animated inanimate objects in the stories.

² In order to calculate the inter-annotator agreement, we assigned animate/inanimate tags to each token, splitting multiword expressions. If one annotator annotated “trusty John” and the other annotator annotated only “John”, the name is counted as a match while the adjective is not.

Table 1
Animacy and characters: Share of human, animal, inanimate and supernatural tokens in manual annotation of characters (occurrences and percentage).

animacy type    human         animal       inanimate    supernatural   total
character       84 (53.5%)    32 (20.4%)   41 (26.1%)   0 (0%)         157 (100%)
not character   31 (38.3%)    35 (43.2%)   12 (14.8%)   3 (3.7%)       81 (100%)
total           115 (48.3%)   67 (28.2%)   53 (22.3%)   2 (0.8%)       238 (100%)

4.2. Animacy and Named Entities

We now look into named entity recognition (NER), which is currently the default approach to character analysis in CLS. A comparison of the NER results obtained with Stanza [14] and our manual animacy annotation reveals a disparity between entities recognized by NER and those annotated as animate, with only 193 tokens overlapping (cf. Table 2). In terms of distribution, 106 tokens are exclusively named entities, mostly involving mere mentions of names without action, while 5,588 cases are exclusively animate.
The scarcity of named entity annotations for animate entities can primarily be attributed to the NER approach, in which pronouns and articles are not considered entities. But there are also errors in the NER, in which some proper names and appellatives were not identified properly as named entities. Among the identified named entity types are 173 animate and 106 inanimate PER tokens, and 20 animate (and 0 inanimate) LOC tokens. Entities that are annotated both as animate and as named entities include diminutive forms, professions, kinship terms, and celestial bodies like “Sonne” (sun) and “Mond” (moon).

Next to these correctly identified cases there are several missed mentions. The cases already mentioned as well as other animacy mentions are recognized as named entities only inconsistently across recurring occurrences. For instance, “Besenchen” (diminutive of broom), “Bohne” (bean) and “Drechsler” (wood turner) are recognized as S-PER (single-token person entities) only in some of their occurrences within the same text, while others like “Gänsemagd” (goose maid) and “Fuchs” (fox) show varied recognition across different texts. Instances of “Berg Semsi” (Semsi mountain) are consistently annotated as animate but recognized only four times as LOC and once as PER out of ten cases. Also, unique tokens such as “Söhnlein” (diminutive of son) and archaic forms like “Thier” (animal) are noted for their inconsistent recognition.

The observation that named entity recognition (NER) does not fully encompass animacy detection suggests that, even when disregarding NER errors³, animacy is a more effective criterion for character detection (cf. animacy scores in Table 3).

³ We additionally annotated PER entities in six folktales (KHM 6, 10, 11, 18, 24, 28). Stanza NER classification only reached an F1 score of 0.7 (P: 0.63, R: 0.78) for these, which is a considerably worse performance than animacy detection.
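As a quick consistency check on the NER scores just reported, the F1 value can be recomputed from the stated precision and recall as their harmonic mean. The worked equation below uses only the two figures given in the text (P = 0.63, R = 0.78); that the reported 0.7 is this value after rounding is an assumption.

```latex
\[
F_1 = \frac{2PR}{P + R}
    = \frac{2 \cdot 0.63 \cdot 0.78}{0.63 + 0.78}
    = \frac{0.9828}{1.41}
    \approx 0.70
\]
```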
Table 2
Share of named entities (with Stanza) in the manually annotated animate entities.

                  animate   inanimate
named entity      193       106
no named entity   5,588     n/a

5. Animacy Classification

5.1. Implementation of the Classifier

In examining their annotated data, Karsdorp et al. observe that the part of speech of a word is already a sort of ‘weak’ indicator for a word to be animate, as 40% of the tokens they annotated as animate are nouns or proper nouns, while only 11% of the tokens tagged as inanimate are nouns [9]. We can confirm this finding at least in part, as 26.5% of our tokens annotated as animate are nouns or proper nouns and 5.7% of the tokens tagged as inanimate are nouns or proper nouns.

[9] build on this observation by not only training their classifier on the manually annotated data but also adding various linguistic features in order to find the best-performing combination for the training input. They run several experiments in which they always include the manually annotated tokens in a rolling context window of three tokens to the left and right (which they call the lexical input). They subsequently combine this base data with the rolling context window of the lemma, the part-of-speech tags (i.e. morphological features), the dependency tags (i.e. syntactic features) and the embedding vector of the target token taken from a Word2Vec model built on a web corpus (i.e. semantic features).

We reproduced their way of creating lexical, morphological and syntactic features using the Stanza library [14]. We furthermore trained a Word2Vec model on 115 novels from the German Romantic era [17], a corpus we deem comparable to the literary language of the time. This Word2Vec model was trained using Gensim [15], which is based on [11]. We used the same parameters as [9], who use the skip-gram architecture with a vector size of 300 (the other parameters were set to default).⁴

⁴ With this we have successfully reproduced the workflow by [9] concerning lexical features and the combination of lexical and semantic features. However, we have not yet determined how to incorporate additional features into this training process. We used the same classification algorithm (Maximum Entropy, as implemented in scikit-learn [13]).

5.2. Evaluation

For evaluating the results we calculated the F1-scores using 10-fold cross-validation, differentiating between the much larger class of inanimate entities and the class of animate entities (cf. Table 3). While we reached lower F1-scores, our results are comparable to the results of [9] in that the combination of lexical features (tokens), part-of-speech tags and embedding vector yields the best result (F1-score of the Dutch classifier for the animate class of 0.93).

Table 3
Evaluation of classification for animate and inanimate tokens (10-fold cross-validation).

                   inanimate                    animate
                   P        R        F1         P        R        F1
lexical features   0.9512   0.9764   0.9636     0.8776   0.7718   0.8212
all features       0.9596   0.9726   0.9661     0.8671   0.8137   0.8395

Furthermore, we experimented with adding more annotated data to see if the performance of the classifier plateaus at a certain point. For this, we annotated six additional KHM folktales. Subsequently, we incrementally expanded the data for the classifier with all features by adding one of these fairy tales at a time and conducted a 10-fold cross-validation to observe the evolution of the F1 score of the animate class. It was found that the score did not increase; rather, it tended to decrease slightly (cf. Table 4).

Table 4
F1 scores for the classification of animate tokens during incremental data expansion.

Added tale                                                              F1 score
base case (19 atypical animacy folktales)                               0.8395
+ KHM 1 The Frog King, or Iron Heinrich                                 0.8329
+ KHM 2 Cat and Mouse in Partnership                                    0.8330
+ KHM 3 Mary’s Child                                                    0.8326
+ KHM 4 The Story of the Youth Who Went Forth to Learn What Fear Was    0.8295
+ KHM 5 The Wolf and the Seven Young Kids                               0.8296
+ KHM 7 The Good Bargain                                                0.8296

5.3. Implementation in German Folktales

The application of our classifier to the entire corpus of 211 Children’s and Household Tales yields the results shown in Figure 1. The average proportion of animate tokens is 16%. The 20 texts with a proportion of <=10% (bottom outliers) are consistently not written in standard German.

Figure 1: Relative frequency of animate entities of the 211 Children’s and Household Tales.

A spot-check of the annotations indicates reasonably good results. The classifier even discerns correctly between mere name references in direct speech and mentions of animate entities. For example, the main character of the eponymous tale KHM 55 Rumpelstiltskin is annotated as animate when referred to as “Männchen” (little man), whereas the two proper name mentions used only as a name reference in direct speech are classified correctly as inanimate. In KHM 34 Clever Elsie the character is not classified as animate in direct speech but is recognized as such in narrative parts, and in KHM 166 Strong Hans the character “Hans” is consistently recognized correctly as animate. However, the classifier struggles with correctly detecting rare and complex tokens. For example, the more common “Fuchs” (fox) is recognized more reliably than the more complex form “Rothfuchs” (red fox) in KHM 73 The Wolf and the Fox. On the other hand, the animate “Vogel” (bird) and the inanimate “Vogelherz” (bird heart) in KHM 122 Donkey Cabbages are both classified correctly.

6. Discussion and Outlook

Our approach to animacy classification achieves a reasonably good detection of animacy, and its application to a corpus of German folktales provides some interesting insights. With regard to the assumed relation between named entities and animacy, we have shown only partial overlap both for the concepts and for their detection.
In other words, our results indicate that we did not simply achieve NER in place of animacy detection. These outcomes demonstrate that our animacy approach is distinct from NER. In fact, the correlation between the relative frequency of person entities (automatically tagged using [14]) and the relative frequency of animate entities (using our classifier) in the Grimm corpus is rather low, with a Pearson correlation coefficient of -0.132 and a Spearman’s correlation coefficient of -0.195. The classifier adheres to our animacy framework both with regard to animacy and inanimacy. Furthermore, the classifier addresses a gap in detecting animated animals and objects. This capability suggests that with further development, it could also enhance character recognition.

Accordingly, future work should explore the combination with NER and coreference resolution for the identification of characters as well as the potential of LLMs for the annotation of animacy. From the perspective of analysis, filtering out human entities would also be an interesting future step, making it possible to analyze animate objects, animals, and other potential candidates for characters that do not display features of person entities.

References

[1] R. Borgards, F. Middelhoff, and B. Thums, eds. Romantische Ökologien: Vielfältige Naturen um 1800. Vol. 4. Neue Romantikforschung. Berlin, Heidelberg: Springer, 2023. doi: 10.1007/978-3-662-67186-3.
[2] Brüder Grimm. Kinder und Hausmärchen: Band 1. 7th ed. Göttingen: Verlag der Dieterichschen Buchhandlung, 1857. url: https://de.wikisource.org/wiki/Kinder-_und_Haus-M%C3%A4rchen_Band_1_(1857).
[3] Brüder Grimm. Kinder und Hausmärchen: Band 2. 7th ed. Göttingen: Verlag der Dieterichschen Buchhandlung, 1857. url: https://de.wikisource.org/wiki/Kinder-_und_Haus-M%C3%A4rchen_Band_2_(1857).
[4] H. Bußmann, ed. Lexikon der Sprachwissenschaft. 4th ed. Stuttgart: Alfred Kröner Verlag, 2008.
[5] M.
Coll Ardanuy, F. Nanni, K. Beelen, K. Hosseini, R. Ahnert, J. Lawrence, K. McDonough, G. Tolfo, D. C. Wilson, and B. McGillivray. “Living Machines: A study of atypical animacy”. In: Proceedings of the 28th International Conference on Computational Linguistics. Ed. by D. Scott, N. Bel, and C. Zong. Barcelona, Spain (Online): International Committee on Computational Linguistics, 2020, pp. 4534–4545. doi: 10.18653/v1/2020.coling-main.400.
[6] “The Table, the Ass, and the Stick”. In: Household Stories, illustrated by Walter Crane, translated by Lucy Crane. Ed. by J. Grimm and W. Grimm. Trans. by L. Crane. 1882. url: https://en.wikisource.org/wiki/Household_stories_from_the_collection_of_the_Bros_Grimm_(L_%26_W_Crane)/The_Table,_the_Ass,_and_the_Stick.
[7] L. Jahan, G. Chauhan, and M. Finlayson. “A New Approach to Animacy Detection”. In: Proceedings of the 27th International Conference on Computational Linguistics. Ed. by E. M. Bender, L. Derczynski, and P. Isabelle. Santa Fe, New Mexico, USA: Association for Computational Linguistics, 2018, pp. 1–12.
[8] F. Jannidis. Figur und Person. Beitrag zu einer historischen Narratologie. Berlin: de Gruyter, 2004.
[9] F. Karsdorp, M. van der Meulen, T. Meder, and A. van den Bosch. “Animacy Detection in Stories”. In: 6th Workshop on Computational Models of Narrative (CMN 2015). Ed. by M. A. Finlayson, B. Miller, A. Lieto, and R. Ronfard. Vol. 45. Open Access Series in Informatics (OASIcs). Dagstuhl, Germany: Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2015, pp. 82–97. doi: 10.4230/OASIcs.CMN.2015.82.
[10] S. Lahn and J. C. Meister. Einführung in die Erzähltextanalyse. Stuttgart: Metzler, 2016.
[11] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient Estimation of Word Representations in Vector Space. 2013. doi: 10.48550/arXiv.1301.3781.
[12] M. S. Nieuwland and J. J. A. van Berkum.
“When peanuts fall in love: N400 evidence for the power of discourse”. In: Journal of cognitive neuroscience 18.7 (2006), pp. 1098–1111. doi: 10.1162/jocn.2006.18.7.1098.
[13] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. “Scikit-learn: Machine Learning in Python”. In: Journal of Machine Learning Research 12.85 (2011), pp. 2825–2830.
[14] P. Qi, Y. Zhang, Y. Zhang, J. Bolton, and C. D. Manning. “Stanza: A Python Natural Language Processing Toolkit for Many Human Languages”. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Ed. by A. Celikyilmaz and T.-H. Wen. Online: Association for Computational Linguistics, 2020, pp. 101–108. doi: 10.18653/v1/2020.acl-demos.14.
[15] R. Řehůřek and P. Sojka. “Software Framework for Topic Modelling with Large Corpora”. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks. Valletta, Malta: ELRA, 2010, pp. 45–50.
[16] D. Schmidt, A. Zehe, J. Lorenzen, L. Sergel, S. Düker, M. Krug, and F. Puppe. “The FairyNet Corpus - Character Networks for German Fairy Tales”. In: Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature. Ed. by S. Degaetano-Ortlieb, A. Kazantseva, N. Reiter, and S. Szpakowicz. Punta Cana, Dominican Republic (online): Association for Computational Linguistics, 2021, pp. 49–56. doi: 10.18653/v1/2021.latechclfl-1.6.
[17] M. Schumacher, I. Uglanova, and E. Gius. d-Romane-Romantik (d-RoRo). 2022. doi: 10.5281/zenodo.7215170.
[18] S. Thompson. Motif-index of folk-literature: a classification of narrative elements in folktales, ballads, myths, fables, mediæval romances, exempla, fabliaux, jest-books, and local legends: A - C. Vol. 1. A - C. Indiana University studies; Vol.
19, No. 96/97. Bloomington, Ind.: Univ. Libr., 1934.
[19] H.-J. Uther. The types of international folktales: a classification and bibliography; based on the system of Antti Aarne and Stith Thompson. FF Communications. Helsinki: Suomalainen Tiedeakatemia, 2004.
[20] M. Westfall. “Perceiving agency”. In: Mind & Language 38.3 (2023), pp. 847–865. doi: 10.1111/mila.12399.
[21] M. Yamamoto. Animacy and reference: A cognitive approach to corpus linguistics. Vol. 46. Studies in language Companion series, SLCS. Amsterdam and Philadelphia, PA: John Benjamins Publ, 1999. doi: 10.1075/slcs.46.

A. Corpus

• KHM 6 Trusty John
• KHM 10 The Pack of Ragamuffins
• KHM 11 Brother and Sister
• KHM 18 The Straw, the Coal, and the Bean
• KHM 24 Mother Holle
• KHM 28 The Singing Bone
• KHM 30 The Louse and The Flea
• KHM 36 The Table, the Ass, and the Stick
• KHM 41 Herr Korbes
• KHM 42 The Godfather
• KHM 49 The Six Swans
• KHM 56 Sweetheart Roland
• KHM 80 The Cock and the Hen
• KHM 88 The Singing, Springing Lark
• KHM 89 The Goose Girl
• KHM 103 Sweet Porridge
• KHM 142 Open Sesame
• KHM 171 The Willow-Wren
• KHM 188 Spindle, Shuttle, and Needle

B. Annotation Guidelines

The annotation process is designed for use with the CATMA platform and proceeds through three iterations. It is based on the understanding of mentions from coreference chains. The annotation span ranges from single tokens to multi-word expressions.

Iteration 1: Overview of Entities

In the first iteration, the goal is to provide an overview of all animate entities and to be able to refer to each entity. Each entity recognized as animate is marked with a clearly identifiable mention in the text that we call recognizable mention (rm). The mention does not necessarily need to be the first occurrence of the entity; rather, it should be one that allows for quick identification. An entity qualifies as animate if at least one of the following three criteria is met:
1. The entity performs an independent action explicitly described in the text, occupying the agent role of a verb.
2. The entity makes independent verbal expressions.
3. The entity is described by a lexeme that refers to a living being, irrespective of its role or actions in the sentence, unless an additional description explicitly excludes animacy (e.g., a dead relative).

For example, in the sentence “directly out jumped the stick, and dealt a shower of blows on the coat or jerkin, and the back beneath, which quickly ended the affair” (KHM 36 The Table, the Ass, and the Stick, [6]) the stick’s agent role is evident. Therefore, it meets the first criterion and is annotated as animate. In contrast, in the sentence “When placed and spoken to, ‘Little table, set yourself,’ it would immediately be covered with a clean cloth, with plates, knives, and forks beside it” (KHM 36 The Table, the Ass, and the Stick) an independent action is implied but not explicitly depicted. The little table does not occupy an agent role and is therefore not marked as animate.

As an example for the second criterion we look at the sentence “but the bread called out, ‘Oh, take me out, take me out, or I’ll burn; I’ve been done for a long time.’” (KHM 24 Mother Holle). The bread is the originator of an independent verbal statement and is hence marked as animate.

The third criterion can be observed in the description “After lifting the girl onto his horse, the old woman showed him the way” (KHM 49 The Six Swans). Although the horse is not in an agent position here, readers’ world knowledge recognizes a horse as an animate entity, so it is marked as animate. This extends to entities that are not characters, such as relatives who are mentioned but do not directly appear in the text; these are also annotated.

Furthermore, a new recognizable mention is annotated for entities that are transformed radically, where the transformed entity also satisfies one of the conditions explained above. For example,
in KHM 6 Trusty John the title character is transformed into a speaking stone. If multiple entities are introduced as a group (“three ravens”), the first mention of the group is annotated instead of the first mention of each member of the group.

To further clarify the rules, some borderline cases are discussed in the following. In some fairy tales, the narrator appears through a first-person reference and the reader is also referenced.

• “eagle and finch, owl and crow, lark and sparrow, what should I call them all?” (KHM 171 The Willow-Wren)
• “and the donkey didn’t stop until everyone had so much that they couldn’t carry anymore. (I can see it in your face, you would have liked to be there too.)” (KHM 36 The Table, the Ass, and the Stick)

The narrator and the recipient are regarded here as textual constructs with no real-world counterpart [10, p. 61]. As a result, their reference expressions cannot be assigned a definite degree of animacy.

Common borderline cases are magical objects that appear in fairy tales. Rule (1) has already clarified that explicit independent action is a prerequisite for the animacy annotation. However, cases occur where classification is still ambiguous.

• “the way was so hard to find that he would not have found it if a wise woman had not given him a ball of yarn; when he threw it in front of him, it unwound by itself and showed him the way.” (KHM 49 The Six Swans)
• “Now she could not rest until she found out where the king kept the ball of yarn” (KHM 49 The Six Swans)

The ball of yarn in the first excerpt clearly occupies the agent role of the verbs “unwind” and “show.” In the second quote, however, it is used with the verb “keep,” which typically requires an inanimate object. Based on this case, it was decided that a single animate occurrence is sufficient for marking the entity as animate.
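The three criteria of Iteration 1, together with the decision that a single animate occurrence suffices, can be summarised as a small decision function. The following is a minimal Python sketch, not the annotation tooling itself (the annotation was carried out manually in CATMA). The `Mention` class, its boolean fields, and both function names are hypothetical and introduced only for illustration; the flags are assumed to be set by a human annotator, and the exclusion clause is assumed to apply to the third criterion only.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Mention:
    """One occurrence of an entity; all flags are judged by a human annotator."""
    text: str
    agent_of_explicit_action: bool = False     # criterion 1
    independent_speech: bool = False           # criterion 2
    living_being_lexeme: bool = False          # criterion 3
    animacy_explicitly_excluded: bool = False  # e.g. "a dead relative"


def mention_is_animate(m: Mention) -> bool:
    """Apply the three criteria to a single mention."""
    # Assumption: the explicit exclusion only cancels criterion 3.
    lexeme_counts = m.living_being_lexeme and not m.animacy_explicitly_excluded
    return m.agent_of_explicit_action or m.independent_speech or lexeme_counts


def entity_is_animate(mentions: Iterable[Mention]) -> bool:
    """A single animate occurrence is sufficient for the whole entity."""
    return any(mention_is_animate(m) for m in mentions)


# Examples paraphrased from the guidelines; the flags are set by hand.
ball_of_yarn = [
    Mention("it unwound by itself and showed him the way",
            agent_of_explicit_action=True),
    Mention("where the king kept the ball of yarn"),
]
little_table = [Mention("it would immediately be covered with a clean cloth")]
bread = [Mention("the bread called out", independent_speech=True)]
horse = [Mention("his horse", living_being_lexeme=True)]
dead_relative = [Mention("a dead relative", living_being_lexeme=True,
                         animacy_explicitly_excluded=True)]

examples = [("ball of yarn", ball_of_yarn), ("little table", little_table),
            ("bread", bread), ("horse", horse), ("dead relative", dead_relative)]
for name, mentions in examples:
    label = "animate" if entity_is_animate(mentions) else "not animate"
    print(f"{name}: {label}")
```

The sketch keeps the judgment itself (is this an explicit independent action?) outside the code, since that is exactly the part the guidelines leave to the annotator.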
Iteration 2: Annotation of All Mentions

In the second iteration, all reference expressions referring to entities marked as animate in the previous iteration were annotated. Reference expressions include all noun phrases containing proper names (including the article); descriptors based on attributes such as occupation (“the brave little tailor”), gender, appearance (“the beautiful one”), or social status (“the poor man,” “the princess”); and personal, demonstrative, relative, possessive, or indefinite pronouns. Additionally, all expressions referring to such noun phrases through any reference type were included. This annotation level provides an overview of where and how often animate entities appear.

Borderline cases in the annotation process include vague references (“everyman”), reflexive verb constructions (“he withdrew himself”), and entities recognized as animate only later in the story. Vague references and reflexive pronouns in reflexive verb constructions are not annotated as animate because they do not refer to any specific animate entity. Entities that initially appear inanimate but can be recognized as animate over the course of the story are consistently marked as animate.

Iteration 3: Annotation of Character Status and Animacy Degree

In the third and final iteration, the existing annotations are enriched with the properties ‘character’ and ‘degree of animacy’. The former indicates whether the animate entity is a character. The latter marks it as human, animal, object or supernatural.

An entity is marked as a character if the description in the text includes some form of the semantic feature “human” (“the king”). Another indicator is the association with verbs describing typically human actions. Entities that are sources of verbal expressions (“the lion spoke”) or exhibit a complex inner life or thinking are also marked as characters. Some borderline cases include groups of people.
• “The king summoned all goldsmiths, who had to work day and night” (KHM 6 Trusty John)
• “Then the other servants of the king, who did not favor Faithful John, shouted, ‘How shameful to kill the beautiful animal that was to carry the king to his castle!’ ” (KHM 6 Trusty John)

These cases must be assessed individually. The term “goldsmiths” can be semantically associated with the occupation of a human, but no individuals are visible in this description, so they do not appear as characters. The servants, even collectively, show a complex inner life through their mistrust and are therefore considered characters.

The second property indicates the degree of animacy of a corresponding entity in the real world. Animate entities can be annotated with the values “human,” “animal,” “supernatural,” or “inanimate.” The distinction between perception within the fictional world and the world knowledge applied during reception is important. Although entities in narratives do not form a direct reference to the real world due to their fictional nature, readers derive many features from their world knowledge about the corresponding real-world entity.

A borderline case in this categorization is the description of body parts. The head of a horse (cf. KHM 89 The Goose Girl) could be classified as either animal or inanimate. Here, it is argued that the category animal implies a form of animacy. Parts of a dead animal would be perceived as inanimate in everyday life and are therefore categorized as such here.

C. Online Resources

Data and code can be found here: https://github.com/forTEXT/animacy_in_german_folktales.