=Paper=
{{Paper
|id=Vol-3834/paper90
|storemode=property
|title=Animacy in German Folktales
|pdfUrl=https://ceur-ws.org/Vol-3834/paper90.pdf
|volume=Vol-3834
|authors=Julian Häußler,Janis von Keitz,Evelyn Gius
|dblpUrl=https://dblp.org/rec/conf/chr/HausslerKG24
}}
==Animacy in German Folktales==
Julian Häußler¹,∗,†, Janis von Keitz¹,† and Evelyn Gius¹
¹ fortext lab, Technical University of Darmstadt, Germany
Abstract
This paper explores the phenomenon of animacy in prose using the example of German folktales. We
present a manually annotated corpus of 19 German folktales from the Brothers Grimm collection and
train a classifier on these annotations. Building on previous work in animacy detection, we evaluate
the classifier’s performance and its application to a larger corpus. The findings highlight the complex-
ity of animacy in literary texts, distinguishing it from named entity recognition and emphasizing the
classifier’s potential for enhancing character recognition in narratives.
Keywords
animacy, animacy classification, folktales, Computational Literary Studies
1. Introduction
[A]nd when any one attacked him he
would say, “Stick, out of the sack!” and
directly out jumped the stick, and dealt a
shower of blows on the coat or jerkin,
and the back beneath, which quickly
ended the affair.
The Table, the Ass, and the Stick
Household Stories by the Brothers Grimm
translated by Lucy Crane [6]
Folktales feature not only humans, but also talking animals and living objects. For example,
in the folktale The Table, the Ass, and the Stick a speaking donkey sends three brothers out
into the world by a trick, and one of the brothers acquires a command-executing stick. The donkey
and the stick are animate entities that break with the rules of common world knowledge.
These animate entities are positioned between simple objects or animals and human characters
and contribute to key plot points. Besides its obvious relevance in folktales, animacy also
matters in contexts such as the Romanticist understanding of nature [1], the depiction of
machines [5], and present-day discourse around artificial intelligence. It is thus connected to
concepts such as agency and closely tied to characters in fiction.
CHR 2024: Computational Humanities Research Conference, December 4 – 6, 2024, Aarhus, Denmark
∗ Corresponding author.
† These authors contributed equally.
Email: julian.haeussler@tu-darmstadt.de (J. Häußler); janis.von_keitz@tu-darmstadt.de (J. v. Keitz); evelyn.gius@tu-darmstadt.de (E. Gius)
ORCID: 0000-0001-7490-8570 (J. Häußler); 0009-0002-9760-3600 (J. v. Keitz); 0000-0001-8888-8419 (E. Gius)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
The goal of this paper is to scrutinize animacy in German folktales as a phenomenon of
interest for Computational Literary Studies (CLS) by showcasing the manual annotation of
animacy and presenting a classifier trained on these annotations. The overall approach builds
on the work of Karsdorp et al. [9] who developed an approach to animacy detection in Dutch
folktales. In the following, we discuss previous work on animacy (section 2) and present our corpus
of German folktales from the Children’s and Household Tales by the Brothers Grimm as well as
our understanding of animacy and its manual annotation (section 3). Furthermore, we evaluate
its relation to the neighboring concepts of fictional characters and named entity recognition
(section 4). We then reproduce Karsdorp et al.’s approach for German folktales, evaluate the
results of our classification and apply them to a larger corpus (section 5). We close by summarizing
our findings and sketching possible directions for future work (section 6).
2. Animacy in Text-based Research
The concept of animacy is crucial in human perception for distinguishing between living and
non-living entities. The ability to perceive animacy develops in early childhood and might be
innate [9]. It is based on the unpredictability of biological life but also on agency,
influenced by movements and mental states [20]. This perception extends to reading texts. In
fiction, the recognition of animacy is influenced by the narrative context, allowing fictional
entities to be perceived as animate even if they are not animate in real life [12].
In text-based research, animacy was introduced as a grammatical category by Michael
Silverstein in the 1970s. He suggested a hierarchy for describing languages on a general level,
ranking grammatical phenomena according to animacy: 1st person > 2nd person > 3rd per-
son/deictics > human NPs > animate NPs > inanimate NPs [4]. This hierarchy influences gram-
matical structures in many languages, affecting aspects such as inflection and word order. For
example, in German as well as in English and other languages, animacy influences the choice of
interrogative pronouns and the use of certain verbs (e.g., “schauen”/“to look” requires animacy
of the semantic subject). Despite this systematic hierarchy, animacy isn’t a simple linear scale.
It is influenced by additional parameters, including the perception of empathy and sensation.
Objects like computers or organizations are sometimes considered animate due to attributed
intelligence or agency, complicating the distinction between animate and inanimate [21].
In literary studies, a definitive concept of animacy has yet to be established. Still, it can be ex-
plored through stylistic devices like personification and anthropomorphism, as well as through
the characterization of fictional characters. Personification assigns human attributes to non-
human entities, while anthropomorphism extends this by giving them human-like forms and
mental attributes. Additionally, narratological theories examine how characters, including po-
tentially inanimate ones, are constructed and perceived, highlighting the complexity of char-
acter portrayal through textual indicators, language use, and readers’ cognitive engagement
[8].
In NLP and adjacent fields, the binary classification of entities in texts into animate and inan-
imate entities is particularly relevant. Animacy classification aids numerous NLP tasks such as
anaphora or coreference resolution, dependency parsing, word sense disambiguation, semantic
role labeling, as well as automatic text generation and translation [7, 9]. Determining whether
a pronoun refers to an animate or inanimate antecedent significantly simplifies anaphora and
coreference resolution in many languages. Since animacy also influences grammatical struc-
tures in many languages, it also affects dependency parsing and semantic role labeling. In
automatic text generation, taking into account the animacy required by verbs is essential for
generating semantically correct sentences.
In Computational Literary Studies, animacy classification can help identify characters in nar-
ratives [7, 16]. However, fictional worlds in literature can challenge traditional animacy clas-
sification, as objects or plants may act as agents, diverging from real-world knowledge. Rule-
based systems with semantic lexicons like WordNet might misclassify such entities. Therefore,
animacy classification in narrative texts should build on contextual understanding rather than
fixed rules [9]. Hybrid systems combining machine learning with rule-based methods show
promise in addressing these challenges. Jahan et al. [7] used a hybrid system combining a support
vector machine classifier with a rule-based classification system and achieved an 𝐹1 of 0.88 for
classifying animacy. Karsdorp et al. [9] used what they call a linguistically uninformed model
with word embeddings and achieved an 𝐹1 of 0.91 for the animate class.
3. Data
3.1. Corpus
Our approach is based on a corpus of 19 German folktales (see Appendix A). These were
selected from the Brothers Grimm’s Children’s and Household Tales (Kinder- und Hausmärchen),
a collection of folktales published from 1812 onwards [2, 3]. The texts were collected from
Wikisource, where all editions of the collection are available digitally.
For selecting texts, we reviewed all 201 tales and 10 children’s legends for entities that are
depicted as animate but cannot be categorized as humans or animals in everyday terms. Our focus
is on inanimate entities that are animate in the tale and have a tangible counterpart in the real
world, and we excluded cases that differ significantly from these in meaning and function. That
is, texts containing supernatural phenomena, such as the anthropomorphization of divine beings,
the personification of events like death, or metaphorical descriptions in which animacy is used
as a stylistic device, were not included. Moreover, humans or animals transformed into inanimate
entities within the fictional world were not considered if they exclusively displayed inanimate
qualities. Magical entities were examined case by case, as these often represent borderline cases
of animacy depiction. Since the depiction of animacy is strongly related to independent action,
texts where this action is explicitly described for magical entities were included.¹
3.2. Manual Animacy Annotations
We annotated the 19 folktales in our corpus with regard to animacy. The full annotation guide-
line is available in Appendix B. Our animacy concept is connected to coreference annotation,
as we not only annotate (proper) nouns but also other expressions referring to animate entities.
However, our approach differs from the one by [7] in that they use pre-annotated coreference
chains, annotating animacy in nouns, gendered pronouns and adjectives. It also differs from [9],
as their animacy concept is based on the rationality and intentionality of an entity, whereas we
base our understanding of animacy on agency and speech. However, like [9], we use untagged data.
¹ Our approach therefore differs from the principles in the Aarne–Thompson–Uther Index [19] and the
Motif-Index of Folk-Literature [18]. The ATU classifies animal tales but disregards animacy, and the
Motif-Index bases the classification of animals with human traits only on speech or role, but not on agency.
We consider an entity animate if at least one of the following three conditions is met:
1. The entity performs an action independently and fulfills the agent role of a verb.
2. The entity makes independent verbal utterances.
3. The entity is described by a lexeme that refers to a living being, irrespective of its role
or actions in the sentence, unless an additional description explicitly excludes animacy
(e.g., a dead relative).
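The three conditions can be read as a simple disjunction. The following is a minimal sketch of that decision rule, not the annotation tooling used for the corpus; the `EntityEvidence` record and its field names are hypothetical and only summarize what an annotator would have observed for one entity.

```python
from dataclasses import dataclass


@dataclass
class EntityEvidence:
    """Hypothetical summary of the textual evidence collected for one entity."""
    acts_as_agent: bool = False      # condition 1: independent action, agent role
    speaks: bool = False             # condition 2: independent verbal utterance
    living_lexeme: bool = False      # condition 3: lexeme refers to a living being
    animacy_excluded: bool = False   # e.g. the description "dead" overrides condition 3


def is_animate(evidence: EntityEvidence) -> bool:
    """Return True if at least one of the three conditions is met."""
    lexeme_condition = evidence.living_lexeme and not evidence.animacy_excluded
    return evidence.acts_as_agent or evidence.speaks or lexeme_condition


# Illustrative cases modelled on the examples in the guidelines.
stick = EntityEvidence(acts_as_agent=True)          # the stick that jumps out of the sack
bread = EntityEvidence(speaks=True)                 # the bread that calls out
dead_relative = EntityEvidence(living_lexeme=True, animacy_excluded=True)
```

In this sketch the exclusion clause only affects the third condition, so a dead relative who speaks or acts in the tale would still count as animate under conditions 1 or 2; whether that matches every borderline case in the guidelines is an assumption.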
In order to have an overview of entities in the text and to be able to relate to each entity,
for every animate entity one mention was annotated as recognizable mention (rm) in the first
iteration of annotation. In the second iteration all other expressions referring to animate enti-
ties were marked as animate. The referring expressions include proper names, descriptions by
attributes (such as profession, gender, appearance, or social status), and pronouns; they can
span a single token or multiple tokens.
A second annotator annotated KHM 6 and 10, resulting in an average Cohen’s kappa of
0.87.² Disagreement stems mostly from the second annotator tending to overlook several articles
as well as possessive and reflexive pronouns, while also tending to annotate shorter spans (e.g.,
only “goldsmiths” instead of “the goldsmiths of the empire”). However, the first annotator
also overlooked several personal pronouns. We therefore decided to make the annotation of all
relevant pronouns more explicit in the guidelines.
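As a rough illustration of how agreement can be computed on token-level animate/inanimate tags, the sketch below implements Cohen's kappa in plain Python. The label names `"A"` and `"I"` and the toy sentence are assumptions for illustration and are not taken from the corpus.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty token sequence")
    n = len(labels_a)
    # Observed agreement: share of tokens with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Toy tokens: "the trusty John went home"
annotator_1 = ["A", "A", "A", "I", "I"]  # marks the span "the trusty John"
annotator_2 = ["I", "I", "A", "I", "I"]  # marks only "John"
kappa = cohens_kappa(annotator_1, annotator_2)
```

In the toy example only the token "John" counts as an animate match, while the article and the adjective count as disagreements, which is why span-length differences lower the score.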
4. Animacy and Related Concepts
4.1. Animacy and Literary Characters
In order to investigate the relationship between animate entities and characters we performed
an additional annotation of characters (cf. the third iteration in the guidelines in Appendix B).
In addition, we categorized the entities according to their degree of animacy, ranging
from human and animal to inanimate and supernatural (cf. Table 1). A closer look at the data
shows that animate entities appear more frequently as characters than not in fairy tales, with
humans making up more than half of the characters and often serving as protagonists, even in our
selection of folktales, which is skewed towards non-human animate entities. Animals are
portrayed as characters when they are humanized, transformed into humans, or perform certain
functions, while animals that are not characters are often tamed, play a secondary role in
animal stories, or serve a single function. While inanimate objects appear less frequently, they
often become characters when animated for narrative purposes, emphasizing the intentional
use of animated inanimate objects in the stories.
² In order to calculate the inter-annotator agreement, we assigned animate/inanimate tags to each token,
splitting multiword expressions. If one annotator annotated “trusty John” and the other annotator annotated
only “John”, the name counts as a match while the adjective does not.
Table 1
Animacy and characters: Share of human, animal, inanimate and supernatural tokens in manual
annotation of characters (occurrences and percentage).
animacy type  | human       | animal     | inanimate  | supernatural | total
character     | 84 (53.5%)  | 32 (20.4%) | 41 (26.1%) | 0 (0%)       | 157 (100%)
not character | 31 (38.3%)  | 35 (43.2%) | 12 (14.8%) | 3 (3.7%)     | 81 (100%)
total         | 115 (48.3%) | 67 (28.2%) | 53 (22.3%) | 2 (0.8%)     | 238 (100%)
4.2. Animacy and Named Entities
We now turn to named entity recognition (NER), which is currently the default approach to character
analysis in CLS. A comparison of the NER results from Stanza [14] with our manual animacy
annotation reveals a disparity between entities recognized by NER and those annotated as
animate, with only 193 tokens overlapping (cf. Table 2). In terms of distribution, 106 tokens
are exclusively named entities, mostly involving mere mentions of names without action, while
5,588 cases are exclusively animate.
The scarcity of named entity annotations for animate entities can primarily be attributed to the
NER approach, in which pronouns and articles are not considered entities. But there are also
NER errors in which some proper names and appellatives were not identified
as named entities.
Among the identified named entity types are 173 animate and 106 inanimate PER tokens,
and 20 animate (and 0 inanimate) LOC tokens. Entities that are annotated both as animate and
as named entities include diminutive forms, professions, kinship terms, and celestial bodies
like “Sonne” (sun) and “Mond” (moon).
Alongside these correctly identified cases, there are several missed mentions. The cases already
mentioned as well as other animate mentions are recognized as named entities only inconsistently
across their recurring occurrences. For instance, “Besenchen” (diminutive of broom), “Bohne” (bean),
and “Drechsler” (wood turner) are recognized as S-PER (single-token person entities) only in
some of their occurrences within the same text, while others like “Gänsemagd” (goose maid)
and “Fuchs” (fox) show varied recognition across different texts. Instances of “Berg Semsi”
(Mount Semsi) are consistently annotated as animate but, out of ten cases, are recognized only
four times as LOC and once as PER. Also, unique tokens such as “Söhnlein” (diminutive of
son) and archaic forms like “Thier” (animal) are recognized inconsistently.
The observation that named entity recognition (NER) does not fully encompass animacy
detection suggests that, even when disregarding NER errors,³ animacy is a more effective
criterion for character detection (cf. animacy scores in Table 3).
³ We additionally annotated PER entities in six folktales (KHM 6, 10, 11, 18, 24, 28). Stanza NER classification only
reached an F1 score of 0.7 (P: 0.63, R: 0.78) for these, which is a considerably worse performance than animacy
detection.
Table 2
Share of named entities (with Stanza) in the manually annotated animate entities.
                | animate | inanimate
named entity    | 193     | 106
no named entity | 5,588   | n/a
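The cells of Table 2 correspond to set operations over token positions. The sketch below shows one way such counts could be derived; the function name `overlap_counts` and the toy token indices are hypothetical, and the actual comparison would first require aligning the Stanza output with the manual annotation.

```python
def overlap_counts(animate_ids, named_entity_ids):
    """Count tokens tagged by both layers, by NER only, and by animacy only."""
    animate = set(animate_ids)
    named = set(named_entity_ids)
    return {
        "both": len(animate & named),          # animate and named entity
        "named_only": len(named - animate),    # named entity but not animate
        "animate_only": len(animate - named),  # animate but no named entity
    }


# Toy example: token positions within one text (hypothetical values).
animate_ids = {0, 1, 2, 5, 6}
named_entity_ids = {2, 6, 9}
counts = overlap_counts(animate_ids, named_entity_ids)
```

Tokens that are neither animate nor named entities are not counted here, which mirrors the "n/a" cell of the table.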
5. Animacy Classification
5.1. Implementation of the Classifier
In examining their annotated data, Karsdorp et al. observe that the part of speech of a word is
already a “weak” indicator of animacy, as 40% of the tokens they annotated as
animate are nouns or proper nouns, while only 11% of the tokens tagged as inanimate are nouns
[9]. We can confirm this finding at least in part, as 26.5% of our tokens annotated as animate
are nouns or proper nouns, compared to 5.7% of the tokens tagged as inanimate.
[9] build on this observation by not only training their classifier on the manually annotated
data but also adding various linguistic features in order to find the best-performing combination
for the training input. They run several experiments in which they always include the manually
annotated tokens in a rolling context window of three tokens to the left and right (which they call
the lexical input). They subsequently combine this base data with the rolling context window
of the lemmas, the part-of-speech tags (i.e., morphological features), the dependency tags (i.e.,
syntactic features) and the embedding vector of the target token taken from a Word2Vec model
built on a web corpus (i.e., semantic features). We reproduced their way of creating lexical,
morphological and syntactic features using the Stanza library [14]. Furthermore, we trained
a Word2Vec model on 115 novels from the German Romantic era [17], whose language we deem
comparable to the literary language of the time. This Word2Vec model was trained using
Gensim [15], which is based on [11]. We used the same parameters as [9], who use the
skip-gram architecture with a vector size of 300 (the other parameters were set to default).⁴
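To make the rolling context window concrete, the following sketch builds the lexical input for one target token with three tokens of context on each side. It is a minimal illustration under stated assumptions, not the implementation used here or in [9]: the function name `window_features`, the padding symbol and the feature key scheme are all hypothetical.

```python
def window_features(tokens, index, size=3, pad="<PAD>"):
    """Return the target token and its left/right context as a feature dictionary."""
    features = {}
    for offset in range(-size, size + 1):
        position = index + offset
        # Positions outside the sentence are filled with a padding symbol.
        value = tokens[position] if 0 <= position < len(tokens) else pad
        features[f"tok[{offset:+d}]"] = value
    return features


# Toy sentence (tokenized by hand for illustration).
tokens = ["Knüppel", ",", "aus", "dem", "Sack", "!"]
features = window_features(tokens, index=0)
```

The same function could be applied to lemma, part-of-speech or dependency sequences to obtain the other windowed feature groups. Dictionaries of this kind could then be vectorized, for example with scikit-learn's `DictVectorizer`, before fitting a maximum entropy (logistic regression) model; that pipeline is an assumption about one possible setup.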
5.2. Evaluation
For evaluating the results we calculated the F1-scores using 10-fold cross validation, differenti-
ating between the much larger class of inanimate and the class of animate entities (cf. Table 3).
While we reached lower F1-scores, our results are comparable to the results of [9] with re-
gard to the combination of lexical features (tokens), part-of-speech tags and embedding vector
yielding the best result (F1-score of the Dutch classifier for the animate class of 0.93).
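Because the inanimate class is much larger than the animate class, the scores are reported per class. The sketch below shows how precision, recall and F1 can be computed for one target class from gold and predicted token labels; the labels `"A"` and `"I"` and the toy data are assumptions, and the reported figures were additionally obtained with 10-fold cross validation, which is not reproduced here.

```python
def precision_recall_f1(gold, predicted, target):
    """Precision, recall and F1 for one class, treating it as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == target and p == target for g, p in pairs)
    fp = sum(g != target and p == target for g, p in pairs)
    fn = sum(g == target and p != target for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels: "A" = animate, "I" = inanimate.
gold      = ["A", "A", "A", "I", "I", "I", "I", "I"]
predicted = ["A", "A", "I", "A", "I", "I", "I", "I"]
animate_scores = precision_recall_f1(gold, predicted, "A")
inanimate_scores = precision_recall_f1(gold, predicted, "I")
```

With the same two errors, the smaller animate class ends up with lower scores than the larger inanimate class, which is the pattern visible in the evaluation.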
Furthermore, we experimented with adding more annotated data to see whether the performance
of the classifier plateaus at a certain point. For this, we annotated six additional KHM folktales.
Subsequently, we incrementally expanded the data for the classifier with all features by adding
one of these fairy tales at a time and conducted a 10-fold cross-validation to observe the
evolution of the F1 score of the animate class. The score did not increase; rather, it
tended to decrease slightly (cf. Table 4).
⁴ With this we have successfully reproduced the workflow by [9] concerning lexical features and the combination
of lexical and semantic features. However, we have not yet determined how to incorporate additional features into
this training process. We used the same classification algorithm (Maximum Entropy, as implemented in scikit-learn
[13]).
Table 3
Evaluation of classification for animate and inanimate tokens (10-fold cross validation).
                 | inanimate                | animate
                 | P      | R      | F1     | P      | R      | F1
lexical features | 0.9512 | 0.9764 | 0.9636 | 0.8776 | 0.7718 | 0.8212
all features     | 0.9596 | 0.9726 | 0.9661 | 0.8671 | 0.8137 | 0.8395
Table 4
F1 Scores for the classification of animate tokens during incremental data expansion.
Added tale                                                           | 𝐹1 Score
base case (19 atypical animacy folktales)                            | 0.8395
+ KHM 1 The Frog King, or Iron Heinrich                              | 0.8329
+ KHM 2 Cat and Mouse in Partnership                                 | 0.8330
+ KHM 3 Mary’s Child                                                 | 0.8326
+ KHM 4 The Story of the Youth Who Went Forth to Learn What Fear Was | 0.8295
+ KHM 5 The Wolf and the Seven Young Kids                            | 0.8296
+ KHM 7 The Good Bargain                                             | 0.8296
Figure 1: Relative frequency of animate entities of the 211 Children’s and Household Tales.
5.3. Implementation in German Folktales
Applying our classifier to the entire corpus of 211 Children’s and Household Tales
yields the results shown in Figure 1. The average proportion of animate tokens is 16%. The
20 texts with a proportion of ≤ 10% (bottom outliers) are consistently not written in standard
German.
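The proportion of animate tokens per tale and the threshold for bottom outliers can be sketched as follows. The tale names, the label scheme and the toy label sequences are hypothetical; only the 10% threshold is taken from the text.

```python
def animate_share(labels):
    """Relative frequency of animate tokens in one tale ("A" = animate)."""
    return sum(label == "A" for label in labels) / len(labels) if labels else 0.0


# Toy classifier output for three hypothetical tales.
tales = {
    "tale_a": ["A", "A"] + ["I"] * 8,   # 2 of 10 tokens animate
    "tale_b": ["A"] + ["I"] * 9,        # 1 of 10 tokens animate
    "tale_c": ["I"] * 5,                # no animate tokens
}
shares = {name: animate_share(labels) for name, labels in tales.items()}
mean_share = sum(shares.values()) / len(shares)
# Tales at or below the 10% threshold are treated as bottom outliers.
bottom_outliers = sorted(name for name, share in shares.items() if share <= 0.10)
```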
A spot check of the annotations indicates reasonably good results. The classifier even
distinguishes correctly between mere name references in direct speech and mentions of animate
entities. For example, the main character of the eponymous tale KHM 55 Rumpelstiltskin is
annotated as animate when referred to as “Männchen” (little man), whereas the two proper name
mentions used only as a name reference in direct speech are correctly classified as inanimate. In
KHM 34 Clever Elsie, the character is not classified as animate in direct speech but is recognized
as such in narrative parts, and in KHM 166 Strong Hans the character “Hans” is consistently and
correctly recognized as animate. However, the classifier struggles with rare
and complex tokens. For example, the more common “Fuchs” (fox) is recognized more reliably
than the more complex form “Rothfuchs” (red fox) in KHM 73 The Wolf and the Fox. On the
other hand, the animate “Vogel” (bird) and the inanimate “Vogelherz” (bird heart) in
KHM 122 Donkey Cabbages are both classified correctly.
6. Discussion and Outlook
Our approach to animacy classification achieves a reasonably good detection of animacy and its
application to a corpus of German folktales provides some interesting insights. With regard to
the assumed relation between named entities and animacy, we have shown only partial overlap
both for the concepts and for their detection. In other words, our results indicate that we did
not simply achieve NER in place of animacy detection. These outcomes demonstrate that our
animacy approach is distinct from NER. In fact, the correlation between the relative frequency of
person entities (automatically tagged using [14]) and the relative frequency of animate entities
(using our classifier) in the Grimm corpus is weak and negative, with a Pearson correlation
coefficient of -0.132 and a Spearman correlation coefficient of -0.195. The classifier adheres to our animacy
framework both with regard to animacy and inanimacy. Furthermore, the classifier addresses
a gap in detecting animated animals and objects. This capability suggests that with further
development, it could also enhance character recognition.
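For reference, the two correlation coefficients mentioned above can be computed from per-tale relative frequencies as in the following plain-Python sketch. The input values are toy numbers, not the corpus data, and the helper names are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


# Toy per-tale relative frequencies (hypothetical values).
person_freq = [0.01, 0.02, 0.03, 0.04]
animate_freq = [0.20, 0.18, 0.15, 0.10]
```

Values close to zero, as reported for the Grimm corpus, indicate that the share of person entities in a tale says little about its share of animate entities.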
Accordingly, future work should explore the combination with NER and coreference resolu-
tion for the identification of characters as well as the potential of LLMs for the annotation of
animacy. From the perspective of analysis, filtering out human entities would also be an
interesting future step, allowing us to analyze animate objects, animals, and other potential
candidates for characters that do not display features of person entities.
References
[1] R. Borgards, F. Middelhoff, and B. Thums, eds. Romantische Ökologien: Vielfältige Naturen
um 1800. Vol. 4. Neue Romantikforschung. Berlin, Heidelberg: Springer, 2023. doi:
10.1007/978-3-662-67186-3.
[2] Brüder Grimm. Kinder und Hausmärchen: Band 1. 7th ed. Göttingen: Verlag der Dieterichschen
Buchhandlung, 1857. url: https://de.wikisource.org/wiki/Kinder-_und_Haus-M%C3%A4rchen_Band_1_(1857).
[3] Brüder Grimm. Kinder und Hausmärchen: Band 2. 7th ed. Göttingen: Verlag der Dieterichschen
Buchhandlung, 1857. url: https://de.wikisource.org/wiki/Kinder-_und_Haus-M%C3%A4rchen_Band_2_(1857).
[4] H. Bußmann, ed. Lexikon der Sprachwissenschaft. 4th ed. Stuttgart: Alfred Kröner Verlag,
2008.
[5] M. Coll Ardanuy, F. Nanni, K. Beelen, K. Hosseini, R. Ahnert, J. Lawrence, K. McDonough,
G. Tolfo, D. C. Wilson, and B. McGillivray. “Living Machines: A study of atypical animacy”.
In: Proceedings of the 28th International Conference on Computational Linguistics.
Ed. by D. Scott, N. Bel, and C. Zong. Barcelona, Spain (Online): International Committee
on Computational Linguistics, 2020, pp. 4534–4545. doi: 10.18653/v1/2020.coling-main.400.
[6] “The Table, the Ass, and the Stick”. In: Household Stories, illustrated by Walter Crane,
translated by Lucy Crane. Ed. by J. Grimm and W. Grimm. Trans. by L. Crane. 1882. url:
https://en.wikisource.org/wiki/Household_stories_from_the_collection_of_the_Bros_Grimm_(L_&_W_Crane)/The_Table,_the_Ass,_and_the_Stick.
[7] L. Jahan, G. Chauhan, and M. Finlayson. “A New Approach to Animacy Detection”. In:
Proceedings of the 27th International Conference on Computational Linguistics. Ed. by E. M.
Bender, L. Derczynski, and P. Isabelle. Santa Fe, New Mexico, USA: Association for Com-
putational Linguistics, 2018, pp. 1–12.
[8] F. Jannidis. Figur und Person. Beitrag zu einer historischen Narratologie. Berlin: de Gruyter,
2004.
[9] F. Karsdorp, M. van der Meulen, T. Meder, and A. van den Bosch. “Animacy Detection in
Stories”. In: 6th Workshop on Computational Models of Narrative (CMN 2015). Ed. by M. A.
Finlayson, B. Miller, A. Lieto, and R. Ronfard. Vol. 45. Open Access Series in Informatics
(OASIcs). Dagstuhl, Germany: Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2015,
pp. 82–97. doi: 10.4230/OASIcs.CMN.2015.82.
[10] S. Lahn and J. C. Meister. Einführung in die Erzähltextanalyse. Stuttgart: Metzler, 2016.
[11] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient Estimation of Word Representations
in Vector Space. 2013. doi: 10.48550/arXiv.1301.3781.
[12] M. S. Nieuwland and J. J. A. van Berkum. “When peanuts fall in love: N400 evidence for
the power of discourse”. In: Journal of cognitive neuroscience 18.7 (2006), pp. 1098–1111.
doi: 10.1162/jocn.2006.18.7.1098.
[13] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P.
Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher,
M. Perrot, and E. Duchesnay. “Scikit-learn: Machine Learning in Python”. In: Journal of
Machine Learning Research 12.85 (2011), pp. 2825–2830.
[14] P. Qi, Y. Zhang, Y. Zhang, J. Bolton, and C. D. Manning. “Stanza: A Python Natural
Language Processing Toolkit for Many Human Languages”. In: Proceedings of the 58th
Annual Meeting of the Association for Computational Linguistics: System Demonstrations.
Ed. by A. Celikyilmaz and T.-H. Wen. Online: Association for Computational Linguistics,
2020, pp. 101–108. doi: 10.18653/v1/2020.acl-demos.14.
[15] R. Řehůřek and P. Sojka. “Software Framework for Topic Modelling with Large Corpora”.
In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks. Val-
letta, Malta: Elra, 2010, pp. 45–50.
[16] D. Schmidt, A. Zehe, J. Lorenzen, L. Sergel, S. Düker, M. Krug, and F. Puppe. “The
FairyNet Corpus - Character Networks for German Fairy Tales”. In: Proceedings of the
5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sci-
ences, Humanities and Literature. Ed. by S. Degaetano-Ortlieb, A. Kazantseva, N. Reiter,
and S. Szpakowicz. Punta Cana, Dominican Republic (online): Association for Computa-
tional Linguistics, 2021, pp. 49–56. doi: 10.18653/v1/2021.latechclfl-1.6.
[17] M. Schumacher, I. Uglanova, and E. Gius. d-Romane-Romantik (d-RoRo). 2022. doi:
10.5281/zenodo.7215170.
[18] S. Thompson. Motif-index of folk-literature: a classification of narrative elements in
folktales, ballads, myths, fables, mediæval romances, exempla, fabliaux, jest-books, and local
legends. Vol. 1: A - C. Indiana University studies, Vol. 19, No. 96/97. Bloomington,
Ind.: Univ. Libr., 1934.
[19] H.-J. Uther. The types of international folktales: a classification and bibliography; based
on the system of Antti Aarne and Stith Thompson. FF Communications. Helsinki:
Suomalainen Tiedeakatemia, 2004.
[20] M. Westfall. “Perceiving agency”. In: Mind & Language 38.3 (2023), pp. 847–865. doi:
10.1111/mila.12399.
[21] M. Yamamoto. Animacy and reference: A cognitive approach to corpus linguistics. Vol. 46.
Studies in language Companion series, SLCS. Amsterdam and Philadelphia, PA: John
Benjamins Publ, 1999. doi: 10.1075/slcs.46.
A. Corpus
• KHM 6 Trusty John
• KHM 10 The Pack of Ragamuffins
• KHM 11 Brother and Sister
• KHM 18 The Straw, the Coal, and the Bean
• KHM 24 Mother Holle
• KHM 28 The Singing Bone
• KHM 30 The Louse and The Flea
• KHM 36 The Table, the Ass, and the Stick
• KHM 41 Herr Korbes
• KHM 42 The Godfather
• KHM 49 The Six Swans
• KHM 56 Sweetheart Roland
• KHM 80 The Cock and the Hen
• KHM 88 The Singing, Springing Lark
• KHM 89 The Goose Girl
• KHM 103 Sweet Porridge
• KHM 142 Open Sesame
• KHM 171 The Willow-Wren
• KHM 188 Spindle, Shuttle, and Needle
B. Annotation Guidelines
The annotation process is designed for using the CATMA platform and proceeds through three
iterations. It is based on the understanding of mentions from coreference chains. The
annotation span ranges from single tokens to multi-word expressions.
Iteration 1: Overview of Entities
In the first iteration, the goal is to provide an overview of all animate entities and to be able to
relate to each entity. Each entity recognized as animate is marked with a clearly identifiable
mention in the text that we call recognizable mention (rm). The mention does not necessarily
need to be the first occurrence of the entity; rather, it should be one that allows for quick
identification. An entity qualifies as animate if at least one of the following three criteria is met:
1. The entity performs an independent action explicitly described in the text, occupying
the agent role of a verb.
2. The entity makes independent verbal expressions.
3. The entity is described by a lexeme that refers to a living being, irrespective of its role
or actions in the sentence, unless an additional description explicitly excludes animacy
(e.g., a dead relative).
For example, in the sentence “directly out jumped the stick, and dealt a shower of blows on
the coat or jerkin, and the back beneath, which quickly ended the affair” (KHM 36 The Table, the
Ass and the Stick, [6]) the stick’s agent role is evident. Therefore, it meets the first criterion and
is annotated as animate. Whereas in the sentence “When placed and spoken to, ‘Little table,
set yourself,’ it would immediately be covered with a clean cloth, with plates, knives, and forks
beside it” (KHM 36 The Table, the Ass and the Stick) an independent action is implied, although
it is not explicitly depicted. The little table does not occupy an agent role and is therefore not
marked as animate.
As an example for the second criterion we look at the sentence “but the bread called out,
‘Oh, take me out, take me out, or I’ll burn; I’ve been done for a long time.’” (KHM 24 Mother
Holle). The bread is the originator of an independent verbal statement and is hence marked as
animate.
The third criterion can be observed in the description “After lifting the girl onto his horse,
the old woman showed him the way” (KHM 49 The Six Swans). Although the horse is not in an
agent position here, readers’ world knowledge recognizes a horse as an animate entity, so it is
marked as animate. This extends to entities that are not characters, such as relatives mentioned
but not directly appearing in the text, which are also annotated.
Furthermore, a new recognizable mention gets annotated for entities that are transformed
radically, where the transformed entity also satisfies one of the conditions explained above. For
example, in KHM 6 Trusty John the title character gets transformed into a speaking stone.
If multiple entities get introduced as a group (“three ravens”), the first mention of the group
gets annotated instead of single first mentions of each member of the group.
To further clarify the rules, some borderline cases are discussed in the following. In some
fairy tales, the narrator appears through a first-person reference and the reader is also refer-
enced.
• “eagle and finch, owl and crow, lark and sparrow, what should I call them all?”
(KHM 171 The Wren)
• “and the donkey didn’t stop until everyone had so much that they couldn’t carry anymore.
(I can see it in your face, you would have liked to be there too.)”
(KHM 36 The Table, the Ass, and the Stick)
The narrator and the recipient are regarded here as textual constructs with no real-world
counterpart [10, p. 61]. As a result, their reference expressions cannot be assigned a definite
degree of animacy.
Common borderline cases are magical objects that appear in fairy tales. Rule (1) has already
clarified that explicit independent action is a prerequisite for the animacy annotation. However,
cases occur where classification is still ambiguous.
• “the way was so hard to find that he would not have found it if a wise woman had not
given him a ball of yarn; when he threw it in front of him, it unwound by itself and
showed him the way.”
(KHM 49 The Six Swans)
• “Now she could not rest until she found out where the king kept the ball of yarn”
(KHM 49 The Six Swans)
The ball of yarn in the first excerpt clearly occupies the agent role of the verbs “unwind” and
“show.” In the second quote, however, it is used with the verb “keep,” which typically requires
an inanimate object. Based on this case, it was decided that a single animate occurrence is
sufficient for marking the entity as animate.
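This decision amounts to an "any" aggregation over the occurrences of an entity. A minimal Python sketch, assuming that each occurrence has already been judged manually (the function name and the list encoding are hypothetical):

```python
from typing import Iterable


def entity_is_animate(occurrence_judgments: Iterable[bool]) -> bool:
    """Return True if at least one occurrence of the entity was judged animate."""
    return any(occurrence_judgments)


# Ball of yarn in KHM 49: agent of "unwind" and "show" in the first excerpt (True),
# object of "keep" in the second excerpt (False).
ball_of_yarn = [True, False]
yarn_is_animate = entity_is_animate(ball_of_yarn)
```

Here `yarn_is_animate` is `True`, because one animate occurrence outweighs any number of inanimate ones.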
Iteration 2: Annotation of all Mentions
In the second iteration, all reference expressions referring to entities marked as animate in the
previous iteration were annotated. Reference expressions include all noun phrases containing
proper names including the article, descriptors based on attributes such as occupation (“the
brave little tailor”), gender, appearance (“the beautiful one”), or social status (“the poor man,”
“the princess”), as well as personal, demonstrative, relative, possessive, or indefinite pronouns.
Additionally, all expressions referring to such noun phrases through any reference type were
included. This annotation level provides an overview of where and how often animate entities
appear. Borderline cases in the annotation process include vague references (“everyman”),
reflexive verb constructions (“he withdrew himself”), and entities recognized as animate only later
in the story. Vague references and reflexive pronouns in reflexive verb constructions are not
annotated as animate because they do not refer to any specific animate entity. Entities that
initially appear inanimate but are recognized as animate over the course of the story are
consistently marked as animate.
Iteration 3: Annotation of Character Status and Animacy Degree
In the third and final iteration, the existing annotations are enriched with the properties
“character” and “degree of animacy.” The former indicates whether the animate entity is a character.
The latter marks it as human, animal, object or supernatural.
An entity is marked as a character if the description in the text includes some form of the
semantic feature “human” (“the king”). Another indicator is the association with verbs describ-
ing typically human actions. Entities that are sources of verbal expressions (“the lion spoke”)
or exhibit a complex inner life or thinking are also marked. Some borderline cases include
groups of people.
• “The king summoned all goldsmiths, who had to work day and night”
(KHM 6 Trusty John)
• “Then the other servants of the king, who did not favor Faithful John, shouted, ‘How
shameful to kill the beautiful animal that was to carry the king to his castle!’ ”
(KHM 6 Trusty John)
Individual cases must be distinguished. The term “goldsmiths” can be semantically associ-
ated with the occupation of a human, but no individuals are visible in this description, so they
do not appear as characters. The servants, even collectively, show a complex inner life through
their mistrust and are therefore considered characters.
The second property indicates the degree of animacy of a corresponding entity in the real
world. Animate entities can be annotated with the values “human,” “animal,” “supernatural,”
or “inanimate.” This distinction between perception within the fictional world and the world
knowledge applied during reception is important. Although entities in narratives do not form
a direct reference to the real world due to their fictional nature, readers derive many features
from their world knowledge about the corresponding real-world entity. A borderline case in
this categorization is the description of body parts. The head of a horse (cf. KHM 89 The Goose
Girl) could be classified as either animal or inanimate. Here, it is argued that the category animal
implies a form of animacy. Parts of a dead animal would be perceived as inanimate in everyday
life and are therefore categorized as such here.
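The two properties added in the third iteration can be pictured as fields on each annotated entity. The following Python sketch is an illustration under stated assumptions, not the annotation schema actually used: the class and field names are hypothetical, the degree values follow the list “human,” “animal,” “supernatural,” “inanimate” given above, and the degree values assigned to the goldsmiths and the lion are assumed for the sake of the example.

```python
from dataclasses import dataclass
from enum import Enum


class AnimacyDegree(Enum):
    """Degree of animacy of the corresponding real-world entity."""
    HUMAN = "human"
    ANIMAL = "animal"
    SUPERNATURAL = "supernatural"
    INANIMATE = "inanimate"


@dataclass(frozen=True)
class EntityAnnotation:
    """Hypothetical container for the two properties of an animate entity."""
    mention: str
    is_character: bool
    degree: AnimacyDegree


examples = [
    # Semantic feature "human" in the description: character.
    EntityAnnotation("the king", True, AnimacyDegree.HUMAN),
    # No individuals visible in the description: not characters (degree assumed).
    EntityAnnotation("all goldsmiths", False, AnimacyDegree.HUMAN),
    # Source of a verbal expression: character (degree assumed).
    EntityAnnotation("the lion", True, AnimacyDegree.ANIMAL),
]

characters = [e.mention for e in examples if e.is_character]
```

Keeping the two properties as separate fields reflects that they are independent: an entity can be human without being a character, and a character without being human.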
C. Online Resources
Data and code can be found here: https://github.com/forTEXT/animacy_in_german_folktales.