=Paper=
{{Paper
|id=Vol-3290/long_paper2313
|storemode=property
|title=Modeling Plots of Narrative Texts as Temporal
Graphs
|pdfUrl=https://ceur-ws.org/Vol-3290/long_paper2313.pdf
|volume=Vol-3290
|authors=Leonard Konle,Fotis Jannidis
|dblpUrl=https://dblp.org/rec/conf/chr/KonleJ22
}}
==Modeling Plots of Narrative Texts as Temporal
Graphs==
Modeling Plots of Narrative Texts as Temporal Graphs Leonard Konle, Fotis Jannidis Julius-Maximilians-Universität Würzburg, Würzburg, Germany Abstract The paper outlines a formal model of plot (and syuzhet) for narrative texts. The basic unit are scenes and the motif repertoire instantiated in the scene. The motif repertoire consists of three sets of (closely related) elements: character stereotypes, types of verbal actions and action types. It is assumed that the motif repertoire is highly dependent on the corpus which is analyzed, in our case a corpus of romance and horror novels published as pulp 昀椀ction. The resulting information is represented in a temporal graph which in turn is used to compute relevant information on the scenes and characters. Scenes are also characterized by their valence and their arousal value. A second representation which o昀昀ers with a topic model of the direct speech and the narrative text a simple proxy for the types of verbal actions and the action types is also created. To assess the ability of these information structures to indicate changes in the temporal structures three evaluation methods are used based on arti昀椀cial data. We can con昀椀rm that a very abstract representation of the plot is able to do so, but contrary to our expectations the more information-rich model which makes use of the topic model is not better in doing so. The main contribution of this paper is its attempt to integrate di昀昀erent research proposals into one integral model. We o昀昀er a descriptive framework and a proposal for the formal model of plot, which makes it possible to identify research problems and align existing approaches. Keywords plot, temporal graphs, scenes, characters, modeling 1. Introduction For our understanding of narrative literature character and plot are basic and central categories. Though computational literary studies already have a rich landscape of character models, it is not yet as advanced when it comes to analyzing plot. The main reason is the complexity not only of a generic model of plot, but of the subproblems involved. Most of the contributions to the discussion of plot and event in recent years have tried to map the myriad of elements which can be found in plot descriptions to one or a very small set of textual phenomena. [14] uses sentiment values as indicators for plot 昀氀uctuation, [3] map from di昀昀erent groups of function words to three concepts: staging, plot progression and cognitive tension, [29] classify verbs to four types of event (changes of state, process events, stative events and non-events).1 Al- ternatively [27] basically do without any abstraction and map almost each verb to itself. Our main goal in this paper is to discuss the outline of a model which could o昀昀er a more complex CHR 2022: Computational Humanities Research Conference, December 12 – 14, 2022, Antwerp, Belgium £ leonard.konle@uni-wuerzburg.de (L. Konle); fotis.jannidis@uni-wuerzburg.de (F. Jannidis) ȉ 0000-0001-5833-0414 (L. Konle); 0000-0001-6944-6113 (F. Jannidis) © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop CEUR Workshop Proceedings (CEUR-WS.org) Proceedings http://ceur-ws.org ISSN 1613-0073 1 A detailed presentation of earlier computational research on plot can be found in Elsner [7]. 318 representation than those mentioned, and to delineate what kind of problems the CLS commu- nity has to solve to reach this point. All in all, this paper is more of a modeling study with some attempts at implementation and evaluation than a typical CLS paper concentrating on the details of a speci昀椀c implementation. But in our view the discussion on how to model plot has reached a point in recent years where that, which is described as plot in CLS, has only a very vague resemblance with what people in literary studies and beyond mean, when they use the term. But it stands to reason that only a solid model of plot, nearer to this established use of the term, can be the basis for understanding genre systems, historical developments of literature and many other aspects of literary communication. It would be misleading to assert that there is one meaning of ‘plot’ in literary studies. The term has many layers of meaning, not the least because many di昀昀erent analytical traditions use this term in their English translations (details see [18]). We use the term here to refer to the structure constituted by the sequence of events. The term ‘event’ refers here as usual to the (inter-)actions of characters. But the term structure does not imply a reduction to some shape or outline, but rather a feature-rich representation which nevertheless can be abstract enough to recognize patterns and based on that similarity between texts. In the discussion of the term ‘plot’ very di昀昀erent levels of abstraction are used. But in our understanding only rarely is the term used in such an abstract way, that only the amount or intensity of action is measured, as some have interpreted Freytag’s famous 昀椀ve-stage model of plot [9]. Most uses of ‘plot’ include more concrete aspects of the actions depicted in the text. This is closer to what is represented in a summary which concentrates on the main plot points. The need for abstraction is con昀椀rmed by the narratological work on ‘event’. [20] (see also [12]) has shown that the concept is so encompassing, that basically anything can be an event for someone under speci昀椀c circumstances. In other words, a computational model of events in this understanding would need to encompass a complete model of the world. Add to this another observation, made 昀椀rst as far as we know by [15], that summaries of literary texts use more generalizations and abstractions compared to those of non-literary texts, which is also con昀椀rmed by our own work. In other words, the depiction of characters and events in literary texts is usually concrete, and a summary will compact this information using di昀昀erent terms including more generalizations and abstractions. This constellation is, so we believe, the main reason why real progress in this 昀椀eld has been stalled. So abstraction is necessary, but how much? In the 昀椀rst step, we use a model which is richer than most models for representing plot. Its components are chosen based on earlier research. We also start, as the model by Elsner [8], with the characters but we add three components: First, we model plot as a sequence of scenes in which characters and events are nested. So 昀椀rst, we segment the text into scenes. This is based on work on scene segmentation [30] and is similar to the proposal in [23] where the relation between plot elements is described as ’event-scene-level-plotline-plot’ (p.302). For each scene, we construct a character network. Second, recent work by linguists [5] and also work on speech rendition in narrative texts [4] has shown that a novel consists of speech rendition and of narration and these can be regarded as two text types or registers which have to be treated separately. Third, for each of these three components - character, speech rendition, narrative text - we de昀椀ne a very small set of generalizations and map the text to these. Based on the insight mentioned above, that almost everything can be an event in some texts, 319 we acknowledge that it is probably not possible to generate these generalizations independent from the corpus you wish to analyze. In other words, similar to Propp in his analysis of fairy tales [24], we do not de昀椀ne generalizations which are valid everywhere, but only for those texts we wish to analyze. On the other hand, even if we acknowledge this dependency on the text corpus, it is unclear on what level these generalizations should be established and how. Probably this can only be answered by taking the corpus into consideration. We are interested in analyzing pulp 昀椀ction (‘He昀琀romane’), literature written for entertainment, which is published in thin volumes on cheap paper. From this pulp 昀椀ction or dime novels we look at two genres, two, because this literature is from its very beginnings rather strictly binary gendered in relation to reader expectations by its publishers; in our selection, one of the genres - romance - is addressing women and the other - horror - is addressing men. Work on plot may include information about characters [7] and work on character stereo- types o昀琀en includes information about plot aspects. [1] for example include actions of which characters are agents or patients. [13] use Propp’s plot functions as features to cluster the characters and identify character roles. It seems rather obvious that these aspects, character stereotypes, and actions/events are closely related. We therefore propose to use the term ‘mo- tif repertoire’ for those character/plot/event elements which are typically present in a given corpus (usually a genre, a series etc.). As described above, we think it is useful for the anal- ysis of narrative texts to distinguish between plot / event elements in narrative text and in direct speech. Thus we have three classes in the motif repertoire of a corpus which closely interact: 1) character stereotypes / roles, 2) verbal actions (somewhat more concrete than the usual linguistic speech acts, for example ‘the [stereotype X] tells the [stereotype: heroine] that her [stereotype: beloved] wants to marry [stereotype: rival]’, 3) action types and events (‘the [stereotype: antagonist] attacks the [stereotype: hero]’). Establishing this motif reper- toire fully (or even fully automated) is a very hard problem and beyond the scope of this paper. In this paper we will discuss two models for these generalizations. Our 昀椀rst approach was to de昀椀ne these generalizations based on our reading experience. In a second approach, we abstracted less and kept more information of the speci昀椀c text corpus. The main contributions of our paper are a more detailed analysis of a content-rich plot model and the di昀케culties involved. In some important aspects it is indebted to [7], but adds more mod- ern ways to model temporality using temporal graphs and is based on scenes as basic units. We o昀昀er a descriptive framework and a proposal for the formal model of plot, which makes it pos- sible to identify research problems and we describe some ways to evaluate these models. So, it is not our goal to 昀椀nd a very sophisticated and highly performant implementation for a speci昀椀c task, but rather to investigate how a complex and feature-rich model of plot can be constructed and evaluated. In the practical parts of this paper, we rely, where possible, on existing tools and only add our own implementations where we need to 昀椀ll speci昀椀c gaps to reach our goal. These implementations are usually only simple place-holders for more sophisticated solutions to be found in the future. 320 1.1. Plot models for entertainment literature The basic outline of our modeling approach has three levels. On the basis we have single texts which belong to a corpus. The plot models we discuss are meant to represent simple literature written and read for the purpose of entertainment. In doing so we follow our belief, that the domain of literature is too heterogeneous and especially ‘high’ literature too complex to construct models in this early stages of research in Computational Literary Studies which can cover literature in general. So we start our research with highly formulaic literature published as pulp 昀椀ction on the German speaking markets, speci昀椀cally we work with two genres romance and horror. So even if we look at a speci昀椀c single text, we look at it through the lens of an information system based on the structure of the corpus the text comes from. The texts are segmented into scenes. Each scene can be represented abstractly as character stereotypes communicating and interacting. The character stereotypes, that is the kind of stereotype and also the elements of these stereotypes, are speci昀椀c for a corpus. The same is true for the types of communication and the types of action rendered in a scene. So while each speci昀椀c scene is represented abstractly, the elements of this abstraction are obtained through an analysis of the whole corpus – usually based on the genre –, the text belongs to. Figure 1 shows this basic outline. The speci昀椀c components of character stereotypes, types of communication and types of action and events, we chose here are very simpli昀椀ed, in our empirical studies, described below, we used slightly more complex representations.2 In our model the smallest Figure 1: The temporal graph, an abstraction of the scene-segmented text, consists of elements of the motif repertoire. 2 We do net deal with another important aspect which can be understood as additional part of the corpus-speci昀椀c motif repertoire and which is instantiated in each scene: setting and space in general. 321 segment is a scene. A “scene is a segment of the discours (presentation) of a narrative which presents a part of the histoire (connected events in the narrated world) such that (1) time is equal in discours and histoire, (2) place stays the same, (3) it centers around a particular action, and (4) the character constellation stays the same. All of these conditions are not absolute but rather relative, that is, small changes in either of them do not necessarily lead to a scene change but can rather be seen as indicators.” [30]. In a scene we can 昀椀nd characters and events. An event is usually the action of one or more characters, o昀琀en the interaction between them. Based on our understanding of plot as a chronologically and ideally causally ordered se- quence of events, we would now reorder the scenes accordingly. With the current state of the analytical tools in CLS, it is not feasible to do this automatically. Therefore we use the sequence as given by the text. At a later stage such a reordering could be added to the processing steps described below without any larger impact on the later steps. It is only necessary to remind oneself, that our model shows similarity between texts not on the histoire level alone, but on both levels: what happened and in which sequence was it narrated. In other words, we are not really talking about plot here, but rather about syuzhet, the plot as it is narrated. We also ignore the problem of narrative level, because none of the novels we read from these genres uses di昀昀erent narrative levels. Not all scenes are equally important. There are always scenes which would never be men- tioned in a summary while others are crucial. Even if the criteria for this weighting are hard to represent exactly, rough indicators like the level of valence and arousal could su昀케ce for the time being. In modeling the dime novels for our empirical research we tried to be as simple as possible: 1. Characters are described along three dimensions: main character vs. supporting char- acter, positive vs. negative, male vs. female3 . We considered using the actantial model proposed by Greimas [11] which in turn is an abstraction of the corpus-based classi昀椀- cation developed by Propp [24]. Greimas distinguishes between subject, object, helper, opponent, sender and receiver. But it seems to us that our dimensions allow us to capture the intuitions which are also the basis for Greimas. The 昀椀rst positive main character is usually the subject, while negative main characters are usually the opponents. Our ap- proach avoids the classi昀椀cation problems which usually arise especially from the last two concepts. 2. To determine the interaction types relevant for the description of events in entertainment literature is probably the most challenging aspect. We start with the simple fact that a high proportion of these texts consists of direct speech. Add to this reported and free indirect speech, and depiction of communication comprises around 40-50% of narrative texts, depending on the genre. So the 昀椀rst type of interaction is (usually verbal) commu- nication. In our 昀椀rst model, we don’t distinguish between di昀昀erent verbal actions like love declaration and death threat. In the second, we use a representation which covers some aspects without making it necessary to explicitly construct the motif repertoire ourselves. 3 The social construction of gender is a complex phenomenon, but entertainment literature usually simpli昀椀es this into a binary system; cf. the extensive discussion in Koolen [16]. 322 3. This leaves the narrative text which is not conveying information about communication, but about other types of events. From this, for the 昀椀rst approach we only use two cat- egories: The non-verbal expressions of positive a昀昀ection (especially erotic interaction) and of antagonistic action (昀椀ght) are typical interaction types for dime novels. Again, for the second approach we used a simple more content-rich representation without making it necessary to explicitly construct the motif repertoire ourselves. In short, scenes are identi昀椀ed and values for valence and arousal are computed for each scene. Then for each scene a character graph is constructed which represents the character dimen- sions and the interaction types. These scene graphs are then integrated into a temporal graph according to the sequence of scenes. The temporal graph allows to compute sequence-sensitive measures for characters which are added summarily to the scene (more complex representa- tions of these informations are thinkable, but it is not easy to integrate them into the represen- tation of a whole novel, see discussion). This information is complemented with information on the scene, valence and arousal and the averaged centrality measure for the characters involved in the scene (‘personal weight’). In a second approach we added to this general scene information the speci昀椀c distribution of topics for direct speech and for the narrative text to add more concrete information about the genre speci昀椀c interaction and event types. Using a topic model is a valid, but probably relatively crude way to construct a motif repertoire for the interaction and event types based on a corpus. This is one of the many points in this paper where we can only point to future research. 2. Corpus Figure 2: Corpus statistics. The corpus consists of 192 dime novels from the genres horror (39) and romance (153). The novels are relatively short with an average length of 39k tokens. Despite being longer than horror novels, romances show the same amount of scenes. This is due to shorter scenes in horror novels (Fig. 2). 323 3. Methods The next sections describe how to obtain the information to create the graph on a technical level4 . For more details on pre- and post-processing, please see Appendix A. 3.1. Preprocessing The foundation for the enrichment of our corpus is a pipeline containing a set of state-of-the- art NLP tools for the German language [6]. More precisely: Tokenization, Lemmatization, Sen- tence Splitting, Part-of-Speech Tagging, Morphological Analysis, Dependency Parsing, Named Entity Recognition, detection of direct, indirect, reported and free-indirect speech and Coref- erence Resolution. Scene segmentation is done outside of this pipeline with [19], the best contribution in the shared task ‘scene annotation’ 2021 [30]. 3.1.1. Character extraction The easiest way to determine if a character is present in a scene is to check if its name is men- tioned. But characters are o昀琀en mentioned even though they are not present. For the most common possibilities, we have created a 昀椀lter so that only mentions are considered that a) are outside of verbal actions and b) outside sentences with past perfect tense. In addition, a charac- ter must perform at least one action (be the subject of a sentence), to be considered present. For the special case of 昀椀rst-person narration, we had to use an extra routine, since the narrator’s name is mentioned only rarely. Therefore, if it is a 昀椀rst person narrative, all pronouns of the 1st person singular, which ful昀椀l the above conditions, are added to the character ”narrator”. We treat the information whether it is a 昀椀rst person narrative and the name of the narrator as given metadata. 3.1.2. Action extraction To capture actions of a character in a scene, all of its mentions are 昀椀ltered by those the depen- dency parser has labelled as subject of a sub-sentence. The dependency tree is searched for the corresponding predicate and, if available, object of the clause (see Tab. 1). The query can resolve active and passive constructions. Auxiliary verbs are skipped in the dependency tree. If a sub-sentence is followed by a sub-sentence of the same order, which does not contain a new subject, the subject of the 昀椀rst sub-sentence is retained. The result is a set of subject-verb- object triples associated with a character and a scene. Sentences in past perfect tense or direct speech are ignored. If the object of a triple is also a character mention, it is replaced in the triple by its name. 3.1.3. Valence and Arousal Valence and arousal are assessed using an a昀昀ective norms word list [17]. The values for char- acters are calculated from the average of the values of all tokens in triples in the novel with 4 Code and Data: https://github.com/LeKonArD/character_temp_graphs 324 Table 1: Example of Action Triple Extraction. Result: (rashad, heben, Bettdecke).transl: “visibly shuddering, he li昀琀ed the bed cover” -> (rashad, li昀琀, bed cover) which they are associated. The values for scenes are calculated from the values of all triples within it. 3.1.4. Interaction Types We identify three types of actions: Fighting, Erotic Actions and Talk. Combat and eroticism are determined by matching word lists on the subject-verb-object triples of a scene. How much is spoken in a scene can be directly determined by the output of speech recognition. Since scenes do not necessarily have only one interaction type, a score (e.g. relative share of words) for each type is calculated. 3.1.5. Character features How to detect character appearances and thus also who appears alongside is already discussed above under ‘Character Extraction’. This representation is complemented by the valence and arousal values at the character level (see: Valence and Arousal). To di昀昀erentiate between major and minor characters we use Temporal Closeness Centrality5 [22]. As a sanity test, we identi- 昀椀ed the protagonists and their love interests in 20 novels and checked their values. The result shows: In all cases the protagonist has highest centrality and the love interest is second. 3.1.6. Topic Model In order to add semantic information as a proxy for the motif repertoire to the predominantly structural model we resort to topic modeling [2]. Since our research corpus is not large enough to create our own topic model, we use a background corpus consisting of 10k other dime novels divided into segments of 500 tokens.6 Following the reasoning that there is a fundamental di昀昀erence between text and dialogue in scenes, we divide each scene into two documents based on this criterion. To underline this assumption we try to classify dialogue and text based on topic distribution. A logistic regression achieves a stable performance of an accuracy of .86 (std: 0.008). 5 We used the python library Teneto [28] for the representation of the temporal graph and the computation of the centrality measure; for an explanation of temporal graphs and the measure see below. 6 1.7m documents, 4000 iterations, 150 topics 325 We used a temporal graph to represent the scene and character information and computed the Temporal Closeness Centrality (details see Appendix). 3.2. Evaluation The evaluation of plot models proves to be especially challenging, because it is so time consum- ing. Ideally we would have for each text 3 or more structured summaries which cover all scenes. They would list the important characters and the important events (separately for direct speech and narrative text) for each scene, but would also indicate which scene could be le昀琀 out as not or less relevant. Usually we base our evaluation on data sets with a few hundred instances, but in this case the compilation would take - even with pulp 昀椀ction novels which are only 64 pages long - almost prohibitively long. (In this context the data set described in [27] which has event annotations for 100 novels is especially noteworthy). Therefore we think that for some time at least research on plot has to use proxies. In this paper we use three approaches. 1. Because plot schemas for very di昀昀erent genres are usually easy to distinguish, the task to distinguish genre based on a structural plot representation can be used as a proxy. Basically we measured the average distance between texts of a genre and between all texts and we expect texts which belong to a genre to show a marked lower distance. 2. Similar to [7] and [25] we construct a second set of text representations where we ran- domly change the sequence of scenes. Here the task is to distinguish real novels from the arti昀椀cial ones, in other words real novels should be more similar to each other than the arti昀椀cial counterparts. We also inform about the distances between real and arti昀椀cial texts split a昀琀er genres to capture genre speci昀椀c di昀昀erences. 3. Formulaic genres o昀琀en have recurring scenes which can be found in almost or all text instances. In romances, for example, there is always a scene in which the lovers meet for the 昀椀rst time. In pulp 昀椀ction horror, there is almost always a scene where the protagonist 昀椀ghts the evil antagonist. We take half of the romances in our corpus and identify those scenes, which describe the 昀椀rst meeting. Then we replace these scenes in 60% of the texts with another scene (B1), in 20% of the texts we don’t change anything (B2), and in 20% of the texts we move the scene to the last third of the text (B3). Then we compare our text representations with the other half of the texts, which haven’t been changed (A1). If the representation is capturing temporal information, we should see a higher similarity of B2 with A while B1 and B3 are less similar. 4. Experiments The 昀椀rst experiment uses the evaluation task for genre di昀昀erentiation. Four approaches (see Fig. 3) to plot representation are tested: tf-idf. Word frequencies over the entire novel, weighted by tf-idf are the de facto standard for representing long texts and serve as a baseline. More speci昀椀cally, we use the 5000 most frequent content words (nouns, verbs and adjectives). Similarity is calculated with euclidean distance of tf-idf vectors. 326 Global Characteristics. The second approach is based on properties of the entire novel, which are generated by queries on the temporal graph. Following properties are included: Number of characters, the average of 昀椀ght score, erotic score, share of speech, arousal, valence, character centrality and proportion of characters genders over all scenes. Simi- larity is calculated with euclidean distance of all features. Time Series. This representation models the plot of a novel as a multidimensional time series, where scenes are used as timesteps. Each timestep consists of the information on: num- ber of characters, 昀椀ght score, erotic score, share of speech, arousal, valence, character centrality and the proportion of characters genders. We measure similarity by applying multidimensional dynamic time warping with euclidean distance [26]. Temporal Graph. To measure similarity of temporal graphs directly, without condensing the available information to other formats (e.g. time-series), we make use of dynamic tem- poral graph warping (dtgw) introduced by [10]. Unfortunately, this measure does not use the node and edge weights and attributes in its calculation of similarity, only dis- tances between unweighted edges are covered. Therefore, only the information about who appears in which scene is included in this calculation. Figure 3: Overview on plot representation approaches. For details on the NLP Pipeline see Appendix A. Figure 4 shows the results of the 昀椀rst evaluation task7 . As expected, both genres are easily 7 To avoid bias due to di昀昀erent group sizes, each experiment is repeated 500 times with ten randomly drawn novels from each group. 327 Figure 4: Results of evaluation task 1. Romance: Distance between Love novels; Horror: Distance between Horror novels; Both: Distance between Horror and Love novels. To pass the test, the distance between Love and Romance should be smaller than between both. The y-axis is not labelled because only the relations of the distances are relevant for the experiment and not their absolute values. distinguished using tf-idf and Global Characteristics. The Time Series data is more blurry, but still passes the test, while the Temporal Graph representation fails. The second and third evaluation tasks involve altering the sequence of scenes. Therefore it is not reasonable to test representations lacking sequential information. This limits us to the use of Time Series and Temporal Graph representation. Since the temporal graph has already failed at the 昀椀rst task, only Time Series is tested. In addition to the variant already used in test 1, we test whether the performance can be increased by supplementing the structural information with semantic information, our proxy for the motif repertoire. For this purpose, the distribution of topics in scenes (separated into narration and speech) is reduced to 4 dimensions and used as an additional feature of the time series. We also try to reduce the number of scenes by using only the 10 scenes8 with the highest arousal value within a novel. Figure 5 shows the performance of this setup in evaluation test 2. The reduction to essential scenes is clearly a harmful preprocessing step. The enrichment with information from the topic model has only a very small in昀氀uence on the result. The same conclusions are valid for Evaluation Task 3 (see Fig. 6). 8 We also tested 5 and 20 scenes, without noticing any big di昀昀erences. 328 Figure 5: Results of evaluation task 2. Romance: Distance between Love novels; Horror: Distance be- tween Horror novels; Romance shu昀昀le: Distance between Love novels and shu昀昀led love novels; Horror shu昀昀le: Distance between Horror novels and shu昀昀led Horror novels. To pass this test, the distance of Romance shu昀昀le and Horror shu昀昀le needs to be higher than their non-shu昀昀led counterparts. (+topics: Topic Model Features included; +relevance: reduction to essential scenes ) 5. Discussion Most importantly, the result of the 昀椀rst experiment shows that the temporal structure even of the very reduced information we used to model plot is part of an overall plot shape which can be used to measure similarity of texts. The even more reduced version, in which we computed the similarity directly on the temporal graph, did not contain enough information. This vali- dates the approach to represent plot based on the temporal information of the text, but it also indicates that temporal graphs are a useful way to represent the information but at the moment are not a good way to compute the similarity between texts. Contrary to our expectations the addition of more concrete information about the motif repertoire of direct speech and narrative text in the form of topic models did not increase the similarity. It is unclear to us whether this is caused by an unsatisfying representation, in other words maybe the topic models did not capture the motif repertoire, for example because it lacks generalization. Anecdotal evidence suggests that this is the case for some motifs. In the romance novels we were able to identify a retarding plot element, namely the heroine’s doubt as to whether the beloved is seriously interested in her at all. But the reasons for these doubts and the concrete ways these doubts are articulated are very di昀昀erent and have little in common on the surface of the text. Another reason for the low performance increase could be that the integration of the information about the motif repertoire into our scene representation was 329 Figure 6: Results of evaluation task 3. A1: Distance between A1 and A1 (intra group distance). B2: Distance between A1 and B2; B1: Distance between A1 and B1; B3: Distance between A1 and B3. To pass this test B2 needs to be higher than A1 and lower than B1 and B3. subpar, for example because the valid information is drowned in the noise of all the scenes and topics which a reader would 昀椀lter out. Also our attempt to detect the relevant scenes has not worked as intended. As the concept of relevance is also part of the more general problem of detecting the main elements of the plot, this problem is probably closely related to the problem of generalizing and abstracting the event information. There is a challenging relationship between the text speci昀椀c use of the motif repertoire and the generalization necessary to allow the comparison of texts and the evaluation of similarity. The concreteness of the instantiation of the motif repertoire basically leads to an information overload. We evaluated our scene representation by using a distance metric based on a similarity mea- sure using dynamic time warping. It is unclear to us whether this measure is the best way to proceed. It looks at the whole time series allowing for di昀昀erences in the temporal extension of the patterns. But most of the information may be actually noise under the perspective of reconstructing the human perception of similarity of narratives. To proceed further in this direction the following research problems have to be solved in a more satisfying way: • What is the best graph representation to include all relevant information and derive sim- pler views for computational purposes. A temporal graph alone is unsatisfactory, be- cause then the information about the scenes has to be handled externally. So a bipartite graph may be a useful model, where one set of nodes and edges represent the temporal 330 graph as in our approach and another set of nodes represent the scenes. • Identi昀椀cation of those scenes which are crucial for the plot. A relevance score for each scene could be used to 昀椀lter the relevant ones based on the level of abstraction intended. • Abstraction and generalization of events. This is probably the hardest problem of all and can only be approached by annotating the motif repertoire for one genre more exten- sively. On this level also patterns of scene n-grams could be extracted, like ‘captured- freed’. • Abstraction and generalization of events. This is probably the hardest problem of all and can only be approached by annotating the motif repertoire for one genre more exten- sively. On this level also patterns of scene n-grams could be extracted, like ‘captured- freed’. • We need a clearer understanding of what uses the term ‘plot’ in literary studies (beyond the meta discussion in narratology) really has, for example in the construction of genres. Similarity of complex phenomena usually happens by comparing them under a speci昀椀c perspective which ignores a lot of given information. To achieve this level of abstraction and generalization we should analyze how it is done in literary studies. • In the long run, a real evaluation will have to be based on human judgment, that is annotations: Structured summaries of a genre corpus which will also create the motif repertoire for this speci昀椀c corpus. These annotations could also be the ground truth for derived text formats as we used them in this paper (we basically just assumed that they work as intended). As each genre will have to create its own motif repertoires, working with these automatically derived formats will be unavoidable and needs to be put on a solid basis. Additionally, the problems we did not touch upon in this paper have to be solved too, for example the temporal reordering of the scenes and the detection of narrative levels. As already mentioned in the introduction, the main contribution of this paper is not a solution to a problem, but a more extensive description of the aspects involved in the rather complex problem of plot. Its main purpose is to be used as the basis for the communication in CLS and to drive research in the many subproblems we outlined. References [1] D. Bamman, T. Underwood, and N. A. Smith. “A Bayesian Mixed E昀昀ects Model of Liter- ary Character”. In: Proceedings of the 52nd Annual Meeting of the Association for Compu- tational Linguistics (Volume 1: Long Papers). Baltimore, Maryland: Association for Com- putational Linguistics, 2014, pp. 370–379. doi: 10.3115/v1/P14-1035. url: http://aclweb .org/anthology/P14-1035. [2] D. Blei, A. Y. Ng, and M. I. Jordan. “Latent Dirichlet Allocation”. In: Journal of Machine Learning Research 3 (2003), pp. 993–1022. [3] R. L. Boyd, K. G. Blackburn, and J. W. Pennebaker. “The narrative arc: Revealing core narrative structures through text analysis”. In: Science Advances 6.32 (2020), eaba2196. doi: 10.1126/sciadv.aba2196. url: https://www.science.org/doi/10.1126/sciadv.aba2196. 331 [4] A. Brunner, S. Engelberg, F. Jannidis, N. D. T. Tu, and L. Weimer. “Corpus REDEWIEDER- GABE”. In: Proceedings of The 12th Language Resources and Evaluation Conference, Mar- seille. Marseille, 2020, pp. 796–805. url: http://www.lrec-conf.org/proceedings/lrec2020 /pdf/2020.lrec-1.100.pdf. [5] J. Egbert and M. Mahlberg. “Fiction – one register or two?” In: Register Studies 2.1 (2020), p. 72. url: https://www.academia.edu/42908069/Fiction%5C%5Fone%5C%5Fregister%5 C%5For%5C%5Ftwo%5C%5FSpeech%5C%5Fand%5C%5Fnarration%5C%5Fin%5C%5Fnov els. [6] A. Ehrmanntraut, L. Konle, and F. Jannidis. LLpro, A Literary Language Processing Pipeline for German Narrative Texts. 2022. url: https://github.com/aehrm/LLpro. [7] M. Elsner. “Abstract Representations of Plot Structure”. In: Linguistic Issues in Language Technology, Volume 12, 2015 - Literature Li昀琀s up Computational Linguistics. CSLI Publica- tions, 2015. url: https://www.aclweb.org/anthology/2015.lilt-12.5. [8] M. Elsner. “Character-based kernels for novelistic plot structure”. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics. Eacl ’12. Usa: Association for Computational Linguistics, 2012, pp. 634–644. [9] G. Freytag. Freytag’s Technique of the Drama: An Exposition of Dramatic Composition and Art. 4th ed. Chicago: Scott, Foresman and Co. 4th. Chicago: Scott, Foresman and Co., 1908. [10] V. Froese, B. Jain, R. Niedermeier, and M. Renken. “Comparing Temporal Graphs Using Dynamic Time Warping”. In: Complex Networks and Their Applications VIII. Ed. by H. Cheri昀椀, S. Gaito, J. F. Mendes, E. Moro, and L. M. Rocha. Studies in Computational Intel- ligence. Cham: Springer International Publishing, 2020, pp. 469–480. doi: 10.1007/978-3 -030-36683-4\_38. [11] A. J. Greimas. Structural Semantics: An Attempt at a Method. Lincoln: University of Ne- braska Press, 1983. [12] P. Hühn. Event and Eventfulness. Ed. by P. Hühn, J. Pier, W. Schmid, and J. Schönert. Hamburg, 2013. url: https://www-archiv.fdm.uni-hamburg.de/lhn/node/39.html. [13] L. Jahan, R. Mittal, and M. Finlayson. “Inducing Stereotypical Character Roles from Plot Structure”. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Lan- guage Processing. Online and Punta Cana, Dominican Republic: Association for Compu- tational Linguistics, 2021, pp. 492–497. url: https://aclanthology.org/2021.emnlp-main .39. [14] M. Jockers. » A Novel Method for Detecting Plot Matthew L. Jockers. 2014. url: https://w ww.matthewjockers.net/2014/06/05/a-novel-method-for-detecting-plot/. [15] A. Kazantseva and S. Szpakowicz. “Summarizing Short Stories”. In: Computational Lin- guistics 36.1 (2010), pp. 71–109. doi: 10.1162/coli.2010.36.1.36102. url: https://aclanthol ogy.org/J10-1003. [16] C. Koolen. Reading beyond the female : The relationship between perception of author gen- der and literary quality. Amsterdam: Institute for Logic, Language and Computation, 2018. 332 [17] M. Köper and S. Schulte im Walde. “Automatically Generated A昀昀ective Norms of Ab- stractness, Arousal, Imageability and Valence for 350 000 {G}erman Lemmas”. In: Por- torož, Slovenia: Erla, 2016, pp. 2595–2598. [18] K. Kukkonen. Plot. Ed. by P. Hühn, J. Pier, W. Schmid, and J. Schönert. Hamburg, 2014. url: https://www-archiv.fdm.uni-hamburg.de/lhn/node/115.html. [19] M. Kurfalı and M. Wiren. “Breaking the Narrative: Scene Segmentation through Sequen- tial Sentence Classi昀椀cation”. In: Proceedings of the Shared Task on Scene Segmentation. Düsseldorf, 2021. url: http://ceur-ws.org/Vol-3001/paper6.pdf. [20] J. C. Meister. Computing Action: A Narratological Approach. 1 edition. Berlin ; New York: De Gruyter, 2003. [21] O. Michail. An Introduction to Temporal Graphs: An Algorithmic Perspective. 2015. doi: 10.48550/arXiv.1503.00278. url: http://arxiv.org/abs/1503.00278. [22] R. K. Pan and J. Saramäki. “Path lengths, correlations, and centrality in temporal net- works”. In: Physical Review E 84.1 (2011), p. 016105. doi: 10.1103/PhysRevE.84.016105. url: https://link.aps.org/doi/10.1103/PhysRevE.84.016105. [23] A. Piper, R. J. So, and D. Bamman. “Narrative Theory for Computational Narrative Un- derstanding”. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Lan- guage Processing. Online and Punta Cana, Dominican Republic: Association for Compu- tational Linguistics, 2021, pp. 298–311. url: https://aclanthology.org/2021.emnlp-main .26. [24] V. Propp. Morphology of the Folktale. Austin: University of Texas P. Austin: University of Texas Press, 1968. [25] N. Reiter, J. Sieker, S. Guhr, E. Gius, and S. Zarrieß. “Exploring Text Recombination for Automatic Narrative Level Detection”. In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022). 2022, pp. 3346–3353. [26] M. Shokoohi-Yekta, B. Hu, H. Jin, J. Wang, and E. Keogh. “Generalizing DTW to the multi-dimensional case requires an adaptive approach”. In: Data Mining and Knowledge Discovery 31.1 (2017), pp. 1–31. doi: 10.1007/s10618-016-0455-0. url: https://doi.org/10 .1007/s10618-016-0455-0. [27] M. Sims, J. H. Park, and D. Bamman. “Literary Event Detection”. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: As- sociation for Computational Linguistics, 2019, pp. 3623–3634. doi: 10.18653/v1/P19-1353. url: https://aclanthology.org/P19-1353. [28] W. H. Thompson, P. Brantefors, and P. Fransson. “From static to temporal network the- ory: Applications to functional brain connectivity”. In: Network Neuroscience 1.2 (2017), pp. 69–99. doi: 10.1162/NETN\_a\_00011. url: https://doi.org/10.1162/NETN%5C%5Fa %5C%5F00011. 333 [29] M. Vauth, H. O. Hatzel, E. Gius, and C. Biemann. “Automated Event Annotation in Lit- erary Texts”. In: CHR 2021: Computational Humanities Research Conference, November 17–19, 2021, Amsterdam, The Netherlands. Amsterdam, 2021, p. 13. url: http://ceur-ws.o rg/Vol-2989/short%5C%5Fpaper18.pdf. [30] A. Zehe, L. Konle, S. Guhr, A. Hotho, F. Jannidis, L. Kaufmann, M. Krug, F. Puppe, N. Reiter, and A. Schreiber. “Shared Task on Scene Segmentation”. In: Stss Konvens. 2021, p. 21. A. Pre and Postprocessing Preprocessing. The output of the di昀昀erent preprocessing tools (Tokenization, Lemmatiza- tion, Sentence Splitting, Part-of-Speech Tagging, Morphological Analysis, Dependency Pars- ing, Named Entity Recognition, detection of direct, indirect, reported and free-indirect speech and Coreference Resolution) is carefully aligned and saved in conll-format. Scene segmentation is not (yet) part of this pipeline, therefore we tested both passing novels through the pipeline and segment a昀琀erwards or segment 昀椀rst and processing the segments individually. A昀琀er re- viewing the results, we conclude that a priori segmentation is preferable. From a theoretical perspective, the segmentation can only a昀昀ect the pipeline steps NER, Speech detection and Coreference Resolution, since the other tools work on sentence and word level. The impact on NER and Speech Detection is negligible, considering the size of the context windows these tools use, since scenes are much longer. Coreference resolution, on the other hand, operates on the entire text. The idea that more text and thus more information about characters (alter- native names, appellatives, gender) increases performance is obvious. However, according to our 昀椀ndings, it is beyond the corefenece model’s capabilities to exploit this information over a long text. This agrees with the original authors’ assessment that the memory capacity of the model is not su昀케cient for long texts. For example, we see that despite matching names, new corefence clusters are created or even worse all mentions of a paragraph are assigned to one cluster regardless of di昀昀ering gender and names. This behavior is suppressed by a-priori seg- mentation. This is not surprising, considering that the de昀椀nition of scenes in the dataset which was used to train the segmentation tool is strongly tied to stable character constellations. Postprocessing. Both tools for scene segmentation (y: 0.17) and coreference resolution (F1: 64.72) are far from perfect. Nevertheless, we think they are good enough to work with. To improve the results a bit more we apply a number of post-processing steps. The biggest source of error in scene segmentation is over-segmentation, which leads to arbitrarily short scenes. To mitigate this, we set a lower limit for scene length of 200 words. If this is underrun, we merge a scene with its following one. Coreference postprocessing is a bit more complex. First, the most frequent proper name of a cluster is set as its identity. Then all other proper names in this cluster are checked, if they have already been present in a previous scene, the mentions are assigned to this cluster. Then the grammatical gender is used. For example, if there are male mentions (pronouns) in a female cluster, they are assigned to the nearest cluster in the text with the appropriate gender. In this way, coreference resolution bene昀椀ts from both: Information from preceding text and 334 meaningful segmentation. In the case of 昀椀rst-person narratives, all 昀椀rst person pronouns (ich, mein, meiner, meine, etc.) are assigned to the prede昀椀ned entity of the narrator. This is required since the model is not trained for this type of text and the narrator’s name is rarely mentioned and if mostly inside of direct speech. Mentions of groups and clusters without proper names are ignored completely. B. Modeling Temporality with Temporal Graphs Temporal Graphs are an interesting extension to graph theory which has developed methods to represent and analyze static graphs - and in recent years an increasing amount of research is looking into the much more complicated situations of graphs which develop over time[21]. Temporal graphs add the dimension of time. Figure 7 shows a temporal graph as a sequence of static graphs. Each time step represents nodes and their links, in our use case the character constellation in one scene. Figure 7: A simple temporal graph as a sequence of static graphs. Figure 8 shows a variant of this visualization, where we substituted the explicit depiction of the edges with an implicit representation: Character nodes are only shown for those scenes, in which they are present in a scene and the interaction of the co-present characters in a scene is implied. Bill, Sheila, Suko and Jane are friends of the protagonist John. Harris is the antagonist, Clou, his helper, and Martha, Peter and Wayne (victims). The story is told mainly from the perspectives of Sheila and John. Based on this representation as a temporal graph, we calculated the temporal closeness cen- trality for each character. Temporal closeness centrality [22] is a generalization of static close- ness centrality. A high value of Ct indicates that other nodes can be easily reached from the node i. Obviously it would be the best representation to add this centrality information to each character node. But similarity measures for temporal graphs are not able yet to handle node at- tributes but only work on the basic network structure. So we averaged the centrality measures for all characters and used it as a scene attribute. Temporal graphs, which have been intensely researched in recent years, provide a rich medium to model all aspects of plot we are interested in. In our case the information described above can be transformed into a complex temporal graph. In order to realize its full potential, the graph needs several types of nodes (character types, scenes) and edges (interaction types), 335 Figure 8: Temporal Graph of the horror novel “John Sinclair Nr.6: Anruf aus dem Jenseits” (Call from the beyond.) as well as weighting of these edges. Unfortunately, the goal of representing the entire complex- ity leads to a model to which no methods are applicable. Therefore computation of measures will then be done on simpli昀椀ed views of these integral graphs. 336