=Paper=
{{Paper
|id=Vol-2989/long_paper49
|storemode=property
|title=Detecting Narrativity Across Long Time Scales
|pdfUrl=https://ceur-ws.org/Vol-2989/long_paper49.pdf
|volume=Vol-2989
|authors=Andrew Piper,Sunyam Bagga,Laura Monteiro,Andrew Yang,Marie Labrosse,Yu Lu Liu
|dblpUrl=https://dblp.org/rec/conf/chr/PiperBMYLL21
}}
==Detecting Narrativity Across Long Time Scales==
Andrew Piper, Sunyam Bagga, Laura Monteiro, Andrew Yang, Marie Labrosse and Yu Lu Liu

McGill University, 688 Sherbrooke St, H2J3B2 Montreal, Canada

Abstract

Storytelling is a universal human practice that serves as a key site of education, collective memory, fostering social belief systems, and furthering human creativity. It can occur in different discursive domains for different social purposes with differing degrees of intensity. In this project, we develop computational methods for measuring the degree of narrativity in over 335,000 text passages distributed across two to three hundred years of history and four separate discursive domains (fiction, non-fiction, science, and poetry). We show how these domains are strongly differentiated according to their degree of narrative communication and, second, how truth-based discourse has declined considerably in its utilization of narrative communication. These findings suggest that there has been a long-term historical differentiation between the practices of knowing and telling, which raises important questions with respect to the social acceptance of both science and the arts.

Keywords: narratology, history, systems theory, discourse analysis, computational narrative studies, digital humanities, natural language processing

1. Introduction

In his 1976 essay “Boundaries of Narrative,” Gérard Genette invited readers to “consider the principal plays of oppositions through which narrative defines and constitutes itself in the face of various nonnarrative forms” (p. 1) [8]. Over the ensuing decades, researchers have elaborated a variety of schemas to characterize narrative communication, creating a well-established corpus of theoretical work [9, 1, 7, 11]. Underpinning much of this work is the belief that there are intrinsic linguistic properties that predictably, if not universally, adhere within narrative forms of communication [30].
Narrative, according to these theoretical frameworks, is a detectable linguistic phenomenon. One of the principal shifts to occur in the field of narratology over the past several decades has been an emerging understanding of narrative as a matter of degree rather than of kind [11, 10, 22]. “Narrativity” according to these theories is a quality that can best be understood not as a global binary class (a document either is or is not narrative), but as a local, multi-dimensional scalar property. As Ochs and Capps [19] write, “We believe that narrative as genre and activity can be fruitfully examined in terms of a set of dimensions that a narrative displays to different degrees and in different ways” (p. 19). In this sense, a narrative document, such as a novel, may exhibit greater or lesser degrees of narrativity at different moments in the text, just as ostensibly non-narrative documents, such as scientific reports, may also exhibit degrees of narrativity. Herman [11] has taken this understanding one step further to suggest that narrativity is not simply a matter of the local interplay of formal and textual features, but emerges through the interaction between readers and texts.

CHR 2021: Computational Humanities Research Conference, November 17–19, 2021, Amsterdam, The Netherlands
Contact: andrew.piper@mcgill.ca (A. Piper); sunyam.bagga@mail.mcgill.ca (S. Bagga); laura.monteiro@mail.mcgill.ca (L. Monteiro); andrew.yang3@mail.mcgill.ca (A. Yang); marie.labrosse@mail.mcgill.ca (M. Labrosse); yu.l.liu@mail.mcgill.ca (Y.L. Liu)
ORCID: 0000-0001-9663-5999 (A. Piper)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)
Narrativity can thus be understood as a potentially rising or falling quality within documents (or other forms of communication) that depends on the interaction of different linguistic or semiotic features combined with readers’ responses. While a wealth of recent work in the field of natural language processing has engaged with the detection of different dimensions of narrativity (such as causal and temporal relations [18], turning points [21, 2], reportable events [20], frames [6], etc.), no work to our knowledge has undertaken the more elementary task of narrativity detection itself. Can we reliably predict whether a span of text is engaging in narrative communication and if so, to what degree of intensity? Such work has the potential not only to contribute to our theoretical understanding of narrativity as a form of communication. It can also provide empirical insights into the distribution of narrativity across different discursive domains and time periods, giving us insights into the social functions of narrative communication over time. The latter will be our concern here.

In this paper, we develop computational models to detect “narrativity” as a local, multi-dimensional textual quality across four different discursive domains over a two- to three-hundred year time period. Our aim in doing so is to test the relationship between narrative communication and the process of functional differentiation among social systems as theorized by the sociologist Niklas Luhmann [16]. According to Luhmann, social systems are governed by communicative practices (“codes”) that maintain a system’s internal coherence in distinction from other systems. As societies modernize, differentiation strengthens over time as each system evolves to maintain its internal coherence in distinction from its environment (i.e. other systems).
The question we wish to test here is whether the practice of narrative participates in this process of functional differentiation between the social systems Luhmann labels “art” and “science.” According to Luhmann, art’s function lies in its ability to communicate the sensory process of observation, to “allow a world to appear within the world” (p. 241), whereas the function of science is to “structure the field of possible statements [about the world] with the help of the code true/untrue” (p. 227) [15]. As a form of communication strongly associated with the idea of “world-building” [11], we would thus expect narration to be highly associated with artistic forms of expression but not necessarily negatively associated with scientific discourse. There is nothing intrinsic to narrative communication that makes it an inappropriate vehicle for fact-based discourse. After all, one can tell true and untrue stories.

And yet according to the data and models used here, we can observe a very clear historical trajectory of the de-narrativization of truth-based discourse. Our findings bring to light long-standing and growing tensions between what Hayden White first introduced as the relationship between knowing and telling [29]. For White, the function of narration should be understood as “a solution to a problem of general human concern, namely, the problem of how to translate knowing into telling, the problem of fashioning human experience into a form assimilable to structures of meaning that are generally human rather than culture-specific” (p. 5) [29]. Narration is a key communicative mode for White that makes knowledge “assimilable” to individual human beings and collective societies. As a growing body of research has indicated, narrative is indeed an effective means of addressing the problem of knowledge sharing across a variety of social domains, from economics [25] to climate change [5] to political polarization [13].
Our findings, preliminary as they are, suggest the need for further research into this dissociation of science and narration and its potential social effects. Is the growing public distrust in science related to the denarrativization of scientific communication? Do efforts of “public science” or science journalism have a positive effect on reversing public distrust, and are such effects related to their degree of narrativity? If narration is increasingly seen as belonging to the domain of art, has this differentiation from science unintentionally contributed to the devaluation of the arts (or their study)? How might the arts instead participate more explicitly in the process of knowledge transfer, i.e. help “recouple” in Luhmann’s terms the practices of knowing and telling?

In order to detect narrativity in our historical collections, we undertake the following steps, which we describe in greater detail in the following sections. First, we construct a data set of 335,245 documents to represent our two primary social systems of art and science, which we subdivide into the four domains of fiction, poetry, science, and non-fiction. We then develop a working theory of “narrativity” drawn from existing theoretical literature that informs our manual annotation of the data. Building a team of three trained student annotators, we hand-annotate 401 passages according to a scalar understanding of narrativity derived through numerous meetings and discussion. This data is then used to train and test our machine learning models, which we describe in Section 3. We present our results in Section 4 and include a discussion of their potential implications as well as limitations (Section 5). Finally, we conclude with a brief discussion of where future work in computational narrative studies might lead.

2. Data

2.1. Annotated Data

In order to annotate training data for the presence of narrativity, we rely on the following theoretical schema developed by Herman [11].
According to this schema, narrative communication consists of the following four elements:

1. Situatedness: narrativity depends on the social context in which it occurs
2. Event sequencing: narrativity depends on temporally ordered events
3. World making: narrativity depends on the fact of disequilibrium such that we can observe a change in the world
4. Feltness: narrativity captures the experience of events, i.e. “what it is like”

Herman’s categories can be seen as syntheses of previous narratological frameworks, capturing a good degree of consensus in the field. The emphasis on feltness, for example, is strongly indebted to the argument by Fludernik [7] that “Experientiality reflects a cognitive schema of embodiedness that relates to human existence and human concerns” (p. 9). Similarly, event sequencing is strongly indebted to the work of theorists like Genette [9], Sternberg [26, 27], and Ricoeur [24] and their emphasis on temporality as a central component of narrative communication, while world making derives from the work of Labov and Waletzky [14] and Bruner [3].

In general then, Herman’s model is guided by the notion that, “Narrative roots itself in the lived, felt experience of human or human-like agents interacting in an ongoing way with their cohorts and surrounding environment” (our emphasis). Thus for Herman what matters most about narrativity is: a) the centralization of one or more agents; b) the sequencing of events and thus time; and finally, c) the idea of “lived experience in an environment”, i.e. a sense of world building.

Based on this theoretical framework, we hand-annotate 401 passages drawn from the experimental data using the following steps: First, we assembled a team of three annotators who all have majors in the humanities. These are readers who have high levels of education and exposure to training in textual analysis.
Second, over the course of several weeks we engaged in discussions and experiments regarding the concept of “narrativity” with respect to the theoretical framework discussed above as well as different kinds of text passages. These discussions culminated in a codebook, which is included in the supplementary material.1 Annotators were then asked to code a given passage across three dimensions of narrativity, which were defined for the annotators as “agency,” “event sequencing,” and “world making.” Note how we translated Herman’s “feltness” into “agency” to better account for the idea of experientiality at the heart of most major narrative theories. For each passage, readers were asked to respond to the following statements using a five-point Likert scale (1. Strongly disagree; 2. Somewhat disagree; 3. Unsure; 4. Somewhat agree; 5. Strongly agree):

• “This passage foregrounds the lived experience of particular agents.” (Agency)
• “This passage is organized around sequences of events that occur over time.” (Event sequences)
• “This passage creates a world that I can see and feel.” (World making)

Notice how we do not expressly ask if readers felt that the passage was “narrative” or not. Rather, we ask them to consider their feelings with respect to these three primary narrative dimensions, which we then average into a single “narrativity” score. We found that this increased reader agreement and allowed for more nuanced understandings of narrative communication. For example, it was not uncommon for some types of discourse to emphasize sequential events but lack an emphasis on agency or building a world.

We provide a few sample passages that received low and high average narrativity scores by readers. Note that passages have been truncated from their actual length.

Non-Fiction - Average Reader Score 1.2

The employment of the uninterpretable symbol in the intermediate processes of trigonometry furnishes an illustration of what has been said.
I apprehend that there is no mode of explaining that application which does not covertly assume the very principle in question. But that principle, though not, as I conceive, warranted by formal reasoning based upon other grounds, seems to deserve a place among those axiomatic truths which constitute in some sense the foundation of general knowledge, and which may properly be regarded as expressions of the mind’s own laws and constitution.

1 Note that we provide the reader-annotated data, annotator’s codebook, metadata, code for all models and concrete implementation details of custom features from Table 1 in the Supplementary Material. It is available at https://doi.org/10.7910/DVN/DAWVME

Fiction - Average Reader Score 1.44

It is too weak for a shield, too transparent for a screen, too thin for a shelter, too light for gravity, and too threadbare for a jest. The wearer would be naught indeed who should misbeseem such a wedding garment. But wherefore does the sheep wear wool? That he in season sheared may be, And the shepherd be warm though his flock be cool.

Science - Average Reader Score 4.55

I assisted at the opening of her Body, and having found in the matrix a little round mass of the bigness of a great black Cherry, I took the husband aside, and asked him, Num a tempore fluxus menstruorum uxorem cognevisset? And having received for answer, that he had, I prayed him to let me carry home with me this little ball, which I had found in her womb. I was no sooner come home but I opened it, and found, that nature had wrought with so much activity in so small a time...

Fiction - Average Reader Score 4.55

Whereupon a sudden outcry arose within the house, and a head popped angrily out of the aperture so suddenly created. But as instantly it returned within.
For Jorian tossed the lattice to the ground by the door and thrust his spear-head into the cravat of red which the man had about his throat, shouting to him all the while in the name of the Prince, of the Duke, of the Emperor, of the Archbishop, of all potentates, lay and secular, to come down and open the gates.

Because our annotations use a multi-point scale, we assess inter-rater reliability (IRR) using the average deviation index as discussed in Burke, Finkelstein, and Dusig [4]. We report an average deviation of 0.48 (± 0.27). This indicates that on average our annotators’ judgments per passage fall within just under 0.5 points of each other on our 5-point Likert scale, suggesting reasonable levels of agreement. A one-way ANOVA was conducted to compare the effect of genre on average deviation among annotators, with a significant effect observed [F(3, 397) = 7.56, p = 6.26e−05]: poetry generated significantly more deviation among annotators than the other genres (mean AD of 0.58). We also note that, as seen in Figure 1, annotator scores were not normally distributed around 3.0, but rather exhibit a skewed central tendency between 2.0 and 2.5. 65% of the annotations were below 3, suggesting there was only a minority of passages where our annotators were confident of the passage’s narrativity.

2.2. Experimental Data

Our experimental data consists of five separate collections that are designed to capture the two social systems of “art” and “science,” which we represent as the four discursive domains of fiction, non-fiction, poetry, and science. Doing so allows us to see aggregate behavior across the two systems as well as potential internal differences based on discourse type. Our data consists of:

• Fiction & Non-Fiction. This data is derived from the Hathi Trust Digital Library and is drawn from Piper and Bagga [23]. It encompasses 85,130 passages of fiction and 99,968 passages of non-fiction spanning the years 1800-1999 written in English.
The labels are generated using modified predictive models based on prior work [28]. The distribution of the number of passages per year is indicated in Figure 2. Years represent year of publication, not year of composition or first printing. Our data reflects reading material available in a given year as archived by academic libraries.

• Poetry. This data is drawn from the Literature Online Poetry database. It consists of 73,077 poems by 857 authors who wrote in English and who were alive during the nineteenth and twentieth centuries. To estimate year of publication, we use the author’s birth-date plus 35 years to capture an estimated career midpoint. Because of the relatively small number of poets in our dataset, we are not able to capture a consistent number of poems per year.

• Science. To represent the domain of scientific writing, we use two different data sets. The first is drawn from the Royal Society Corpus (RSC 4.0) based on the first two centuries of the Philosophical Transactions of the Royal Society of London from its beginning in 1665 to 1869 [12]. Due to copyright restrictions, no data is publicly available after 1869. This dataset consists of 31,698 documents. To augment this data, we use a collection of 45,439 randomly selected articles drawn from the top 100 most common articles in the JSTOR Data for Research platform organized under the heading “physical sciences” published between the years 1900 and 2015. The distribution of articles over time is captured in Figure 2.

Figure 1: Distribution of the averaged annotator scores for “narrativity.”

Figure 2: Distribution of the number of passages per year for each of the four domains of (a) Fiction, (b) Non-Fiction, (c) Poetry, and (d) Science.

Because our interest is in local narrativity, i.e. the extent to which a span of tokens expresses narrative communication, we represent our documents as randomly selected sequences of 5 sentences in length.
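The passage-construction step just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tooling: the regex-based sentence splitter and the function name `sample_passage` are stand-ins for whatever segmentation pipeline was actually used.

```python
# Sketch: represent a document by one randomly chosen run of 5 consecutive
# sentences, as described in the text. The naive sentence splitter here is
# an assumption; any proper sentence segmenter could replace it.
import random
import re

def sample_passage(text, span_len=5, seed=None):
    """Return one randomly selected sequence of `span_len` consecutive sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) <= span_len:
        return " ".join(sentences)  # short documents are kept whole
    rng = random.Random(seed)
    start = rng.randrange(len(sentences) - span_len + 1)
    return " ".join(sentences[start:start + span_len])

doc = "One. Two! Three? Four. Five. Six. Seven."
passage = sample_passage(doc, span_len=5, seed=0)  # a 5-sentence span
```

Sampling spans rather than whole documents is what makes the measure local: each prediction reflects the narrativity of a short stretch of text, not a global document label.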
This number has been indicated in prior work as a reasonable frame in which completed narratives can transpire [17]. We can thus assume that “narrativity” can be present in spans of this length. Future work will want to explore this parameter further.

3. Models

For the purposes of our project, we use the predicted probability of a passage’s “narrativity” as an indicator of the degree of narrative communication present in that passage. In order to build a model to predict a passage’s narrativity, we train and validate our models using our reader-annotated data. We experiment with three widely-used algorithms (Logistic Regression, Support Vector Machines, and Random Forests) and multiple combinations of different features to identify the best-performing model. We present our feature components in Table 1 and present the performance of each model using different feature combinations in Figure 3. As can be seen in Figure 3, Random Forest performs the best out of the three learning algorithms. Table 2 presents a brief overview of the top-5 performing models using Random Forest.

We assess model performance according to Pearson’s correlation coefficient rather than the more traditional F1 score (although we also report traditional classification metrics in Table 2). Because our metric of narrativity is predicted probability and not a binary classification, the question we want to address is how well our models correlate with the scalar nature of reader judgments.

To construct our experimental feature spaces, we aggregate our features into three general categories: lexical features (ngrams), syntactical features (part-of-speech and dependency relationships), and higher-level custom features designed to capture specific narratological theories, including time, concreteness, animate entities and perceptuality. For a full discussion of the custom features, see the supplementary material.
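The evaluation metric described above, Pearson's r between predicted probabilities and averaged reader scores, can be computed as follows. A minimal sketch: the two score vectors here are hypothetical, whereas in the paper they come from 5-fold cross-validation of the models over the 401 annotated passages.

```python
# Sketch of the paper's evaluation metric: Pearson's correlation between
# scalar reader judgments and the model's predicted probability of narrativity.
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

reader = [1.2, 1.4, 2.3, 3.1, 4.5, 4.6]        # averaged Likert annotations (hypothetical)
pred   = [0.10, 0.22, 0.35, 0.55, 0.88, 0.80]  # predicted P(narrative) (hypothetical)
r = pearson_r(reader, pred)  # the paper's best model reaches r = 0.742
```

Using a correlation rather than F1 is the design choice that lets a classifier trained on a thresholded label still be scored against the full scalar range of reader judgments.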
As we can see in Figure 3, all three classifiers behave similarly and achieve their maximum performance on a variety of feature combinations. Interestingly, unigrams tend to perform better than the limited sets of bi- or trigrams for lexemes, pos, and dependency tags.

2 Note that the results shown in Figure 3 correspond to a maximum of 100 features per category. This is why # Features for word-bigrams, for example, is 100 although the complete feature space involved 25,434 word-bigrams. Experiments with other values of max-features yielded similar results.

Table 1: Description of the individual feature categories used by our learning models.

Feature-Category | Description                                                | # Features
pos1             | part-of-speech unigrams                                    | 38
pos2             | part-of-speech bigrams                                     | 100
pos3             | part-of-speech trigrams                                    | 100
pos23            | part-of-speech bigrams & trigrams                          | 100
dep1             | dependency tag unigrams                                    | 45
dep2             | dependency tag bigrams                                     | 100
dep3             | dependency tag trigrams                                    | 100
dep23            | dependency tag bigrams & trigrams                          | 100
word1            | word unigrams                                              | 100
word2            | word bigrams                                               | 100
word3            | word trigrams                                              | 100
word23           | word bigrams & trigrams                                    | 100
Tense            | tokens annotated with time-related tags by NER             | 1
Mood             | measures of setting, concreteness, eventfulness and saying | 4
Voice            | animate entities and perceptual vocabulary                 | 4
% Quoted         | ratio of words in dialogue                                 | 1

Table 2: Random Forest’s performance with different feature-combinations on the reader-annotated passages using 5-fold cross validation. The top-5 performing Random Forest models are shown here (see Figure 3 for all feature spaces). Note that TMV is used as shorthand for Tense + Mood + Voice features.

Feature-Set             | Pearson-r | F1-Score | Precision | Recall
pos1 + TMV + Pct-Quoted | 0.742     | 0.787    | 0.796     | 0.780
pos1 + TMV              | 0.740     | 0.788    | 0.801     | 0.780
All Categories          | 0.735     | 0.790    | 0.798     | 0.785
pos1 + Mood             | 0.732     | 0.783    | 0.803     | 0.770
pos1 + dep1 + TMV       | 0.717     | 0.782    | 0.799     | 0.770
The best performing model consists of part-of-speech unigrams, % dialog and custom-built features that aim to capture “event sequences”, “world building”, and “agency”, for which we use the categories tense, mood, and voice. There appears to be a strong grammatical signature to narrativity that marginally grows in strength when we add in features that capture the notion of “environment” emphasized in Herman’s theory above [11]. We leave a deeper exploration of these issues to future work.

As we can see in Figure 4, the correlation between reader judgments and predicted probability is approximately linear and indicates a reasonable level of agreement (r = 0.742). We observe higher levels of variability in the middle range of annotations between 2.0 and 3.0, which is to be expected. As readers’ judgments become less certain, so too do we observe more variability in our models’ predictions. Future work will want to explore the extent to which more annotations lead to higher levels of correlation or whether we achieve some kind of maximum level of correlation between computational models and human judgments in this area.

Figure 3: A comparison of different learning algorithms using different combinations of features. Pearson-r is computed on 5-fold cross-validated predictions.

Figure 4: Visualization of the correlation between the averaged reader-annotations and our best model’s predicted probability of a passage’s “narrativity” (r = 0.742).

Figure 5: Five-year rolling averaged yearly probability of the text being narrative across all four domains by the best-performing model: pos1 + TMV + Pct-Quoted.

4. Results

Applying our best-performing model (Random Forest with POS-unigrams + Pct-Quoted + tense, mood and voice features) to the experimental data described above, we generate the average yearly predicted probability of narrativity across all four domains, as shown in Figure 5.
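The smoothing behind Figure 5 can be sketched as a centered five-year rolling mean over the yearly average predictions. This is a plausible reconstruction, not the authors' code, and the yearly values below are hypothetical.

```python
# Sketch: smooth yearly mean narrativity predictions with a centered
# five-year rolling average, as used for the trends in Figure 5.
# (Assumed implementation; the yearly values are invented for illustration.)

def rolling_mean(values, window=5):
    """Centered rolling mean; windows are shortened at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

yearly = [0.49, 0.47, 0.44, 0.45, 0.40, 0.38, 0.36]  # mean P(narrative) per year
smoothed = rolling_mean(yearly, window=5)            # values plotted per domain
```

The rolling window damps single-year sampling noise (yearly passage counts vary widely across the corpora) so that only multi-year movements remain visible.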
According to our models, the four domains behave in distinctive fashion with respect to narrativity, providing support for the idea that narrativity may be another facet underlying Luhmann’s thesis about functional differentiation [16].

Second, with respect to the fact-based domains of non-fiction and science, we also observe meaningful declines in the estimated intensity of narrativity over our time period. For science, we see a drop from an average five-year high of 0.495 estimated narrativity in 1707 to a five-year low of 0.236 in 1994, while for non-fiction we see a less dramatic decline from 0.428 (in 1844) to 0.338 (in 1996). Because our non-fiction class can potentially contain scientific reports archived in Hathi, we cannot definitively tell if this decline of narrativity in non-fiction is due to the growth of science writing in Hathi or the decline of narrativity in non-scientific forms. While we provide some validation of this in the discussion, future work in this direction will depend on more fine-grained genre classification with respect to non-fictionality.

In terms of our two “literary” domains, we see little change over time, suggesting relative stability of these domains’ relationship to narrativity. While this does not run counter to expectations with respect to fiction, theorists of poetry might be surprised to see such continuity given the popularity of long narrative poems in the nineteenth century (for example in the work of Walter Scott or Longfellow, to name two prominent examples). Future work will want to explore in greater depth whether there is a meaningful break with respect to poetic narrativity for authors born after the late nineteenth century that then potentially reverses course for younger poets born closer to the end of the twentieth century, as indicated in Figure 5.
More domain-specific training data would be needed, along with more careful sampling techniques, to gain confidence about any such shifts given how slight they are with respect to our models.

Taken altogether, our models suggest that narrativity is strongly socially differentiated across different discursive domains and that, at least with respect to fact-based discourses, this differentiation is increasing strongly over time, as both non-fiction writing and specifically scientific writing exhibit declines in their reliance on narrative communication. We take up the implications of these findings in our discussion.

5. Discussion

Our study raises a number of questions for future research. Representing the social systems of “art” and “science” is a challenging task. In our work, we have tried to capture at least two larger subdomains of writing within these systems to better understand the kind of internal differentiation that may be at work. While future work will want to experiment with different samples dependent on different archival resources, we do observe interesting differences with respect to the narrative behavior of our subdomains. For example, we see that the narrativity of poetry is not only considerably lower than that of prose fiction; it also consistently hovers around the fifty-percent mark, suggesting that one of the potential social functions of poetry as a genre could be its ability to communicate narrative ambiguity. Such ambiguity is corroborated by the higher average deviation among our annotators with respect to the poetry training data. This suggests an interesting potential theory one could pursue for the future study of poetry in a larger social context, along with the potential increase in narrativity that we observe for poets born after the 1960s.

Because our models have been trained to understand cross-domain behavior of narrativity, our work cannot, however, speak to within-domain distinctions with respect to narrativity.
For example, an interesting question to be pursued in the future is the intensity and extent of narrativity at the document level within our different discourses. When do we see novels, for instance, engage in more explicitly “narrative” communication? Are there reliable patterns in the rise and fall of narrativity? And what kinds of novels (genres) indicate greater degrees of narrativity overall? Similarly for science documents: while we observe an overall process of denarrativization, are there still portions of articles that engage in more narrative-like behavior, or portions of the field (i.e. disciplines) that engage in more narrative communication than others? These questions would help provide insights into the relationship between categories like genre, discipline and narration.

Further reflection could also be given to our framework of “truth-based discourse,” which is not exactly synonymous with “science”; this is one of the reasons we also model “non-fictional” writing. Scientific writing is one kind of communication that makes truth claims, but there are numerous others that belong to different institutional frameworks. We note that in a random sample of two-hundred passages drawn from our non-fiction experimental data, the number of “scientific” texts moves from 6 in the nineteenth century to 10 in the twentieth. While this represents a large increase, it is still a very small fraction of all writing in our non-fiction sample, suggesting that the decline in narrativity in non-fiction is not strongly related to a rise of scientific writing in Hathi Trust. In other words, writing classified as non-fiction is exhibiting similar trends to science but is being produced in different institutional contexts. Future work could explore more deeply why we see this denarrativization of non-fiction along with science writing.
At the level of data annotation, while we demonstrate solid agreement between readers and reasonable model correlation with reader judgments, we were not able to annotate large amounts of data to better calibrate our models. Hand-annotation is a slow and expensive process, and future work will want to explore mechanisms that allow for scaling annotation while maintaining quality. We assume model accuracy will increase with increased amounts of annotated data, which may or may not have a bearing on the historical trends we observe here. The observed declines in our science and non-fiction corpora are so steep and consistent that we would be surprised if future work indicated significant changes in this regard.

In terms of our theoretical framework, it is important to underscore that our approximation of narrativity is just that. While we do not observe significant shifts in the distribution of narrativity according to feature-space selection, our models are still guided by a particular theoretical framework with respect to narrativity. Future work will want to explore alternative theories and feature representations of narrativity to see if the historical trends we are observing continue to emerge.

Finally, future work will want to explore the extent to which our findings are or are not culturally specific, i.e. the extent to which they hold in other language communities and the extent to which those correlations are driven by social factors such as national wealth or education levels. Just how universal is this process of functional differentiation and denarrativization with respect to truth-based discourses that we have observed here? Is this indeed a marker of “modernization”?

6. Conclusion

Our work has demonstrated that narrative communication is a detectable linguistic quality of texts from the perspective of human readers and machine learning.
Even with a small set of training data, we can achieve reasonable levels of predictive accuracy and correlation with trained readers’ judgments across very different kinds of texts over relatively long historical time spans. We also show that, with sufficient training, readers can agree quite well regarding the intensity of a passage’s narrativity. Being able to identify the degree of narrativity in large-scale historical document collections allows us to gain a better understanding of the distribution of narrative communication across documents that serve different social functions.

Modeling narrativity at the computational level suggests that narrative is a form of communication that participates in Luhmann’s theory of functional differentiation, at least with respect to the social systems of art and science. While narration is culturally and historically universal, present in all linguistic communities and recorded time periods, it is far from being socially universal. Indeed, it appears that in modern, highly differentiated societies narration is increasingly aligned with the particular social system of the arts, as truth-based discourse becomes less and less narrativized over time. How this may impact urgent large-scale questions, such as trust in science or collective responses to social problems like climate change and health pandemics, remains an open and important question for future research.

References

[1] M. Bal. Narratology: Introduction to the Theory of Narrative. University of Toronto Press, 2009.

[2] R. L. Boyd, K. G. Blackburn, and J. W. Pennebaker. “The Narrative Arc: Revealing Core Narrative Structures through Text Analysis”. In: Science Advances 6.32 (2020), eaba2196.

[3] J. Bruner. “The Narrative Construction of Reality”. In: Critical Inquiry 18.1 (1991), pp. 1–21.

[4] M. J. Burke, L. M. Finkelstein, and M. S. Dusig. “On Average Deviation Indices for Estimating Interrater Agreement”. In: Organizational Research Methods 2.1 (1999), pp. 49–68.

[5] S. Bushell, G. S. Buisson, M. Workman, and T. Colley. “Strategic Narratives in Climate Change: Towards a Unifying Narrative to Address the Action Gap on Climate Change”. In: Energy Research & Social Science 28 (2017), pp. 39–49.

[6] N. Chambers and D. Jurafsky. “Unsupervised Learning of Narrative Event Chains”. In: Proceedings of ACL-08: HLT. Columbus, Ohio: Association for Computational Linguistics, 2008, pp. 789–797.

[7] M. Fludernik. Towards a ‘Natural’ Narratology. Routledge, 2002.

[8] G. Genette. “Boundaries of Narrative”. In: New Literary History 8.1 (1976), pp. 1–13.

[9] G. Genette. Narrative Discourse: An Essay in Method. Vol. 3. Cornell University Press, 1983.

[10] R. Giora and Y. Shen. “Degrees of Narrativity and Strategies of Semantic Reduction”. In: Poetics 22.6 (1994), pp. 447–458.

[11] D. Herman. Basic Elements of Narrative. John Wiley & Sons, 2009.

[12] H. Kermes, S. Degaetano-Ortlieb, A. Khamis, J. Knappen, and E. Teich. “The Royal Society Corpus: From Uncharted Data to Corpus”. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16). 2016, pp. 1928–1931.

[13] E. Kubin, C. Puryear, C. Schein, and K. Gray. “Personal Experiences Bridge Moral and Political Divides Better than Facts”. In: Proceedings of the National Academy of Sciences 118.6 (2021).

[14] W. Labov and J. Waletzky. “Narrative Analysis: Oral Versions of Personal Experience”. 1967.

[15] N. Luhmann. Die Kunst der Gesellschaft. Suhrkamp, 1995.

[16] N. Luhmann. Social Systems. Stanford University Press, 1995.

[17] N. Mostafazadeh, N. Chambers, X. He, D. Parikh, D. Batra, L. Vanderwende, P. Kohli, and J. Allen. “A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories”. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016, pp. 839–849.

[18] N. Mostafazadeh, A. Grealish, N. Chambers, J. Allen, and L. Vanderwende. “CaTeRS: Causal and Temporal Relation Scheme for Semantic Annotation of Event Structures”. In: Proceedings of the Fourth Workshop on Events. San Diego, California: Association for Computational Linguistics, 2016, pp. 51–61. doi: 10.18653/v1/W16-1007. url: https://www.aclweb.org/anthology/W16-1007.

[19] E. Ochs and L. Capps. Living Narrative: Creating Lives in Everyday Storytelling. Harvard University Press, 2009.

[20] J. Ouyang and K. McKeown. “Modeling Reportable Events as Turning Points in Narrative”. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, Portugal: Association for Computational Linguistics, 2015, pp. 2149–2158. doi: 10.18653/v1/D15-1257. url: https://www.aclweb.org/anthology/D15-1257.

[21] P. Papalampidi, F. Keller, and M. Lapata. “Movie Plot Analysis via Turning Point Identification”. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Hong Kong, China: Association for Computational Linguistics, 2019, pp. 1707–1717. doi: 10.18653/v1/D19-1180. url: https://www.aclweb.org/anthology/D19-1180.

[22] F. Pianzola. “Looking at Narrative as a Complex System: The Proteus Principle”. In: Narrating Complexity. Springer, 2018, pp. 101–122.

[23] A. Piper and S. Bagga. HATHI 1M: Million Page Historical Prose Data in English from the Hathi Trust. 2021.

[24] P. Ricoeur. Time and Narrative, Volume 1. University of Chicago Press, 2012.

[25] R. J. Shiller. Narrative Economics: How Stories Go Viral and Drive Major Economic Events. Princeton University Press, 2020.

[26] M. Sternberg. “Telling in Time (I): Chronology and Narrative Theory”. In: Poetics Today 11.4 (1990), pp. 901–948.

[27] M. Sternberg. “Telling in Time (II): Chronology, Teleology, Narrativity”. In: Poetics Today 13.3 (1992), pp. 463–541.

[28] T. Underwood, P. Kimutis, and J. Witte. “NovelTM Datasets for English-Language Fiction, 1700–2009”. In: Journal of Cultural Analytics 5.2 (2020). doi: 10.22148/001c.13147.

[29] H. White. “The Value of Narrativity in the Representation of Reality”. In: Critical Inquiry 7.1 (1980), pp. 5–27.

[30] S. Zeman. “Grammatik der Narration”. In: Zeitschrift für germanistische Linguistik 48.3 (2020), pp. 457–494.