Cultural Accumulation and Improvement in Online Fan Fiction

Federico Pianzola (a, b), Alberto Acerbi (c) and Simone Rebora (d, e)

a University of Milan-Bicocca, Piazza dell’Ateneo nuovo 1, 20126 Milan, Italy
b Sogang University, 35 Baekbeom-ro, Daeheung-dong, Mapo-gu, Seoul, South Korea
c Brunel University London, Uxbridge, UB8 3PH, United Kingdom
d University of Verona, Lungadige Porta Vittoria 41, 37129 Verona, Italy
e University of Basel, Petersplatz 1, 4051 Basel, Switzerland

Abstract
We analyse stories in Harry Potter fan fiction published on Archive of Our Own (AO3), using concepts from cultural evolution. In particular, we focus on cumulative cultural evolution, that is, the idea that cultural systems improve with time, drawing on previous innovations. In this study we examine two features of cumulative culture: accumulation and improvement. First, we show that stories in Harry Potter fan fiction accumulate cultural traits (unique tags, in our analysis) through time, both globally and at the level of single stories. Second, more recent stories are also liked more by readers than earlier stories. Our research illustrates the potential of combining cultural evolution theory and digital literary studies, and it paves the way for the study of the effects of online digital media on cultural cumulation.

Keywords
cultural evolution, cumulative culture, Harry Potter, digital literary studies, fan fiction, literature

1. Introduction
In many cultural domains we can observe progress through cumulative improvements. The efficiency of information storage, just to take a familiar example, has increased over the centuries through a series of innovations, in a process that continues today: a contemporary smartphone can store thousands of books. In cultural evolution theory this process is broadly described as cumulative cultural evolution [9, 11].
There are no universally agreed measures of cumulation [11], but it seems a sensible assumption that its degree differs across cultural domains. In some domains, such as technology, we can observe clear marks of accumulation, while in others, such as the arts, even if accumulation is not absent [19, 16], its scope appears limited. Many factors could explain why the degree of accumulation differs in this way, but two are particularly interesting for cultural evolution. One is availability, that is, the number of possible models or cultural traits one has at one's disposal. While the exact details are debated [8, 20], the basic idea is that having more versions of the same item available to copy from increases the probability that the item will be successfully preserved. The second is fidelity. Cultural transmission is a noisy process. Transmission chain experiments show that information transmitted orally tends to be lost and transformed [18]. In some cases, populations of individuals converge on similar cultural traits without the need for accurate passage of information, because of various constraints [5, 13]. In other cases, however, information needs to be preserved and successfully transmitted to allow the reproduction of a cultural trait [2]: modern complex technologies are preserved by virtue of supports that allow faithful transmission.

CHR 2020: Workshop on Computational Humanities Research, November 18–20, 2020, Amsterdam, The Netherlands
Email: federico.pianzola@unimib.it (F. Pianzola); alberto.acerbi@brunel.ac.uk (A. Acerbi); simone.rebora@univr.it (S. Rebora)
ORCID: 0000-0001-6634-121X (F. Pianzola); 0000-0001-5827-8003 (A. Acerbi); 0000-0002-1501-3774 (S. Rebora)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
The contemporary diffusion of online digital media produces an increase in availability and fidelity in several domains [1]. A pertinent question is then what the consequences for cultural cumulation are: whether this results in more improvement overall, or perhaps in more improvement in domains where it was previously limited by a relative lack of availability and fidelity. To start answering this question we analyse cultural cumulation in AO3, the biggest fan fiction archive for stories in English. Previous research on fanfiction has found, for example, that authors improve their writing skills over time, widening the range of their vocabularies [3]. Here we focus on two features of cultural cumulation explicitly derived from cultural evolution theory: accumulation and improvement. Accumulation refers to an increase in the number of different cultural traits, while improvement refers to the fact that more recent traits are “better”, according to some metrics. (The third feature of cumulative culture discussed in [1], ratcheting, that is, the fact that new innovations draw on previous innovations, is not analysed here.)

Literary studies have theorized the historical increase in complexity (accumulation) of literature, for instance, with the introduction of formal innovations by literary modernism (e.g. stream of consciousness) and postmodernism (e.g. self-reflexivity) [10]. More problematic is the case of improvement. The quality of literary works has traditionally been judged by a restricted group of people, publishers and literary critics, who decided which texts deserved to be included in the literary canon, often based on a criterion of originality, i.e. the presence of some rhetorical or stylistic innovation in linguistic expression. This kind of institutionalised prestige has often been opposed to the popularity of bestselling fiction [15], appreciated by many because of its serial replication of known plot schemes or themes.
More broadly, the issue concerns the contrast between a narrow selection of canonized works and the entirety of the archived literary production [12]. Fanfiction is an example of popular literature, produced in a context in which social exchange, collaborative work, and fidelity to the canon or to fandom tropes are the norm. In contrast, throughout the history of literature professional authors have often worked alone and pursued originality. In this context, we think it is appropriate to talk about improvement in terms of cumulative culture, which focuses on the reception of cultural artefacts and their widespread popularity as a measure of improvement [11]. On the other hand, it might be the case that our model is not directly applicable to canonical literature, because power relationships and prestige influence that cultural field more strongly than they influence fanfiction.

2. Methods
We chose to work with AO3 data because they are freely accessible, scraping of the website is allowed and supported by the developers maintaining the servers, and there is an excellent metadata system based on organized tags (Table 1). In addition, when uploading a story on AO3, authors can add various kinds of tags, among which: fandom tag, to signal the fictional universe/es to which the story is related (e.g. “Harry Potter - J. K. Rowling”); character tag, to list the characters appearing in the story (e.g. “Hermione Granger”); relationship tag, to list the relationship/s in which the story characters participate (e.g. “Draco Malfoy/Harry Potter”); freeform tag, which can be related to any other aspect of the story (e.g. “POV Draco”), the fandom (e.g. “Community: daily_deviant”), or fanfiction writing’s conventions (e.g. “Ron Weasley bashing”).

Table 1: AO3 stories’ metadata

Metadata     Description
Title        title of the story
Author       author of the story
URL          unique identifier of the story, assigned when the draft is created
Date         date of last update of the story
Summary      summary of the story, if written by the author
Language     language of the story
Words        number of words of the story
Chapters     number of chapters written and total number of chapters, if available (e.g. 1/1 or 15/?)
Completion   status of the story, either “completed” or “in progress”
Kudos        number of times someone liked the story
Bookmarks    number of times the story has been bookmarked
Hits         number of times the story has been viewed
Comments     number of comments left at the end of the story
Tag          label inserted by the author
Tag type     one of 7 categories: fandom, character, relationship, freeform, rating, archive warning, relationship orientation

AO3 has a system of tag wrangling, i.e. volunteers who continuously monitor newly introduced tags, aggregating them with existing ones (without replacing them) when they refer to the same characters/relationships/themes. In our analysis we relied on this tag aggregation, which we implemented thanks to the creation of a linked-data knowledge base mapping all “synonym” tags used [14].

We collected all metadata of the stories tagged with the fandom tag “Harry Potter - J. K. Rowling” (217,772 stories), including all tags and information listed in Table 1. We excluded stories not in the English language, stories with fewer than 10 words, and stories published in 2020, obtaining a final sample of N = 196,726 stories.

Fans started uploading their stories to AO3 in November 2009, when the archive became publicly accessible. However, some of the stories in AO3 are imported from previous publications or were written in the past. In such cases, authors can backdate their stories, indicating a year earlier than the year of upload on AO3.
Since we are using AO3 as a data source to study cultural accumulation in fan fiction, we consider the backdated year as the original date of these stories. With this procedure, the first year with more than 100 stories is 2002 (324 stories). We fixed this minimum threshold in order to group the stories in percentiles (see below). Overall, it should be kept in mind that data from earlier than 2010 are less representative of Harry Potter fan fiction than data from later years, since only a portion of the stories published elsewhere has been imported. Moreover, this date adjustment can be used only for the accumulation analysis, not for the improvement analysis, since hits and kudos started to accumulate only from the date of publication on AO3.

Figure 1: Number of stories in the AO3 archive we considered in the analysis. (x-axis: Year, 2002 to 2018; y-axis: Number of stories.)

2.1. Accumulation
In order to have accumulation, the number of cultural traits at time t+1 should be higher than the number of cultural traits at time t. The hypothesis we test in our dataset is that the number of unique tags increases in time. We first test this hypothesis on the overall number of different tags, i.e. whether the overall Harry Potter fandom accumulates traits (more characters, more relationships, more themes). Since the overall accumulation is associated with the parallel increase in the total number of stories (see Figure 1), we also check the average number of unique tags per story, i.e. whether single stories accumulate traits (more characters, more relationships, more themes per story). For the latter analysis, we compare stories with a similar popularity (defined by the number of kudos received) relative to the year in which they are published. There are two reasons for doing this.
First, our cumulative culture hypothesis does not necessarily predict that all stories should increase the number of traits, only that the “best” recent stories should have more traits than the “best” earlier stories. Given the increase in the total number of stories, this effect could be hidden when averaging over all stories. Second, adding more tags can be a way to make a story more discoverable by readers with different interests and consequently increase the number of hits and the probability of receiving kudos, so it makes sense to compare stories with similar popularity across years. To do this, we group the stories of each year in percentiles according to the number of kudos and analyse the trend of the number of unique tags for the first, middle, and last decile.

2.2. Improvement
There is improvement when the cultural traits at time t+1 are “better” (more effective according to some measure) than the cultural traits at time t. The hypothesis we test in our dataset is that the appreciation of stories increases in time, i.e. stories receive more hits and kudos. As we did for accumulation, we analyse both the global trend and the trend for single stories. A further issue concerns the choice of a reliable measure for stories’ appreciation.

Figure 2: Number of unique tags per year, divided in “characters”, “freeforms” and “relationships”. (Panels: characters, freeforms, relationships; x-axis: Year, 2002 to 2018; y-axis: Count.)

For the analysis of the global trend, we use three measures. The absolute number of kudos is simply the total sum of kudos received. This measure may favour stories that are very popular but, in proportion, less appreciated, and also older stories that have had more time to accumulate kudos. We thus use a second measure: the kudos/hits ratio.
The ratio accounts for the age of a story, but it may favour “niche” stories, with very few hits and kudos. For this reason, we finally consider a weighted measure, an engagement score S computed as the true Bayesian average [4, 6] of the kudos/hits ratio:

$$ S = wK + (1 - w) K_{av} $$

where $K$ is the kudos/hits ratio of the story, $K_{av}$ is the average of the ratio in a certain year, and $w$ is a weighting parameter, calculated as:

$$ w = \frac{H}{H + H_{av}} $$

where $H$ is the number of hits for a certain story, and $H_{av}$ is the average number of hits per story for a certain year. For single stories, we only compute the engagement score S, for the first, middle, and last decile.

3. Results
3.1. Accumulation
Figure 2 shows that there is accumulation of the overall number of unique tags. We can also visualise how, for each character or group of characters, the number of freeform and relationship tags increases, meaning that there is more diversity in stories. Figure 3 is an example for relationships that involve the three most popular characters. It shows the relative proportion (of the total number of stories where the characters’ relationships are tagged) of relationship tags for the three characters Harry Potter (green palette), Draco Malfoy (purple palette), and Hermione Granger (orange palette). It can be seen that diversity increases in time, since new relationships are introduced every year.

Figure 3: Variety and relative proportion of relationships tags for Harry Potter (green palette), Draco Malfoy (purple palette), and Hermione Granger (orange palette). (x-axis: Year, 2002 to 2018; y-axis: Proportion of stories.)
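The two accumulation measures used in this section (unique tags per year, and the yearly average number of tags per story within kudos percentile bands) can be illustrated with a minimal sketch. It is not the authors' pipeline: the record fields `year`, `kudos` and `tags`, the function names, and the toy data are all hypothetical, tags are assumed to be already mapped to their aggregated ("wrangled") form, and the way percentile ranks are cut into the 1–10th, 46–55th and 91–100th bands is only one plausible reading of the description in Section 2.1.

```python
from collections import defaultdict


def unique_tags_per_year(stories):
    """Count the distinct tags used across all stories of each year."""
    tags_by_year = defaultdict(set)
    for story in stories:
        tags_by_year[story["year"]].update(story["tags"])
    return {year: len(tags) for year, tags in sorted(tags_by_year.items())}


def kudos_bands(stories_in_year):
    """Split one year's stories into low / mid / high kudos bands.

    Assumption: a story's percentile rank is its 1-based position in the
    kudos ordering divided by the number of stories in that year.
    """
    ranked = sorted(stories_in_year, key=lambda s: s["kudos"])
    n = len(ranked)
    bands = {"low": [], "mid": [], "high": []}
    for i, story in enumerate(ranked):
        pct = 100 * (i + 1) / n
        if pct <= 10:            # 1-10th percentile
            bands["low"].append(story)
        elif 45 < pct <= 55:     # 46-55th percentile
            bands["mid"].append(story)
        elif pct > 90:           # 91-100th percentile
            bands["high"].append(story)
    return bands


def mean_tags_per_band(stories):
    """Yearly average number of distinct tags per story, per kudos band."""
    by_year = defaultdict(list)
    for story in stories:
        by_year[story["year"]].append(story)
    result = {}
    for year, group in sorted(by_year.items()):
        result[year] = {
            band: sum(len(set(s["tags"])) for s in members) / len(members)
            for band, members in kudos_bands(group).items()
            if members  # skip bands that are empty in small years
        }
    return result


# Hypothetical toy data: 20 stories per year; stories with more kudos
# carry more tags, and every 2003 story carries one extra tag.
toy = []
for year, extra in ((2002, 0), (2003, 1)):
    for k in range(20):
        n_tags = 1 + k // 10 + extra
        toy.append({"year": year, "kudos": k,
                    "tags": [f"t{j}" for j in range(n_tags)]})

print(unique_tags_per_year(toy))
print(mean_tags_per_band(toy))
```

Ranking within each year, rather than across the whole archive, keeps the comparison between "equally popular" stories of different years from being driven by the growth in the total number of stories.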
Figure 4: Yearly average number of character, relationship and freeform tags per story. Stories with the same popularity/quality (measured by kudos in percentiles) are compared. (Panels: Characters, Freeforms, Relationships; x-axis: Year, 2002 to 2018; y-axis: Average count; ranks: High kudos (91–100th percentile), Mid kudos (46–55th percentile), Low kudos (1–10th percentile).)

Looking at the average number of unique tags per story, Figure 4 shows that it also increases in time. This is true for stories at all levels of popularity/quality, as can be seen from the accumulation slopes in Figure 4, which are always positive from 2002 to 2019 for all three samples (low, mid, and high percentiles). However, the best stories (higher percentiles) have a faster pace of accumulation, as predicted by the cumulative culture hypothesis.

3.2. Improvement
Figure 5 shows how the popularity/quality of the stories changes in time. The overall number of kudos (central panel) decreases for recent stories. This reflects the fact that older stories have had more time to accumulate kudos, a widespread phenomenon that is also observed, e.g., for citations of scientific articles [17]. When considering the kudos/hits ratio (left panel) or the weighted engagement score (right panel), readers’ engagement increases in time. This is also true for single stories, for all three percentiles considered in our analysis (Figure 6).

Figure 5: Yearly values of popularity/quality measured using the three different techniques described in the text. (Panels: ratio_kudos_hits, sum_kudos, weighted; x-axis: Year, 2010 to 2018.)
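The weighted engagement score S defined in Section 2.2 can be made concrete with a short sketch. This is an illustration under stated assumptions, not the authors' code: the field names `id`, `year`, `hits` and `kudos` and the function name are hypothetical, the yearly average of the ratio is taken here as the mean of the per-story kudos/hits ratios (the text does not say whether it is this or total kudos over total hits), and stories with zero hits are skipped because their ratio is undefined.

```python
from collections import defaultdict
from statistics import mean


def engagement_scores(stories):
    """Return {story id: S}, with S = w*K + (1 - w)*K_av per publication year.

    K    : kudos/hits ratio of the story
    K_av : mean kudos/hits ratio of the stories of the same year (assumption)
    H_av : mean number of hits per story in the same year
    w    : H / (H + H_av), so stories with few hits are pulled towards K_av
    """
    by_year = defaultdict(list)
    for story in stories:
        if story["hits"] > 0:  # assumption: ignore stories never viewed
            by_year[story["year"]].append(story)

    scores = {}
    for group in by_year.values():
        k_av = mean(s["kudos"] / s["hits"] for s in group)
        h_av = mean(s["hits"] for s in group)
        for s in group:
            k = s["kudos"] / s["hits"]
            w = s["hits"] / (s["hits"] + h_av)
            scores[s["id"]] = w * k + (1 - w) * k_av
    return scores


# Hypothetical toy data: a widely read story and a "niche" one.
toy_stories = [
    {"id": "A", "year": 2015, "hits": 1000, "kudos": 100},  # K = 0.1
    {"id": "B", "year": 2015, "hits": 10, "kudos": 5},      # K = 0.5
]
scores = engagement_scores(toy_stories)
for story_id, s in sorted(scores.items()):
    print(story_id, round(s, 3))
```

In this toy case the niche story B has a raw ratio of 0.5 but only 10 hits, so its weight w is small and its score ends up close to the yearly average of 0.3, which is the correction for "niche" stories that motivates the weighted measure.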
Figure 6: Yearly engagement score (S) comparing stories with the same popularity/quality (measured in percentiles). (x-axis: Year, 2010 to 2018; y-axis: Average weighted scores; ranks: High score (91–100th percentile), Mid score (46–55th percentile), Low score (1–10th percentile).)

4. Discussion
The results of our work in progress show that we can observe two features of cumulative culture, namely accumulation and improvement, in online fan fiction. Interestingly, the two features can be detected both at the global level, i.e. considering all stories, and when analysing single stories. While the former observation is expected, as it is matched by the parallel increase in the number of stories, the latter is more surprising, suggesting that individual stories became, over the years, more complex and more appreciated by readers.

In future work, to provide a complete picture of cumulative culture in this domain, we plan to also analyse the “ratcheting” feature, that is, the fact that novel innovations draw on past ones. In our case, this would mean detecting, for example, that newly introduced tags tend to co-occur with specific previous tags, e.g. “Alternate Universe - Modern Setting” with “Alternate Universe - Canon Divergence”.

In this research, and in the planned research on ratcheting, we focused on tags.
The analysis of tags’ frequency can be combined with techniques like text re-use detection and stylometry, to check whether older stories contain sentences that are “copied” or adapted in more recent stories (word frequency patterns), or to check text similarities concerning themes (topic modelling). Similarly, accumulation, or an increase in complexity, could be detected at the more fine-grained level of textual properties. We are not aware of similar research for fiction published by institutionalized authors and publishers, but it would be interesting to compare cumulative cultural evolution in two similar domains, one supported by digital online media and one not. As discussed in the Introduction, an important research question is whether particular features of online digital media (increased fidelity and availability) support cultural accumulation.

Another aspect worth taking into account is the effect on fanfiction of the publication of further instalments of the original work (e.g. new books or films). An effect has been documented for fanfiction related to various fandoms, but not all of them [7]. For Harry Potter, for example, the release of the spin-off “Fantastic Beasts and Where to Find Them” in November 2016 increased the number of published stories in 2016 and 2017. However, on AO3 there was no such effect with the release of the last two films of the original series, “Harry Potter and the Deathly Hallows” part 1 and part 2, in November 2010 and July 2011 respectively, because at the time a lot of Harry Potter fanfiction was being posted on Fanfiction.net, which indeed saw a huge increase in posting in those months [7].

More generally, we believe cultural evolution provides a fruitful theoretical background for digital literary studies, introducing a new range of testable hypotheses that can elucidate some of the dynamics studied by digital humanities.
On the other hand, digital texts (and the sophisticated methodologies developed in literary studies and Natural Language Processing to analyse them) are an interesting and increasingly data-rich domain for cultural evolutionists. We hope our research will contribute to the further exploration of this interdisciplinary space.

Acknowledgments
Thanks to fffinnagain, destinationtoast, Shay Guy, and the other fans who publicly shared their statistics about fanfiction.

References
[1] A. Acerbi. Cultural Evolution in the Digital Age. Oxford, New York: Oxford University Press, Dec. 2019. isbn: 978-0-19-883594-3.
[2] A. Acerbi and A. Mesoudi. “If we are all cultural Darwinians what’s the fuss about? Clarifying recent disagreements in the field of cultural evolution”. In: Biology & Philosophy 30.4 (2015), pp. 481–503. issn: 1572-8404. doi: 10.1007/s10539-015-9490-2. (Visited on 11/21/2019).
[3] C. Aragon, K. Davis, and C. Fiesler. Writers in the Secret Garden: Fanfiction, Youth, and New Forms of Mentoring. Cambridge: MIT Press, 2019. isbn: 978-0-262-53780-3.
[4] J. Balraj and C. Farook. “Enhance Rating Algorithm for Restaurants”. In: Advances in Information and Communication. Ed. by K. Arai and R. Bhatia. Cham: Springer, 2020, pp. 224–234. isbn: 978-3-030-12384-0, 978-3-030-12385-7. doi: 10.1007/978-3-030-12385-7_18. (Visited on 07/16/2020).
[5] P. Boyer. “Cognitive Tracks of Cultural Inheritance: How Evolved Intuitive Ontology Governs Cultural Transmission”. In: American Anthropologist 100.4 (1999), pp. 876–889.
[6] ebc. How to Rank (Restaurants). Jan. 2015. url: http://www.ebc.cat/2015/01/05/how-to-rank-restaurants/ (visited on 07/16/2020).
[7] S. Guy. FANFICTION.NET: FANDOMS OVER TIME. Feb. 2015. url: https://toastystats.tumblr.com/post/111930409603/fanfictionnet-fandoms-over-time-toasty-says (visited on 09/09/2020).
[8] J. Henrich. “Demography and Cultural Evolution: How Adaptive Cultural Processes can Produce Maladaptive Losses: The Tasmanian Case”. In: American Antiquity 69.2 (2004), pp. 197–214. issn: 0002-7316. doi: 10.2307/4128416. (Visited on 12/02/2019).
[9] J. Henrich. The Secret of Our Success: How Culture Is Driving Human Evolution, Domesticating Our Species, and Making Us Smarter. Princeton & Oxford: Princeton University Press, Oct. 2015.
[10] B. McHale. Postmodernist Fiction. London: Methuen, 1987. isbn: 978-0-203-39332-1.
[11] A. Mesoudi and A. Thornton. “What is cumulative cultural evolution?” In: Proceedings of the Royal Society B: Biological Sciences 285.1880 (June 2018), p. 20180712. doi: 10.1098/rspb.2018.0712. (Visited on 11/22/2019).
[12] F. Moretti. “The Slaughterhouse of Literature”. In: Modern Language Quarterly 61.1 (Mar. 2000), pp. 207–228. issn: 0026-7929, 1527-1943. doi: 10.1215/00267929-61-1-207. (Visited on 09/08/2020).
[13] O. Morin. How Traditions Live and Die. London & New York: Oxford University Press, Nov. 2015. isbn: 978-0-19-021050-2.
[14] F. Pianzola. fedormyskin/Linked-Potter. 2020. url: https://github.com/fedormyskin/Linked-Potter (visited on 09/09/2020).
[15] J. Porter. Popularity/Prestige. Tech. rep. 17. Stanford Literary Lab, Sept. 2018. url: https://litlab.stanford.edu/LiteraryLabPamphlet17.pdf (visited on 09/29/2018).
[16] O. Sobchuk and P. Tinits. “Cultural Attraction in Film Evolution: the Case of Anachronies”. In: Journal of Cognition and Culture 20.3-4 (Aug. 2020), pp. 218–237. issn: 1568-5373, 1567-7095. doi: 10.1163/15685373-12340082. (Visited on 09/08/2020).
[17] A. G. Stacey. “Robust parameterisation of ages of references in published research”. In: Journal of Informetrics 14.3 (Aug. 2020). issn: 17511577. doi: 10.1016/j.joi.2020.101048. (Visited on 06/26/2020).
[18] J. M. Stubbersfield, E. G. Flynn, and J. J. Tehrani. “Cognitive Evolution and the Transmission of Popular Narratives: A Literature Review and Application to Urban Legends”. In: Evolutionary Studies in Imaginative Culture 1.1 (2017), pp. 121–136. issn: 24729884, 24729876. doi: 10.26613/esic.1.1.20.
[19] P. Tinits and O. Sobchuk. “Open-ended cumulative cultural evolution of Hollywood film crews”. In: Evolutionary Human Sciences 2 (2020). issn: 2513-843X. doi: 10.1017/ehs.2020.21. (Visited on 07/09/2020).
[20] K. Vaesen et al. “Population size does not explain past changes in cultural complexity”. In: Proceedings of the National Academy of Sciences of the United States of America 113.16 (Apr. 2016), E2241–2247. issn: 1091-6490. doi: 10.1073/pnas.1520288113.