<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Journal of Cultural Analytics</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.22148/16.019</article-id>
      <title-group>
        <article-title>How Exactly does Literary Content Depend on Genre? A Case Study of Animals in Children's Literature</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>INALCO</institution>
          ,
          <addr-line>Paris</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Russian Literature (Pushkin House)</institution>
          ,
          <addr-line>Saint Petersburg</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>2</volume>
      <issue>2018</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>The content of literary fiction at least partly depends on literary tradition. This dependence is attested quantitatively in the association of genre with lexical statistical patterns. This short paper is a step toward formal modeling of the content-moderating processes associated with literary genres. The idea is to explain the prevalence of particular lemmas in a literary text by the genre-dependent accessibility of the semantic category during the creative process. Data on animals mentioned in various sub-genres of a corpus of Russian children's literature is used as an empirical case. Vocabulary growth models are applied to infer genre-related differences in the overall diversity of animal vocabularies. A constrained topic model is employed to infer preferences for particular animal lemmas displayed by various genres. Results demonstrate the models' potential to infer genre-related content preferences in the context of high variance and data imbalance.</p>
      </abstract>
      <kwd-group>
        <kwd>computational thematics</kwd>
        <kwd>genre</kwd>
        <kwd>vocabulary growth model</kwd>
        <kwd>children's literature</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Computational methods have made the content of literary fiction a practical target for systematic
exploration. All kinds of phenomena are being counted in corpora of fictional texts, including
natural and material objects, body parts, emotions, etc., with inferences for either literary or
cultural history [
        <xref ref-type="bibr" rid="ref12 ref15">5, 18, 12</xref>
        ]. This body of work could benefit from a more explicit recognition
of the dual source of literary content: literary tradition (internal) and social reality
mediated by the author’s experience (external) [
        <xref ref-type="bibr" rid="ref9">19</xref>
        ]. It is crucial for inferences on cultural dynamics
drawn from literary data to acknowledge and to measure the influence of literary tradition on fictional
content.
      </p>
      <p>
        There is evidence that some aspects of literary tradition captured by the vague notion of genre
leave a discernible signal in the distribution of content words. For instance, predictive
models using only frequent lexical features are able to discriminate between literary genres with
decent accuracy [
        <xref ref-type="bibr" rid="ref11 ref13 ref3">17, 11, 14, 3, 15</xref>
        ]. Such models could be reverse-engineered to look for the
most informative lexical features. Yet interpreting these features without understanding the
mechanics of how genre-related constraints on content translate into lexical distribution
phenomena is a risky business. Briefly, quantitative studies of literary content are in need of a
more formalized theory.
      </p>
      <p>This short paper is a step toward formal modeling of the content-moderating processes associated
with literary genres. To reduce complexity, I focus on a narrow subject: animals in children’s
literature. The presence of animals in books for children is evidently supported by literary
tradition, not only as characters, but more generally as pedagogical material [4, 13]. It is also
reasonable to expect variation by sub-genre in the prominence and selection of animals;
compare, e.g., fairy tales and teen detective stories. Hence I start from the premise that in this case
the influence of literary tradition is considerable, is associated with genre, and thus could be
measured. The goal is to devise the simplest yet justifiable generative models that represent
the content of a literary work as a result of choices made during the creative process conditional
on genre.</p>
      <p>The models suggested in this paper are employed to make two types of measurement in a
corpus of Russian literature for children and young adults. First, a quantitative estimation of
the effect of sub-genre on the number of distinct animals mentioned in a text (animal diversity).
I suggest basing this estimate on a vocabulary growth model to effectively control for the highly
variable text length. Second, an estimation of the preference for mentioning certain animal species
in each sub-genre. This task is addressed with the help of a specialized topic model.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Models</title>
      <p>To quantify the relative weight of literary tradition vis-à-vis external factors in the prevalence
of a concept, one needs to put heterogeneous internal and external influences on a unified
numeric scale. To this end I suggest employing the notion of cognitive accessibility of a concept
to the author in the process of writing. Accessibility can be operationalized as the probability
that a concept will be mentioned at least once in a certain work given all the predictive factors.
Accessibility of a concept during the creative process is not directly observable, at least not for
past literature. But it can be measured a posteriori at the population level by observing lexical
frequency. Then all the literary and social factors can be seen as distal causes that exert their
influence on literary content through increased or decreased accessibility of some concepts.
Accessibility provides a convenient conceptual basis for the following models since it offers
both a generative interpretation and a measurement scale for the data on lexical prevalence.</p>
      <sec id="sec-2-1">
        <title>2.1. Vocabulary growth model</title>
        <p>The first objective is to estimate the relative accessibility of animals in general in various
sub-genres. The task is complicated by the fact that the distribution of text lengths varies widely
between genres. From a modeling perspective the task is to predict the length of the list of
different animals mentioned in a text. Since such a list is technically just a part of the text’s
vocabulary, a general vocabulary growth model can be applied to it. The most basic model that
relates vocabulary size to text length is known as Heaps’ law [7]. It reflects the fact that
the vocabulary size of a text in natural language is unbounded, but the growth rate diminishes with
text length. Since the model is targeted at only a share of the total vocabulary, I slightly
modify the interpretation of the coefficients in the Heaps’ formula
V = k n^β
(1)
where V is the number of animals mentioned (vocabulary size), n is text length in tokens,
and β (typically 0.4 ≤ β ≤ 0.6) and k are coefficients that control the growth rate. This model allows
us to account for the length of a text in a principled way.</p>
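The decelerating growth implied by formula (1) can be illustrated with a minimal sketch. The coefficient values k = 1.5 and β = 0.5 below are illustrative assumptions, not values estimated in the paper.

```python
# Heaps'-law sketch: expected number of distinct animal lemmas as a
# function of text length, V = k * n^beta. The defaults for k and beta
# are illustrative assumptions only.
def heaps_expected_vocab(n_tokens: float, k: float = 1.5, beta: float = 0.5) -> float:
    """Expected vocabulary size V = k * n^beta for a text of n_tokens tokens."""
    return k * n_tokens ** beta

# Growth is unbounded but decelerating: doubling the text length
# multiplies the expected vocabulary by 2^beta, not by 2.
ratio = heaps_expected_vocab(20_000) / heaps_expected_vocab(10_000)
print(ratio)  # 2 ** 0.5, about 1.414
```

With β below 1, the marginal gain in expected vocabulary shrinks as the text grows, which is exactly why raw counts of distinct animals cannot be compared across texts of different lengths.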
        <p>In the experiments below I explore two ways to incorporate the effect of genre into this
model. The most obvious move is to allow either the coefficient k or the exponent β to vary by
genre. Higher values of the coefficients would indicate higher accessibility of a lexical category
in the genre. Evidently, genre (as a proxy for literary tradition) is only partly responsible for the
lexical content of a work, and much genre-internal variation remains to be explained by other
factors. The external factors that span all the aspects of the author's socialization and linguistic experience
can be accounted for by letting either k or β vary by author. However, this
solution entails the assumption that authors employ animal vocabulary to a similar degree in
all their texts.</p>
        <p>The alternative view is to (simplistically) assume that the observed list of animal mentions
comes from either of two processes: (a) a low-intensity background process, in which the number
of animals mentioned grows only slowly with text length; or (b) a high-intensity foreground
process, leading to a higher number of animal mentions for a text of similar length. In other
words, animals may or may not be a relevant topic for a given text. Then each genre could be
represented as a mixture of texts, each one coming from one of the two processes. In the latter model the
genres would differ by the estimated share of texts produced by the background and
foreground processes. Formally,
V = π_g λ_1 + (1 − π_g) λ_2
(2)
where π_g is a genre-specific share of the background process, and λ_1 and λ_2 stand for the intensities of the
background and foreground processes, respectively.</p>
        <p>For details on the formal definition of the statistical models used for the experiments, priors, and
model selection, see appendix A.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Genre-topic model</title>
        <p>The second objective is to estimate the relative accessibility of certain animal species in
various sub-genres. For this case, the list of different animals mentioned in a text is treated as a
document. The theoretical assumption is that items in this list are drawn from two sources:
the influence of the literary tradition (genre) on the one side, and the author's external experience
on the other. A topic model is the most common formal generative model to describe the
composition of a word list drawn from various sources. An advantage of a topic model is that it
implicitly accounts for text length.</p>
        <p>The two-sources assumption can be translated into a highly constrained topic model where
each document is composed of just two topics: one topic specific to the genre of the text, and
another topic to model external influences. As a result, each genre has its own “topic”.
Probabilities of words in these topics reflect the preference (higher accessibility) for an animal associated
with a particular genre. While genre-specific topics are meant to capture the influence of
literary tradition, for simplicity and to make estimation possible I reduce all external factors to a
single common topic.</p>
        <p>The generative story for this model runs as follows.
1. For each document d: draw a proportion θ_d of the genre-specific topic for this document.
2. For each word in the document:
a) with probability θ_d, draw a word from the genre-specific topic φ_g;
b) with probability 1 − θ_d, draw a word from the general topic φ_0.</p>
        <p>The model has two hyperparameters: a prior for the genre-topic proportion in a document,
θ ∼ Beta(3, 3), and a Dirichlet prior for the distribution of word probabilities in each topic,
φ ∼ Dirichlet(α_1, …, α_V) with α_i = 0.8.</p>
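The two-topic generative story can be sketched as a small forward simulation. The vocabularies and topic weights below are hypothetical toy values, not quantities from the corpus.

```python
import random

# Sketch of the constrained genre-topic generative story: each document
# mixes exactly two topics, its genre's topic and one shared background
# topic. Vocabularies and weights here are toy assumptions.
def generate_document(genre_topic: dict, background_topic: dict,
                      n_words: int, theta: float, rng: random.Random) -> list:
    """Draw n_words lemmas; each comes from the genre topic with
    probability theta, otherwise from the background topic."""
    doc = []
    for _ in range(n_words):
        topic = genre_topic if rng.random() < theta else background_topic
        lemmas, weights = zip(*topic.items())
        doc.append(rng.choices(lemmas, weights=weights, k=1)[0])
    return doc

rng = random.Random(0)
fairy_tale = {"wolf": 0.5, "fox": 0.3, "hare": 0.2}  # hypothetical genre topic
background = {"dog": 0.6, "cat": 0.4}                # hypothetical shared topic
doc = generate_document(fairy_tale, background, 10, theta=0.7, rng=rng)
```

Inference then runs this story in reverse: given documents and their genre labels, estimate θ per document and the word probabilities of each topic.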
        <p>Such a model can be seen as a highly constrained variant of the well-known LDA topic model
[1]. Unlike in LDA, the topical composition of a document is not a parameter to be estimated
by the model, but is always a mix of two topics pre-defined by the genre of the text. The only
document-level parameter the model is left to estimate is the proportion of the genre topic. The
probabilities of words in a topic are estimated in the same way as in LDA.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Data and measurements</title>
      <p>The data for the analysis come from the Detcorpus, a corpus of Russian prose for children and
young adults written between 1900 and 2020 [9]. All the texts in the corpus are provided with
a list of genre tags as a part of their metadata. The genres considered in the present analysis
do not form a neat typology. The major genres that span the whole corpus include fairy tale,
science fiction, and realism, the last one generic, standing for all prose without specific genre
attributes. The other group is formed by formulaic genres that have appeared on the market since the
1990s: detective stories, fantasy, horror, and romance books for teens. Animal stories, a
well-recognizable sub-genre of prose for children, are included as a separate category due to their specific
focus on animals. For each work, genre tags were reduced to one single label from the above
list. Genre labels are regarded as a proxy for those aspects of literary tradition that supposedly
have a sufficiently strong and stable effect to be detected in the distribution of animal mentions.
In total, the data comprise 2994 works ranging from 100 to 300000 words in length. See details
on the data composition and genre and author distribution in appendix C.</p>
      <p>
        To identify the occurrences of animals in texts I constructed a dictionary using all Russian
names and aliases for animal taxa in Wikidata. In contrast to previous work that aimed to
measure biodiversity in literature [
        <xref ref-type="bibr" rid="ref10">6, 10</xref>
        ], taxa names are not reduced to nouns, and when a
taxon name is a multi-word expression it is matched as a sequence of lemmas. Dictionary-based
methods are notoriously plagued by false positive matches due to homonymy. The problem
is quite severe with animal names, as metaphor is heavily used as a semantic device in this
lexical category. To achieve satisfactory precision, I manually compiled an extensive stoplist
(405 items) of the lemmas that are less likely to refer to animals in this particular corpus. As a
result, of 20811 lemmas in the dictionary, 1906 were matched in the corpus. The accuracy of
the method was evaluated on a sample of 50 random 500-word excerpts (precision 0.97, recall
0.81, F1 0.88). Evaluation indicated that Wikidata systematically underrepresents female forms
of animal names, names for cubs, and various derivative forms, especially diminutives.
      </p>
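The matching step described above (multi-word taxa matched as lemma sequences, with a stoplist for homonymy-prone items) can be sketched as follows. The function name, toy dictionary, and stoplist entries are hypothetical; the actual dictionary is the Wikidata-derived one described in the text.

```python
# Sketch of dictionary-based animal matching (assumed reconstruction):
# taxa names are matched as contiguous sequences of lemmas, and a manual
# stoplist filters single lemmas prone to homonymous/metaphorical use.
def match_animals(lemmas: list, dictionary: set, stoplist: set) -> set:
    """Return dictionary entries (tuples of lemmas) found as contiguous
    subsequences of the lemmatized text."""
    found = set()
    max_len = max(len(entry) for entry in dictionary)
    for i in range(len(lemmas)):
        for width in range(1, max_len + 1):
            candidate = tuple(lemmas[i:i + width])
            if candidate in dictionary:
                # Apply the stoplist only to single-lemma matches.
                if len(candidate) == 1 and candidate[0] in stoplist:
                    continue
                found.add(candidate)
    return found

dictionary = {("hare",), ("polar", "bear"), ("mouse",)}  # toy dictionary
stoplist = {"mouse"}  # e.g. 'mouse' as a computer device, not an animal
text = ["the", "polar", "bear", "saw", "a", "mouse", "and", "a", "hare"]
matches = match_animals(text, dictionary, stoplist)
```

Each match records a lemma tuple, so a multi-word taxon counts as one dictionary hit rather than several unrelated nouns.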
      <p>For modeling, each work is reduced to the list of all distinct animal lemmas mentioned in the
text (each lemma is present only once). Since the focus of the analysis is on lexical phenomena,
I chose to count lemmas, not species. It should be recognized that the relationship of animal
nominations to biological taxa can be highly ambiguous, and identification of the specific taxon
meant in a text presents a separate problem. For the genre-topic model, all the works by the same
author in a certain genre are joined into a single document to avoid bias induced by better-represented
authors in the corpus. To simplify topic inference, only the lemmas that occurred
in 5 works or more were retained. All statistical inference was Bayesian and performed with
the help of the Stan Hamiltonian Monte Carlo sampler. See [8] for the data and code used for the
analysis.</p>
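The preprocessing steps above (distinct lemmas per work, author-genre pooling, a minimum document frequency of 5) can be sketched in a few lines. The function and data shapes are assumptions made for illustration; the published code referenced in [8] is authoritative.

```python
from collections import Counter, defaultdict

# Sketch of the preprocessing pipeline: each work reduces to its set of
# distinct animal lemmas; works by the same author in the same genre are
# merged into one document; lemmas found in fewer than min_works works
# are dropped before topic inference.
def build_documents(works, min_works: int = 5) -> dict:
    """works: iterable of (author, genre, animal_lemma_list) triples."""
    doc_freq = Counter()
    per_work_sets = []
    for author, genre, lemmas in works:
        distinct = set(lemmas)           # each lemma counted once per work
        per_work_sets.append((author, genre, distinct))
        doc_freq.update(distinct)        # document frequency across works
    kept = {lemma for lemma, n in doc_freq.items() if n >= min_works}
    docs = defaultdict(set)
    for author, genre, distinct in per_work_sets:
        docs[(author, genre)] |= distinct & kept
    return dict(docs)
```

Pooling by (author, genre) pair means a prolific author contributes one document per genre, which is what prevents well-represented authors from dominating the genre topics.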
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>In the first experiment the vocabulary growth models defined above were employed to quantify
genre differences in the expected number of animals mentioned (diversity of animal
vocabulary). The results are displayed in fig. 1. The left panel shows the animal vocabulary growth
rate parameter as estimated by the model that assumes that in the Heaps’ formula k varies
by author and β varies by genre. The right panel displays the percentage of animal-rich texts for
each genre as estimated by the mixture model. Growth rate parameters k and β in the mixture
model are fixed for the rich and small animal vocabulary clusters and do not depend on author or
genre.</p>
      <p>Both models infer similar genre differences. Animal stories and, to a lesser degree, genres
with a fantastic element (fairy tales, fantasy, sci-fi) have larger animal vocabularies on average
(or, alternatively, a larger share of texts with rich animal vocabularies) in comparison to realism
as a reference point. A slightly more surprising conclusion is that formulaic genres (detective,
horror) use a narrower animal vocabulary, with the lowest result attained by teen romance novels.</p>
      <p>The vocabulary growth rate model indicates that more variance in animal vocabulary
size is associated with authors than with genres. The model predicts that in a typical novella
(50,000 words) an author with an average interest in animals will mention 10 more animal
lemmas, on average, in an animal story, 3 more in a fairy tale, and 6 fewer in a romance, all in
comparison with realism. At the same time, the predicted difference between the author with
the highest interest in animals (Nikolai Sladkov) and one with the lowest (Anatoly Aleksin) for
a realist novella of the same length would be 197 animal lemmas, on average.</p>
      <p>Since the mixture model estimates the probability that each text belongs to either the rich or the small
animal vocabulary group, it can be seen as a model-based clustering of texts. This allows
for finer comparison of otherwise similar works that differ in the density of animal mentions.
Many authors consistently appear in one of the clusters; for instance, Vitaliy Bianki, a canonical
author of animal stories, is invariably a high-scorer. But even texts of the same genre and size
and by the same author may fall into different groups. Short stories from the same book by
Andrei Platonov classified as fairy tales provide a vivid example. One is “Why did the geese
become motley”, 626 words, 1 animal species (geese), low-animal cluster. The second is “A
grateful hare”, 643 words, 11 species, high-animal cluster. In the second story, a hare helps the
protagonist by calling other animals to bring food, which effectively generates an enumeration
of species.</p>
      <p>For the second experiment a genre-topic model was trained. The parameters estimated by
the model include the probabilities of each lemma in each genre (genre-specific topics) and
a background probability for each lemma. The probability distribution of lemmas in the
background topic is very close to the overall frequency distribution of lemmas in the data
(Jensen-Shannon divergence 0.02). A high probability of a lemma in a genre topic means that this animal
is likely to be mentioned by a larger number of authors writing in this genre (is more accessible
given the genre). A summary of the genre topics is presented in fig. 2. For generality, instead
of presenting lemmas I group animals into larger categories and provide the number of lemmas in
each category for the top-20 lemmas in a topic. Top lists for each topic are built using a balanced
FREX metric, which combines the probability of a lemma in a topic with the exclusivity of the lemma to
this topic in contrast to other topics.</p>
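A FREX-style score of the kind used for the top lists can be sketched as a harmonic mean of a lemma's within-topic probability rank and its exclusivity rank. The exact formula the paper uses is not given here, so the empirical-CDF form below (modeled on the definition popularized by the STM literature, with a balanced weight w = 0.5) is an assumption for illustration.

```python
# Sketch of a balanced FREX-style score (assumed form): harmonic mean of
# the ECDF of a lemma's within-topic probability and the ECDF of its
# exclusivity to that topic, with balance weight w.
def frex(prob_by_topic: dict, topic: str, lemma: str, w: float = 0.5) -> float:
    """prob_by_topic: {topic: {lemma: P(lemma | topic)}}."""
    probs = prob_by_topic[topic]
    # Exclusivity: share of this lemma's probability mass held by the topic.
    excl = {l: prob_by_topic[topic][l] /
               sum(prob_by_topic[t][l] for t in prob_by_topic)
            for l in probs}

    def ecdf(value: float, values: list) -> float:
        return sum(v <= value for v in values) / len(values)

    f = ecdf(probs[lemma], list(probs.values()))   # frequency rank
    e = ecdf(excl[lemma], list(excl.values()))     # exclusivity rank
    return 1.0 / (w / f + (1 - w) / e)
```

A lemma that is both probable in a genre topic and rare elsewhere scores high; a corpus-wide frequent lemma is pulled down by its low exclusivity rank.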
      <p>There are a few notable tendencies made apparent by the genre-topic model. Animal
stories are distinctive for their focus on forest animals and the most diverse set of bird species,
primarily wildfowl. This may be contextualized with the note that the most prolific authors in
this genre (e.g., Bianki, Prishvin) were passionate hunters. Birds also have a prominent place in
other genres, including fairy tales, but the set of species is quite different (various owls, crow, tit).
In contrast to animal stories, realism is defined by its focus on farm animals (the horse being the
most “realistic”) and quite numerous species of edible fish. Perhaps not surprisingly, an interest
in snakes is characteristic of teen horror fiction. Science fiction is much more focused on sea
animals along with extinct species (often dinosaurs). Pets (primarily cats and dogs) take a very
prominent place in detective stories. All animals not native to the northern temperate zone are grouped
under the label ‘exotic’. For instance, the lion (‘king of animals’) and the tiger are typical of fairy tales.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>The models introduced in this short paper aim to ground computational analysis of literary
content (or computational thematics, as suggested by Sobchuk and Šeļa in [15]) in
categories relevant to literary production. While the models are simplistic, they operate on the level
of a whole literary work, which allows us to relate them to aspects of the creative process. The
animal diversity model was able to detect genre-associated differences in the accessibility of
animal vocabularies while controlling for author and text length. This result corroborates
the supposed effect of literary tradition for this particular data. The mixture model that
distinguishes between rich and scarce animal vocabularies also proved to be a useful tool for locating
diversity-generating tropes, as shown in the Platonov example in section 4.</p>
      <p>
        I see an important advantage of the vocabulary growth models employed here in their ability
to estimate parameters even for very short texts on a par with longer ones. This feature is
specifically relevant for any diversity measurements made on a lexical basis, say, biodiversity
as represented in literature. The reason is that text length is the strongest predictor of
vocabulary size regardless of other factors [16]. For this reason, previous work on literary
biodiversity [
        <xref ref-type="bibr" rid="ref10">6, 10</xref>
        ] had to resort to a minimum text length threshold, which is equivalent
to implicitly stratifying by text length without a proper theoretical justification.
      </p>
      <p>In comparison to linear prevalence models (Poisson regression), vocabulary growth models
may offer more fine-grained estimates. For instance, given a point estimate of an author’s
propensity to mention animals, both the Heaps’-based model suggested in this article and a
similarly structured Poisson model (see appendix B for details) produce rather similar posterior
inferences. But in the case of the Poisson model, the parameter estimate uncertainty grows quickly
with text length, unlike the Heaps’ model (fig. 3).</p>
      <p>An important limitation of vocabulary growth models based on the Heaps’ formula in
comparison to linear models is that various predictors cannot be so easily incorporated into
the model. While the coefficients k and β provide two points at which to stratify by predictor variables,
their effects on the outcome are not symmetric. Moreover, adding more than two predictors
(for instance, as factors of the same coefficient) runs into an identification problem in the context of Bayesian
inference. Reparameterizing the model or switching to another basic vocabulary growth model
may be required to tackle this problem.</p>
      <p>
        The genre-topic model that captures preferences for specific animal lemmas conceptually
describes the animal profile of a genre as a deviation from the corpus-wide distribution of
animal frequencies. This is structurally analogous to a popular idea in stylometry that is
behind the Burrows’ Delta [2] and was shown to work for genre classification as well [
        <xref ref-type="bibr" rid="ref4">14, 15</xref>
        ].
The advantage of the Bayesian genre-topic model in comparison to simpler measures of lexical
distinctiveness is that it provides estimates of the uncertainty of the parameters. Highly
uncertain estimates for the proportion of the genre-related topic in a document indicate that with
this particular model and data it is not possible to tell to what extent the animal vocabulary of
a given author is defined by the literary tradition or by external factors. Nevertheless, the
model was able to detect relatively minor genre-related modulations in the frequency of certain
animal species in the presence of a strong signal of a general linguistic or literary background.
      </p>
    </sec>
    <sec id="sec-6">
      <title>A. The definition of vocabulary growth models</title>
      <p>The central idea is to define the expected size of the animal vocabulary in a given text with
the help of the Heaps’ formula V = k n^β. To adapt to the fact that vocabulary size is a
natural number, the expected value predicted by the Heaps’ model can be treated as a parameter
(expected value) for a Poisson distribution. To test the hypothesis that there is an effect of
literary tradition on accessibility of the animal vocabulary associated with the sub-genre of
children’s literature, one needs to stratify animal vocabulary growth rates by genre. Alternatively,
inter-genre variance in animal vocabularies may be explained away by external factors
(individual author characteristics). Heaps’ formula offers two options to account for genre/author
variation: either coefficient k or β could vary by genre or by author.</p>
      <p>To select the final model I tested all logically possible combinations of the k and β coefficients
associated with either genre or author. The best performing model was selected by evaluating
the model’s predictive ability for the animal vocabulary data in children’s literature with the
help of the WAIC criterion. The model comparison summary is presented in table 1. For the
animal vocabulary data, coefficient β turns out to be more effective in capturing data variance
in comparison to k. It works this way both for author-based variance and genre-based variance.
Including author as a factor in a formula always results in a much better fit. Whenever author
is present, adding genre results in a relatively small (but non-null) model improvement. This
improvement is more pronounced if genre is taken into account via the more effective coefficient.
The formal definition of the selected model follows below in (3); the rest of the models had
similar structure and priors. All models employed partial pooling on author/genre coefficients.
V ∼ Poisson(λ)
λ = k_a n^{β_g}
(3)
where V is the animal vocabulary size, λ stands for the expected Poisson rate for a text, and n
is the length of a text in thousands of tokens. External factors that influence accessibility of
the animal category are captured by k_a, which reflects the interest in animals of each individual author.
Internal (literary) factors are captured by the exponent β_g, which varies by genre. The
distributions of both author and genre coefficients are defined by higher-order priors.</p>
      <p>The mixture model represents vocabulary size as the result of either a low-intensity
background process or a high-intensity foreground process, mixed in a genre-specific proportion
π_g. The expected vocabulary size for both processes is defined by the same Heaps’
formula. The formal definition of the model is given in (4).</p>
      <p>V ∼ π_g Poisson(λ_1) + (1 − π_g) Poisson(λ_2)
λ_1 = k_1 n^β
λ_2 = k_2 n^β
k ∼ Log-Normal(1, 0.7)
π ∼ Beta(5, 5)
logit(π_g) = …
(4)
where λ_1 and λ_2 stand for the expected animal vocabulary size for the background and
the foreground processes, respectively. Similarly, k_1 and k_2 denote accessibility coefficients for
the two processes.</p>
    </sec>
    <sec id="sec-7">
      <title>B. Poisson prevalence model</title>
      <p>To provide a comparison with the suggested vocabulary growth model, a more traditional
Poisson generalized linear model for lexical prevalence was defined and applied to the same data.
The model is designed to maximally fit the structure of the Heaps-based vocabulary growth
model. Partial pooling of the author and genre coefficients is employed as well. To optimize
inference, a non-centered model parameterization was used. The logarithm of text length is taken
into account as an exposure parameter. The formal model definition is as follows:
log(λ) = ᾱ + k_a + β_g + log n
where ᾱ is the global intercept, and k_a and β_g are the author and genre effects.</p>
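The log-link structure with text length as an exposure term can be sketched directly. The effect values below are illustrative assumptions, not estimates from the corpus.

```python
import math

# Sketch of a Poisson GLM rate with a log link and an exposure term:
# author and genre effects are additive on the log scale, and log(n)
# enters with a fixed coefficient of 1 (the exposure convention).
def poisson_rate(author_effect: float, genre_effect: float,
                 n_thousand_tokens: float, intercept: float = 0.0) -> float:
    """Expected count lambda = exp(intercept + author + genre) * n."""
    return math.exp(intercept + author_effect + genre_effect
                    + math.log(n_thousand_tokens))
```

The exposure convention makes the expected count exactly proportional to text length (doubling n doubles the rate), which is precisely the assumption the Heaps-based model relaxes via the exponent β.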
    </sec>
    <sec id="sec-8">
      <title>C. Corpus details</title>
      <p>Texts in the Detcorpus data come with a list of genre tags assigned based on bibliographic
and contextual data. For the present analysis, lists of genre tags were run through a
simplification procedure to arrive at a single label for each text. In cases where there are several genre tags for a
text, only one of them is retained. Some secondary sub-genre tags are omitted as a result, for
instance, “school novel”. If a list of genres contains the fairy tale tag, the text was always regarded
as a fairy tale. Otherwise, the first tag in the list is regarded as primary and
retained. Several genres with very sparse representation in the corpus were omitted from the
analysis (adventure, biography).</p>
      <p>The data contain texts written by 917 authors, with the majority of them (89%) represented
in a single genre only. See table 2 for details on the author and genre distribution.</p>
      <p>As a result, one can see that the composition of the dataset in terms of genres is not in any
way a balanced sample. Genre preferences changed with time, and the corpus sample is
also somewhat imbalanced diachronically, with some decades represented better than others.
The distribution of genres by decade is shown in fig. 4. Some genres are represented better
than others, with realism (a “default” genre assigned to texts without a specific genre affiliation)
spanning 53% of the works included in the corpus.

genre         n      %
realism       1588   53.0
skazka        450    15.0
detective     349    11.7
scifi         205    6.8
animalistic   164    5.5
love                 3.8
horror               2.8
fantasy              1.4

ngenres       n      %
1             813    88.7
2             85     9.3
3             15     1.6
4             2      0.2
5             1      0.1
6             1      0.1</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. I.</given-names>
            <surname>Jordan</surname>
          </string-name>
          .
          “Latent Dirichlet Allocation”.
          <source>In: Journal of Machine Learning Research 3</source>
          .
          Jan
          (
          <year>2003</year>
          ), pp.
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Burrows</surname>
          </string-name>
          .
          <article-title>“'Delta': a Measure of Stylistic Difference and a Guide to Likely Authorship”</article-title>
          .
          <source>In: Literary and Linguistic Computing 17.3</source>
          (
          <year>2002</year>
          ), pp.
          <fpage>267</fpage>
          -
          <lpage>287</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Calvo Tello</surname>
          </string-name>
          .
          <article-title>The Novel in the Spanish Silver Age</article-title>
          . Wetzlar: Bielefeld University Press,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
<mixed-citation>
          [4]
          <string-name><given-names>T.</given-names> <surname>Cosslett</surname></string-name>.
          <source>Talking Animals in British Children's Fiction, 1786-1914</source>.
          New York: Routledge, <year>2017</year>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
<mixed-citation>
          [5]
          <string-name><given-names>R.</given-names> <surname>Heuser</surname></string-name> and
          <string-name><given-names>L.</given-names> <surname>Le-Khac</surname></string-name>.
          <article-title>A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method</article-title>.
          <source>Pamphlets of the Stanford Literary Lab 4</source>.
          <year>2012</year>. url: http://litlab.stanford.edu/LiteraryLabPamphlet4.pdf
        </mixed-citation>
      </ref>
      <ref id="ref6">
<mixed-citation>
          [6]
          <string-name><given-names>L.</given-names> <surname>Langer</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Burghardt</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Borgards</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Böhning-Gaese</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Seppelt</surname></string-name>, and
          <string-name><given-names>C.</given-names> <surname>Wirth</surname></string-name>.
          “<article-title>The Rise and Fall of Biodiversity in Literature: A Comprehensive Quantification of Historical Changes in the Use of Vernacular Labels for Biological Taxa in Western Creative Literature</article-title>”.
          In: <source>People and Nature</source> <volume>3</volume>.5
          (<year>2021</year>), pp. <fpage>1093</fpage>-<lpage>1109</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
<mixed-citation>
          [7]
          <string-name><given-names>D. C.</given-names> <surname>van Leijenhorst</surname></string-name> and
          <string-name><given-names>T. P.</given-names> <surname>Van der Weide</surname></string-name>.
          “<article-title>A Formal Derivation of Heaps' Law</article-title>”.
          In: <source>Information Sciences 170.2-4</source>
          (<year>2005</year>), pp. <fpage>263</fpage>-<lpage>272</lpage>.
          doi: 10.1016/j.ins.2004.03.006.
        </mixed-citation>
      </ref>
      <ref id="ref8">
<mixed-citation>
          [8]
          <string-name><given-names>K.</given-names> <surname>Maslinsky</surname></string-name>.
          <source>Replication Data for: How Exactly does Literary Content Depend on Genre? A Case Study of Animals in Children's Literature. Repository of Open Data on Russian Literature and Folklore</source>.
          Version V1. <year>2023</year>. doi: 10.31860/openlit-2023.10-R005.
        </mixed-citation>
      </ref>
      <ref id="ref9">
<mixed-citation>
          [9]
          <string-name><given-names>K.</given-names> <surname>Maslinsky</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Lekarevich</surname></string-name>, and
          <string-name><given-names>L.</given-names> <surname>Aleinik</surname></string-name>.
          <source>Corpus of Russian Prose for Children and Young Adults. Repository of Open Data on Russian Literature and Folklore</source>.
          Version V2. <year>2021</year>. doi: 10.31860/openlit-2021.4-C001.
        </mixed-citation>
      </ref>
      <ref id="ref10">
<mixed-citation>
          [10]
          <string-name><given-names>A.</given-names> <surname>Piper</surname></string-name>.
          “<article-title>Biodiversity is not Declining in Fiction</article-title>”.
          In: <source>Journal of Cultural Analytics 7.3</source>
          (<year>2022</year>). doi: 10.22148/001c.38739.
        </mixed-citation>
      </ref>
      <ref id="ref11">
<mixed-citation>
          [11]
          <string-name><given-names>A.</given-names> <surname>Piper</surname></string-name>.
          “<article-title>Fictionality</article-title>”.
          In: <source>Journal of Cultural Analytics 2.2</source>
          (<year>2016</year>). doi: 10.22148/16.011.
        </mixed-citation>
      </ref>
<ref id="ref12">
        <mixed-citation>
          [12]
          <string-name><given-names>A.</given-names> <surname>Piper</surname></string-name> and
          <string-name><given-names>S.</given-names> <surname>Bagga</surname></string-name>.
          “<article-title>A Quantitative Study of Fictional Things</article-title>”.
          In: <source>Proceedings of the Computational Humanities Research Conference. Antwerp, Belgium</source>,
          <year>2022</year>, pp. <fpage>268</fpage>-<lpage>279</lpage>.
          url: https://ceur-ws.org/Vol-3290/long_paper1576.pdf
        </mixed-citation>
      </ref>
      <ref id="ref12b">
        <mixed-citation>
          [13]
          <string-name><given-names>H.</given-names> <surname>Ritvo</surname></string-name>.
          “<article-title>Learning from Animals: Natural History for Children in the Eighteenth and Nineteenth Centuries</article-title>”.
          In: <source>Children's Literature 13.1</source>
          (<year>1985</year>), pp. <fpage>72</fpage>-<lpage>93</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref13">
<mixed-citation>
          [14]
          <string-name><given-names>A.</given-names> <surname>Sharma</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Hu</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Wu</surname></string-name>,
          <string-name><given-names>W.</given-names> <surname>Shang</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Singhal</surname></string-name>, and
          <string-name><given-names>T.</given-names> <surname>Underwood</surname></string-name>.
          “<article-title>The rise and fall of genre differentiation in English-language fiction</article-title>”.
          In: <source>DH2020 (ADHO) Proceedings. Amsterdam</source>,
          <year>2020</year>, pp. <fpage>97</fpage>-<lpage>114</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref14">
<mixed-citation>
          <string-name><given-names>T.</given-names> <surname>Underwood</surname></string-name>.
          <source>Distant Horizons: Digital Evidence and Literary Change</source>.
          Chicago: University of Chicago Press, <year>2019</year>.
          Chap. 2, pp. <fpage>34</fpage>-<lpage>67</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref15">
<mixed-citation>
          [18]
          <string-name><given-names>T.</given-names> <surname>Underwood</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Bamman</surname></string-name>, and
          <string-name><given-names>S.</given-names> <surname>Lee</surname></string-name>.
          “<article-title>The Transformation of Gender in English-Language Fiction</article-title>”.
          In: <source>Journal of Cultural Analytics 2.2</source>
          (<year>2018</year>). doi: 10.22148/16.019.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>