<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Paris, France.
Corresponding author.
Email: jean.barre@ens.psl.eu (J. Barré); thierry.poibeau@ens.psl.eu (T. Poibeau).
Web: https://crazyjeannot.github.io/ (J. Barré); https://www.lattice.cnrs.fr/en/members/direction/thierry-poibeau/ (T. Poibeau).</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Beyond Canonicity: Modeling Canon/Archive Literary Change in French Fiction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jean Barré</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thierry Poibeau</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>École normale supérieure - Université PSL</institution>
          ,
          <addr-line>45 rue d'Ulm, Paris, 75005</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Lattice (Langues, Textes, Traitements informatiques, Cognition)</institution>
          ,
          <addr-line>1 rue Maurice Arnoux, Montrouge, 92049</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>This study offers a fresh perspective on the Canon/Archive problem in literature through computational analysis. Following Tynianov's understanding of literature, we adopt a dynamic approach to literature by proposing a model of literary variability using the Kullback-Leibler divergence. We retrieve key authors and works that shape the broad outlines of literary change. Our aim is to evaluate the importance of canonical authors on literary variability. We opt for a cohort-driven setup to analyze the variability contributed by a given text, focusing on specific formal and semantic aspects of texts such as topics, lexicon, characterization, and chronotope. The findings reveal that canonical authors tend to contribute slightly more to literary change than those from the archive.</p>
      </abstract>
      <kwd-group>
        <kwd>literary history</kwd>
        <kwd>computational literary studies</kwd>
        <kwd>distant reading</kwd>
        <kwd>literary variability</kwd>
        <kwd>canon/archive</kwd>
        <kwd>cohort-driven model</kwd>
        <kwd>cultural analytics</kwd>
        <kwd>natural language processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The Canon/Archive problem is a well-known issue in the field of Computational Literary
Studies (CLS). It has been and continues to be a fundamental aspect of the field, as
computational methods allow researchers to expand their investigations beyond the limited study of
the Canon and its restricted number of texts. With the ability to process vast amounts of
digitized texts in a matter of hours, researchers can now engage in distant reading, as proposed by
Moretti [
        <xref ref-type="bibr" rid="ref24">23</xref>
        ], and conduct experiments on the textual content of literary works. This approach
enables scholars to zoom in and out from the literary past, leading to a better understanding
of general trends describing literary evolution.
      </p>
      <p>This introduction of new perspectives and alternative modes of inquiry raises a fundamental
question: “Do we understand the outlines of literary history?”. Underwood [31] eloquently
poses this question, contemplating whether the texts preserved thus far adequately represent
the entire spectrum of literary production, or if the discipline of literary history has been constrained
by narrow perspectives throughout its existence.</p>
      <p>
        This line of investigation is not entirely new, as Iouri Tynianov expressed similar concerns
in 1927 when he stated that “The theory of value in literary scholarship fueled the temptation
to study major (but also isolated) phenomena and has turned literary history into a ‘history of
generals’” [
        <xref ref-type="bibr" rid="ref32">30</xref>
        ]. However, Tynianov offered a way out by suggesting that the value of a given
literary phenomenon should be understood in terms of its “significance and evolutionary
qualities”. According to Tynianov [30], literary recognition is a dynamic process, and analyzing it
requires studying “literary variability”. This refers to the diversity and range of formal elements
present in literary works. It encompasses the different ways authors employ language, style,
themes, narrative structures, characterization, settings, and other literary elements to create
unique and distinct pieces of literature. This perspective sees literary history not as a linear
chronology but considers literature within a dynamic and indivisible process that is constantly
evolving. Every written text available in a library has the potential to influence the process
of writing. Accounting for this perpetual movement necessarily requires an understanding of
how each new text seeks to formally distinguish itself from its predecessors while still being
shaped, for example, by a specific, conscious or unconscious, generic intertextuality.
      </p>
      <p>
        This study aims at evaluating to what extent canonical works are reliable witnesses in
terms of literary variability. Previous research uncovered disparities in the textual content
between what is considered canonical and non-canonical across various corpora and cultural
backgrounds ([
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref35">33</xref>
        ], [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]). These previous studies showed that canonical sets share to
some extent (at least for specific timespans) an intrinsic norm. As outlined by Barré, Camps,
and Poibeau [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], this research identifies what Altieri [
        <xref ref-type="bibr" rid="ref2 ref36">2</xref>
        ] terms a “cultural grammar”,
suggesting that canonical literary works function as foundational texts shaping the norms, values, and
conventions within a specific cultural tradition.
      </p>
      <p>
        Hence, a pertinent question emerges: Does this specific norm comprehensively account for
literary variability, or is it missing something? On the one hand, canonical novels, renowned
for their significance and influence in literary traditions, can indeed act as pivotal benchmarks
for both writers and readers [28]. Their impact on literary practices can manifest in various
ways: inspiring new writing styles, introducing innovative themes, or encouraging formal
experiments. Writers can be influenced by these canonical novels, either seeking to differentiate
themselves or to align with their influence [25]. On the other hand, the concept of
canonicity can be seen as biased and limited in capturing the evolution of formal practices. Indeed,
the canon acts as a framework shaping not only present literary creation but also
how we retrospectively perceive the past, aligning it with contemporary norms. Therefore, the
complex nature of the canonization process, influenced by external factors such as the school
system and editorial policies [13], may hinder its ability to incorporate avant-garde literary
changes.
      </p>
      <p>In this paper, we introduce an operational model aiming to define and measure formal
variability in 19th and 20th century French novels. Our approach is firmly rooted in established
canonical sets derived from prior research on contemporary reception in France ([9]). Our main
objective is to explore whether the canonized works selected from contemporary reception
accurately mirror the broader spectrum of change within the overall French novelistic production.
For this purpose, we try to identify the key works and key authors driving literary variability.
By analyzing formal aspects and considering literature as a dynamic system, we seek to gain
insights into the flow of literary variability.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Materials and Methods</title>
      <sec id="sec-2-1">
        <title>2.1. Corpus</title>
        <p>This study is based on the corpus collected in the framework of the “ANR Chapitres” project,<sup>1</sup> a corpus
of nearly 3000 French novels [3]. The goal of this project was to evaluate the pace of change
in the length of chapters over two centuries. The corpus is structured in XML-TEI<sup>2</sup> (Text
Encoding Initiative) encoding, to add metadata to the texts. The period concerned extends
over two centuries of novel production, from the 19th to the 20th century, as can be seen in
Figure 1.</p>
        <p>Each text in the corpus is enriched with metadata, including subgenre tags and authors’
dates (birth and death). The latter are highly relevant for our work as we focus on the effect of
cohorts on the pace of literary change.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Textual features</title>
        <p>The concept of literary variability encompasses a broad spectrum of possibilities, and it can
manifest itself in various ways within a text. By examining specific elements such as themes,
characterization, vocabulary, and chronotope, we aim to understand how novels have evolved
across different time periods. However, some notions (such as ‘the plot’) are hard to formalize
and are thus not included in this study.
<fn><label>1</label><p>https://chapitres.hypotheses.org/</p></fn>
<fn><label>2</label><p>TEI Consortium, eds. TEI P5: Guidelines for Electronic Text Encoding and Interchange. Version 1.0. TEI Consortium. http://www.tei-c.org/Guidelines/P5/.</p></fn></p>
        <p>
          We first implemented topic modeling methods to extract topics from the texts. The Python
library BERTopic [16] was used in a guided setting. This refers to a set of techniques that
influence the topic modeling process by providing predefined seed topics for the model to converge
towards. These techniques enable users to specify a predetermined number of topic
representations that are guaranteed to appear in the results. We constructed a list of 50 topics we found
relevant for our study and retrieved their proportion within each novel.<sup>3</sup>
        </p>
        <p>We also implemented a bag-of-ngrams approach to retrieve the lexical dimension of
literary change. To do so, we rely on the 1000 most frequent lemmas and the 1000 most frequent
bigrams of lemmas. This echoes the paper by Cranenburgh and Koolen [15], which showed
that using only unigrams and bigrams was sufficient to classify literary texts in terms of their
literary quality. We did not remove stopwords, since they may reflect an unconscious and
automatic structural way of writing [27], unlike less frequent words, which relate to the content
and themes of the text. Our hypothesis is that the structural way of writing novels changes over
time and cohorts. Bag-of-words techniques work well for various experiments in the CLS field
(stylometry and author attribution, for example [18]), but they are quite controversial from a
literary point of view, since they exclude a great deal of information, including word order and
syntax. They are also limited in that they do not take into account the semantic drift of words
over time. For instance, the word “wild” does not carry the same meaning when used in an
adventure novel from the late 19th century as in a climate fiction from the late 20th century.
Bearing this in mind, we assumed that bag-of-features representations still capture some dimensions of literary
change, particularly as regards the very frequent structural elements.</p>
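        <p>As a rough illustration of the bag-of-ngrams representation described above, the following Python sketch builds relative-frequency vectors over the most frequent lemmas and lemma bigrams. It assumes that lemmatization has already been carried out; the function names (ngrams, top_ngrams, ngram_vector), the toy lemma lists, and the tiny vocabulary size are illustrative assumptions and not the implementation used in this study.</p>
        <preformat>
```python
from collections import Counter


def ngrams(lemmas, n):
    """Return the list of n-grams (as tuples) of a lemma sequence."""
    return [tuple(lemmas[i:i + n]) for i in range(len(lemmas) - n + 1)]


def top_ngrams(corpus, n, k):
    """Vocabulary of the k most frequent n-grams over the whole corpus."""
    counts = Counter()
    for lemmas in corpus:
        counts.update(ngrams(lemmas, n))
    return [gram for gram, _ in counts.most_common(k)]


def ngram_vector(lemmas, vocabulary, n):
    """Relative frequency of each vocabulary n-gram in one text."""
    grams = ngrams(lemmas, n)
    counts = Counter(grams)
    total = max(len(grams), 1)  # avoid division by zero on very short texts
    return [counts[gram] / total for gram in vocabulary]


# Toy corpus of already-lemmatized texts (hypothetical data, stopwords kept).
corpus = [
    ["le", "chat", "dormir", "sur", "le", "lit"],
    ["le", "chien", "dormir", "sur", "le", "tapis"],
]

# The paper uses k = 1000 for lemmas and for bigrams; k is tiny here for readability.
unigram_vocab = top_ngrams(corpus, n=1, k=3)
bigram_vocab = top_ngrams(corpus, n=2, k=3)

# One feature vector per text: unigram block followed by bigram block.
features = [
    ngram_vector(text, unigram_vocab, 1) + ngram_vector(text, bigram_vocab, 2)
    for text in corpus
]
```
        </preformat>
        <p>Keeping stopwords in the lemma lists, as in this toy corpus, is what lets the most frequent items reflect structural rather than thematic vocabulary.</p>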
        <p>
          One of the aims of this study was to capture chronotope information, a term coined
by the Russian literary scholar Bakhtin [4]. In substance, the concept of chronotope explores
how the relationship between time and space influences the portrayal of characters, the
development of plotlines, and the themes conveyed within a literary text. In the Natural Language
Processing (NLP) context, Kohlmeyer, Repke, and Krestel [19] demonstrated the limitations of
traditional document embeddings (optimized for shorter texts) in capturing complex facets in
novels (such as time, place, atmosphere, style, and plot). To address this problem, they propose
to use multiple embeddings reflecting different facets, splitting the text semantically rather
than sequentially. Inspired by these findings, we adapted their methodology. By using an NLP
pipeline specifically tuned for novels (fr-BookNLP, part of the multilingual BookNLP project
[
          <xref ref-type="bibr" rid="ref5 ref7">7, 5</xref>
          ]), we extracted literary entities representing the chronotope, specifically focusing on FAC,
TIME, LOC, and VEH.<sup>4</sup> The presence of chronotope elements in a novel is highly influenced
by its subgenre categorization. We believe that this type of information is crucial for our task
as it has the potential to capture significant aspects of literary variability. To obtain vector
representations of the chronotope elements in novels, we trained a Paragraph Vectors model
[20] (Doc2Vec) using a subset of our novel dataset. We then generated four vector embeddings
from our four spans of entities. Each facet has a vector with 300 dimensions, resulting in a
1200-dimensional vector that captures the chronotope information for each novel.
        </p>
        <p>We also considered that characterization was a significant element in our task, as we
believed that changes in literature could influence how characters are portrayed to readers. We
thus focused on identifying key verbs that drive the actions of the main characters and the
adjectives used to describe them. In line with Woloch’s [34] concept of the character space
as “the encounter between an individual human personality and a determined space and
position within the narrative as a whole”, we used coreference resolution techniques, specifically
those offered by fr-BookNLP,<sup>5</sup> to automatically detect and analyze the distribution of
character mentions throughout the narrative [8]. We used the spaCy parser to extract the verbs and
adjectives associated with each character mention. By analyzing the syntactic structure of the
text, we identified the verbs that represented the actions of the characters and the adjectives
that characterized them. For each novel, we selected the top five main characters and generated
two vector embeddings of 300 dimensions: one representing the adjectives associated with the
characters and the other representing the verbs. These embeddings capture the semantic
information related to the characters’ traits and actions, providing a compact representation of
their characteristics within the narrative.
<fn><label>3</label><p>For the topic modeling process, see appendix A.1.</p></fn>
<fn><label>4</label><p>Respectively Facilities, Time, Location, Vehicle; see [6] for more information on the NER labels, and appendix A.2 for the evaluation of Fr-BookNLP.</p></fn></p>
        <p>By incorporating these various aspects, each novel can be represented as a concatenated
multidimensional vector with 3850 dimensions, 50 for the topics, 2000 for the bag-of-ngrams,
1200 for chronotope elements and 600 for the characterization. Therefore, our vector
representation provides a comprehensive formalization of the novel, enabling further analysis and
comparisons.</p>
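        <p>The assembly of the four feature blocks can be sketched as follows. This is a minimal illustration under stated assumptions: the block names, the BLOCK_SIZES mapping, the concatenate_blocks helper, and the constant placeholder values are hypothetical; only the block sizes (50, 2000, 1200, and 600) come from the text above.</p>
        <preformat>
```python
# Dimensions reported in the paper for each feature block.
BLOCK_SIZES = {
    "topics": 50,
    "ngrams": 2000,
    "chronotope": 1200,
    "characterization": 600,
}


def concatenate_blocks(blocks):
    """Concatenate per-novel feature blocks in a fixed order, checking sizes."""
    vector = []
    for name, size in BLOCK_SIZES.items():
        block = blocks[name]
        if len(block) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(block)}")
        vector.extend(block)
    return vector


# Hypothetical novel with constant placeholder values in each block.
novel_blocks = {
    "topics": [0.02] * 50,
    "ngrams": [0.001] * 2000,
    "chronotope": [0.1] * 1200,
    "characterization": [0.3] * 600,
}
novel_vector = concatenate_blocks(novel_blocks)
total_dimensions = sum(BLOCK_SIZES.values())
```
        </preformat>
        <p>Fixing the block order once, as done here through a single mapping, keeps the same dimension aligned across novels, which any dimension-wise comparison between texts requires.</p>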
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Measuring Literary Variability</title>
        <p>
          Basing our work on the aspects presented above (topics, bag-of-ngrams, chronotope, and
characterization), we had to find a way to grasp literary variability. We decided to implement a
commonly used measure, the Kullback-Leibler divergence (KLD). It is a statistical
measure that quantifies the dissimilarity between two probability distributions:
the target distribution P and a reference distribution Q. Within this framework, we assessed
the variability of a text by measuring the surprise or deviation of that text from a set of other
texts. Specifically, the KLD from Q to P is defined as follows:
<disp-formula><tex-math>D_{\mathrm{KL}}(P \parallel Q) = \sum_{i} P(i) \log \left( \frac{P(i)}{Q(i)} \right)</tex-math></disp-formula>
where the sum runs over the features i, P represents our formal features for a text, normalized as a probability distribution, and
Q stands for the average of all the texts we wish to compare P with. This measure, derived from
information theory, finds application in various fields, including assessing sample diversity in
ecology and examining elements of linguistic evolution [12]. Barron, Huang, Spang, and DeDeo
[
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] applied KLD to a corpus of debates in the French Revolution’s first parliament, assessing
both the novelty of a particular speech compared to prior speeches and its transience compared
to future ones.
        </p>
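        <p>To make the measure concrete, the following Python sketch computes the KLD of a text against the average of a reference set. It is only an illustration under stated assumptions: the helper names, the toy 4-dimensional vectors, and in particular the additive smoothing used to avoid zero probabilities are choices made for this sketch, since the normalization procedure is not detailed here.</p>
        <preformat>
```python
import math


def to_distribution(features, epsilon=1e-12):
    """Turn a non-negative feature vector into a probability distribution.

    A small epsilon is added to every entry so that no probability is zero;
    this smoothing is an assumption of the sketch, not taken from the paper.
    """
    if 0 > min(features):
        raise ValueError("features must be non-negative")
    smoothed = [value + epsilon for value in features]
    total = sum(smoothed)
    return [value / total for value in smoothed]


def mean_distribution(distributions):
    """Entry-wise average of several distributions (the reference Q)."""
    count = len(distributions)
    return [sum(column) / count for column in zip(*distributions)]


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i)), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical feature vectors for a reference cohort and two texts.
cohort = [to_distribution(v) for v in ([4, 3, 2, 1], [5, 3, 1, 1], [4, 4, 1, 1])]
q = mean_distribution(cohort)

typical_text = to_distribution([4, 3, 2, 1])
atypical_text = to_distribution([1, 1, 3, 5])

typical_score = kl_divergence(typical_text, q)
atypical_score = kl_divergence(atypical_text, q)
```
        </preformat>
        <p>In this toy setting, the text whose profile departs from the cohort average receives the higher score, which is the sense in which KLD is read as surprise. Note that the measure is asymmetric: swapping P and Q generally gives a different value.</p>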
        <p>In the realm of literature, Algee-Hewitt, Allison, Gemma, Heuser, Walser, and Moretti [1]
proposed this method to determine the informational content of texts by evaluating the
predictability of word-to-word transitions, taking into account the range of possible transitions.
Liddle [21] also discussed the possibility that mathematical information theory may be relevant
to literary analysis by showing statistically significant correlations between national histories
of the novel and information-theoretical pressures.
<fn><label>5</label><p>A discussion on the evaluation of fr-BookNLP and its limitations is provided in appendix A.2.</p></fn></p>
        <p>
          Previous research showed that literary change throughout an author’s life was powerful
enough to predict the publication date of a given text [29]. However, other studies
demonstrated that literary change brought about by an author throughout their life remains limited
compared to the cohort effect [
          <xref ref-type="bibr" rid="ref34">32</xref>
          ]. In other words, literary change appears to be driven by
cohort renewal, which is indeed relevant since events that shape an author’s life, and that
are likely to have an impact on their writing style, also influence all authors within the same
generation.
        </p>
        <p>Measuring the variability of a text in relation to a set of other texts immediately places us in a
dual configuration: we can measure the variability of a given text with the works that precede
it and with those that follow it. Studying the circulation, selection, and propagation of literary
patterns in a group of texts, we can understand the dynamics of literary change and the extent
to which an author’s language patterns influence and are adopted by others. From a literary
perspective, measuring change between two successive texts may not make much sense, as
numerous factors can come into play, such as affiliation with a particular subgenre, a specific
period, a literary school, or even the author themselves. Therefore, a specific framework is
necessary to conduct our experiments, which revolves around the notion of generation, assuming
that texts produced within the same generation share certain characteristics and influences.</p>
        <p>
          Béhar defines a generation as a concept that aims to “understand the succession of aesthetic
productions based on a community of upbringing, interests, and ideas specific to the same
age group of writers, following a periodicity of approximately 30 years linked to historical
and political cycles” [11]. This generation-based approach allows us to examine the changes
and innovations introduced by authors within their respective cohorts, while also considering
the broader historical and literary context in which these works emerge. It provides a more
nuanced understanding of how literary variability is shaped and influenced by various factors,
contributing to a richer analysis of the dynamic nature of literature.
        </p>
        <p>To further support Béhar’s argument, Moretti [24] also evaluated the regularity of the
replacement of the literary subgenres he examines. He suggested that “a sort of generational
mechanism seems to be the best way to account for the regularity of the cycle of novelistic
production”. Moretti’s analysis focused on the cycle of change, considering a timeframe of 25
to 30 years. These studies are complemented by the seminal research of Underwood, Kiley, Shang, and Vaisey [32],
which showed the significance of cohorts for literary change. Their findings
indicated that cohorts have such a substantial impact that they account for more than half of
the amount of change in literature.</p>
        <p>Building upon their conclusions, we computed KLD comparing successive cohorts in a
timespan of 30 years. For instance, if analyzing a novel published in 1970 by an author born in 1930,
we compare it with all the books written by authors born between 1870 and 1900 to evaluate
the extent of change. This approach enables us to consider cohorts as a rolling phenomenon,
since defining arbitrary cohorts would not be representative of the phenomenon of cohort
succession. This methodology allows us to view literature as a continually evolving synchronic
system, framed by cohorts.</p>
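        <p>The rolling selection of a reference cohort can be sketched as follows. The window bounds are an assumption inferred from the example above (an author born in 1930 is compared with authors born between 1870 and 1900, bounds taken as inclusive); the Novel class, the function names, and the corpus entries are hypothetical placeholders.</p>
        <preformat>
```python
from dataclasses import dataclass

COHORT_SPAN = 30  # years, as chosen in the paper


@dataclass(frozen=True)
class Novel:
    title: str
    author_birth_year: int
    publication_year: int


def preceding_cohort_bounds(author_birth_year, span=COHORT_SPAN):
    """Birth-year bounds of the cohort preceding an author's own cohort.

    Assumption inferred from the paper's example: an author born in 1930
    is compared with authors born between 1870 and 1900 (bounds inclusive).
    """
    return author_birth_year - 2 * span, author_birth_year - span


def preceding_cohort(target, corpus, span=COHORT_SPAN):
    """All novels whose author was born inside the preceding-cohort window."""
    start, end = preceding_cohort_bounds(target.author_birth_year, span)
    return [
        novel for novel in corpus
        if novel.author_birth_year in range(start, end + 1)
    ]


# Hypothetical corpus entries (titles and dates are placeholders).
corpus = [
    Novel("novel A", 1865, 1900),
    Novel("novel B", 1871, 1905),
    Novel("novel C", 1899, 1930),
    Novel("novel D", 1905, 1935),
]
target = Novel("novel E", 1930, 1970)
reference_set = preceding_cohort(target, corpus)
```
        </preformat>
        <p>Because the window is recomputed from each author’s birth year, no fixed cohort boundaries need to be chosen in advance, which is what makes the cohorts rolling rather than arbitrary.</p>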
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <sec id="sec-3-1">
        <title>3.1. Literary dynamics: between novelty and influence</title>
        <p>Figure 2 represents the amount of variability for each text. The x-axis represents the relative entropy (KLD)
of each text with respect to the cohort preceding the text’s author, indicating the level of surprise
or formal novelty in the text compared to its preceding cohort. A text with a high “surprise”
score on this axis would be considered formally innovative. The y-axis represents the relative entropy
of each text with respect to the cohort following the text’s author, indicating how surprising
the text remains for the next cohort. A text with high surprise on this axis
has had little influence on what follows. A text that is both highly influential and innovative
would receive a high value on the x-axis and a low value on the y-axis.</p>
        <p>As the graph is complex, three novels are depicted in order to make it easier to understand
how the graph works. Gustave Flaubert’s Madame Bovary stands out with high novelty and
high influence scores, thanks to its groundbreaking narrative style, character development,
and enduring impact on literature. Jean Giono’s Regain receives a high novelty score for its
exploration of resilience and human connection to nature, but its low influence suggests that it
did not gain immediate widespread recognition. Émile Souvestre’s Un Philosophe sous les Toits
addresses social issues and garnered significant influence despite not being part of the French
literary canon, making it historically significant. Canonical texts are highlighted in orange,
following the canonical sets at the author scale from previous work [9]. It is notable that most
of these texts are positioned below the x=y line. This suggests that canonical works exhibit
greater variability compared to the preceding cohort but slightly less variability compared to
the following one. This may indicate that canonical works have a higher level of innovation
and influence.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Signal of literary variability</title>
        <p>These initial findings were highly intriguing, leading us to pursue a complementary approach
to gain a deeper understanding. We focused on the x-axis, which we deemed more
comprehensible from a literary standpoint. The change introduced by a text (or an author) in relation
to the previous generation is intuitively grasped as each text finds a way to differentiate itself
from the broader literary production of a given period. By associating the entropy value
obtained for each text with its author’s birth date, we were able to represent in Figure 3 the signal
of change over time.</p>
        <p>Through visual representation, we showcased the patterns and fluctuations observed in the
analysis of KL divergence or entropy. This approach allowed us to capture the dynamic nature
of literary change and observe how it manifests itself over different historical periods. The
resulting graph provides a visual narrative of the evolving literary landscape, shedding light
on the formal shifts in the realm of literature. It offers a compelling visual representation of the
signal of literary change, enabling a more nuanced understanding of the complex processes at
play in cultural and artistic evolution.</p>
        <p>Each peak corresponds to a significant variability introduced by a specific author (or a group
of authors sharing the same birth date). This approach allows us to identify the names of key
works and key authors that drive literary variability. It should be mentioned that the last peak
should be ignored due to the lack of authors born around 1910 and later in our corpus.</p>
        <p>Jules Verne, born in 1828, is the author who mainly explains the second peak in variability.
His works, particularly Vingt-mille lieues sous les mers, published in 1870 (with 0.149 KLD), and
L’Île Mystérieuse, published in 1875 (with 0.232 KLD), exemplify his innovative approach to
literature. In the former, Verne introduced readers to Captain Nemo’s underwater vessel, the
Nautilus, which travels beneath the seas and explores uncharted depths. This visionary
depiction of a futuristic submarine, powered by electricity and equipped with advanced technology,
set the stage for the emergence of the science fiction genre. In L’Île Mystérieuse, Verne
combined elements of adventure and survival on a remote island with the exploration of technology
and engineering. The novel tells the story of a group of castaways who use their knowledge
and resourcefulness to survive and thrive on the island.</p>
        <p>Verne’s innovative storytelling can be seen as a response to the social fascination with
progress and exploration. During the late 19th century, the world was witnessing rapid
advancements in science and technology, driven by the Industrial Revolution and scientific
discoveries. This era of progress and innovation deeply influenced the literary landscape, as
writers such as Verne sought to capture the spirit of exploration and curiosity prevalent in society.
In this way, Verne laid the foundation for a mixture of adventure and science fiction subgenres.</p>
        <p>The peak from 1877 is led by Raymond Roussel, a lesser-known but highly innovative writer,
particularly with his works Impressions d’Afrique, published in 1910 (with 0.145 KLD), and Locus
Solus, published in 1914 (with 0.21 KLD). The former is a novel that defies traditional
narrative conventions and follows a dreamlike, non-linear structure. The story revolves around a
group of travelers who embark on a journey through Africa, encountering strange and
surrealistic occurrences along the way. The second narrative is also characterized by its intricacy
and complexity, as it contains multiple layers of storytelling that take the reader on a tour of
the estate of a scientist named Martial Canterel, where he showcases a series of bizarre and
macabre inventions. Roussel’s experimental and imaginative storytelling style set him apart
as a pioneer in avant-garde literature, and his works can be seen as early surrealism.</p>
        <p>René Crevel and Nathalie Sarraute lead the peak in 1900. René Crevel’s work Le roman
cassé, published in 1935 (with 0.17 KLD), and Nathalie Sarraute’s Tropismes, published in 1939
(with 0.159 KLD), both exemplify their innovative approaches to literature, as they challenged
conventional narrative structures and scrutinized the inner workings of human consciousness.</p>
        <p>Crevel’s novel breaks away from traditional linear storytelling and embraces a fragmented
and non-linear narrative style. The author’s exploration of the subconscious mind and his use
of stream-of-consciousness writing make the novel a precursor to the surrealist and modernist
movements. The novel centers around the mental states of its characters, delving into their
thoughts, dreams, and desires. The title Le roman cassé itself, which translates to The Broken
Novel, is indicative of Crevel’s intention to dismantle traditional narrative conventions and
explore new modes of expression. His work can be seen as an early example of the deconstruction
of the novel form, where the focus shifts from external events and plot-driven storytelling to
an exploration of the characters’ inner lives and psychological states.</p>
        <p>Nathalie Sarraute’s Tropismes is a collection of interconnected short prose pieces that explore
the subtle and fleeting movements of the characters’ inner thoughts and feelings. Sarraute’s
writing style is characterized by its precision and attention to the nuances of human behavior.
She coined the term “tropismes” to describe these brief and involuntary movements of the
characters’ consciousness. Her innovative use of language and her focus on the psychological
subtleties of her characters set her apart as a pioneer of the Nouveau Roman movement.</p>
        <p>Thus, our novelty signal highlights works and authors who have significantly distanced
themselves from the dominant formal rules of the previous generation. We identified authors
who have contributed to the creation of new sub-genres or avant-garde writers with varying
degrees of recognition, such as Raymond Roussel, an author from the Archive. These peaks might
represent pivotal moments in literary history where new ideas, styles, or narrative techniques
emerge, leading to a distinct shift in the literary landscape. Nevertheless, it is worth noting
that any work that differs formally from the majority of other novels stands out. For instance,
children’s novels such as Le petit prince (Antoine de Saint-Exupéry) also emerge prominently
in the scores.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Canonical novels, the drivers of literary variability?</title>
        <p>When examining the list of authors who stand out in their contribution to literary change,
many of them are well-known and directly associated with the literary canon. To assess the
amount of change among canonical works compared to non-canonical works from the archive,
we project in Figure 4 two distinct curves onto the graph based on their canonicity labels, at the
author level (considering all works by an author as canonical). By considering the canonicity
distinction, we gain a deeper understanding of how these different subsets of texts contribute
to the overall landscape of literary variability.</p>
        <p>Thus, we observe two distinct curves on the graph: the red curve representing canonical
authors and the blue one representing non-canonical authors. The margin of error for
canonical authors is larger due to their smaller number compared to the archival authors. The gap
between the two sets remains relatively stable over a century of authors’ birth dates. The clear
conclusion from the graph is that canonical authors tend to introduce more variability in their
novels compared to non-canonical authors.</p>
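        <p>The effect of group size on the margin of error can be illustrated with a minimal sketch. It assumes a normal-approximation 95% interval around the group mean; the function name, the score values, and the choice of interval are assumptions made for illustration and are not taken from our pipeline.</p>
        <preformat>
```python
from math import sqrt
from statistics import mean, stdev


def mean_with_margin(scores, z=1.96):
    """Return (mean, margin), where margin is a normal-approximation 95% half-width."""
    n = len(scores)
    if n in (0, 1):
        raise ValueError("need at least two scores to estimate a margin of error")
    # The half-width shrinks with the square root of the group size.
    margin = z * stdev(scores) / sqrt(n)
    return mean(scores), margin


# Hypothetical per-author variability scores (illustrative values only).
canon = [0.52, 0.61, 0.47, 0.66]
archive = [0.41, 0.39, 0.45, 0.43, 0.40, 0.44, 0.42, 0.38, 0.46, 0.41, 0.43, 0.40]

canon_mean, canon_margin = mean_with_margin(canon)
archive_mean, archive_margin = mean_with_margin(archive)
```
        </preformat>
        <p>With these toy values the smaller canonical group has the higher mean and the wider margin, which mirrors the pattern described for the two curves.</p>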
        <p>The smaller difference observed towards the end of the period has at least two possible explanations.
First, the corpus becomes more limited towards the end, with a smaller pool
of texts available for analysis. Second, the criterion of canonicity might
be less relevant for the last generations of authors.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Limitations</title>
      <p>Our approach is subject to the inherent accuracy limitations of the NLP tools used,
including Fr-BookNLP<sup>6</sup>, spaCy, and BERTopic, all of which are prone to error.</p>
      <p>The choice of a 30-year time frame for cohort succession in our experiments is somewhat
subjective. Although it is reasonable to assume that more significant changes occur within this window
than within shorter intervals such as 5 or 10 years, the selection remains debatable.
Furthermore, our comparisons are limited to successive cohorts, neglecting the potential influence of
earlier literary works. Authors are likely to have been influenced by canonical texts published
decades or even centuries before their own works, which warrants further consideration.</p>
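      <p>To make the role of the window size concrete, the following sketch groups authors into cohorts and pairs each cohort with its successor. It assumes fixed, non-overlapping bins on birth year; the function names, the origin year, and the author data are hypothetical, and the binning may differ from the setup used in our experiments.</p>
      <preformat>
```python
from collections import defaultdict

COHORT_SPAN = 30  # years; the window discussed in the text


def cohort_start(birth_year, origin, span=COHORT_SPAN):
    """Return the first year of the cohort that contains birth_year."""
    return origin + ((birth_year - origin) // span) * span


def group_by_cohort(birth_years, origin, span=COHORT_SPAN):
    """Map each cohort start year to the sorted list of authors born in it."""
    cohorts = defaultdict(list)
    for author, year in birth_years.items():
        cohorts[cohort_start(year, origin, span)].append(author)
    return {start: sorted(names) for start, names in sorted(cohorts.items())}


def successive_pairs(cohorts):
    """Pair each non-empty cohort with the next non-empty one, as (earlier, later)."""
    starts = sorted(cohorts)
    return list(zip(starts, starts[1:]))


# Hypothetical authors and birth years, for illustration only.
birth_years = {"author_a": 1802, "author_b": 1821, "author_c": 1840, "author_d": 1871}
cohorts = group_by_cohort(birth_years, origin=1800)
pairs = successive_pairs(cohorts)
```
      </preformat>
      <p>Changing <monospace>COHORT_SPAN</monospace> to 5 or 10 regroups the same authors into more, smaller cohorts, which is why the choice of window affects every downstream comparison.</p>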
      <p>We faced the challenge of conducting close readings. While we have identified distinctive
authors and texts, we have not provided textual evidence of the observed changes. Given the
large-scale nature of our study, this is understandable, but future research should strive to
incorporate detailed textual analysis to support our findings. Future work will be dedicated
to understanding which features contribute, and to what extent, to the observed literary variability.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In conclusion, this study has provided insights into the dynamics of literary variability
and the role of canonical works in the French literary landscape. Through our
operationalization of formal variability and our cohort-driven model, we identified
key works and key authors that drive literary variability. By analyzing formal aspects such as
topics, styles, chronotopes, and characterization in a large corpus of novels, the study aimed to
uncover patterns of literary change and explore the relationships between texts, authors and
cohorts. By organizing texts into generations, we established a temporal and contextual
framework that allows us to capture and analyze the evolving literary dynamics over time. This
approach acknowledges that texts produced within the same generation share certain
characteristics and influences, providing a meaningful basis for measuring and understanding literary
variability.</p>
      <p>We then investigated how accurately the canon reflects the overall degree of change in
literature. Surprisingly, our findings indicate that canonical authors contribute more variability
than non-canonical authors. At first glance, this might seem counter-intuitive, given that the
canonization process historically favors well-established, conventional works over avant-garde
and experimental ones. This tendency could lead one to expect that the canon might display
less variation in its literary characteristics. However, our results demonstrate that the canon
is far from a monolithic entity. It is not a rigid collection that uniformly represents
a particular literary style or period. Within the canon, there exists a
spectrum of works, spanning from those aligning with existing norms to those that truly challenge
boundaries and introduce novel literary elements. This implies that canonization is not an
entirely conservative process.</p>
      <p>One possible explanation relates to cultural and economic factors. Writers
whose works are part of the archive often aim for a widespread readership and commercial
success, especially in subgenres associated with mass literature. In such subgenres, the
“horizon of expectations” [17] of the audience might induce authors to adhere to certain expected
norms and styles. This emphasis on reaching a larger audience and meeting its
expectations might limit experimentation and deviation from established norms, leading
to a lower mean variability in the archive compared to the canon.</p>
      <p><sup>6</sup> See Appendix A.2 for the evaluation of Fr-BookNLP.</p>
      <p>Further analyses are needed to settle this issue. We plan to examine texts at a more granular
level, shifting scale towards close reading. The intention is to
analyze pivotal passages that significantly contribute to literary variability, in order
to discern the relevance of the various facets and assess their textual manifestation. To achieve
this, we plan a more detailed exploration focused on a specific subgenre. This narrower
scope will facilitate a finer analysis and interpretation of the texts, enabling a deeper
understanding of their distinctive literary attributes.</p>
      <p>Moving forward, future research could also focus on investigating the role of specific
subgenres in driving literary change. By examining whether the emergence or growth of certain
subgenres corresponds to peaks of change, we can gain a deeper understanding of how
different literary trends influence the overall dynamics of literature.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>Jean Barré’s PhD is supported by the EUR (Ecole Universitaire de Recherche) Translitterae
(programme “Investissements d’avenir” ANR-10-IDEX-0001-02 PSL and ANR-17-EURE-0025). This
work was also funded in part by the French government under the management of the Agence
Nationale de la Recherche as part of the “Investissements d’avenir” program, reference
ANR-19-P3IA-0001 (PRAIRIE 3IA Institute). The authors also wish to thank the anonymous reviewers
whose comments have helped us to substantially improve this paper.</p>
    </sec>
    <sec id="sec-7">
      <title>A. Appendix</title>
      <sec id="sec-7-1">
        <title>A.1. Topic modeling: detailed approach</title>
        <p>
          We provided BERTopic with 50 specific topics, each associated with a list of 10 words. These
topics served as seed topics to guide the model’s convergence during the analysis. BERTopic is
an algorithm with several layers. For the embedding layer, we employed a CamemBERT-base
sentence vectorizer [
          <xref ref-type="bibr" rid="ref27 ref23">26, 22</xref>
          ] to create sentence embeddings, which capture the
semantic meaning of the sentences. We then applied
Principal Component Analysis to project the high-dimensional embeddings
into a lower-dimensional space while preserving the essential information, making the
subsequent steps more efficient. Finally, we clustered the sentences into groups based on their
semantic similarities with the HDBSCAN algorithm, a density-based clustering method
that identifies clusters of varying shapes and sizes, allowing us to group sentences that share
similar topics or themes. The model was capable of retrieving more
than the initial 50 predefined topics. However, to maintain consistency and keep the analysis
targeted and interpretable, we restricted it to the 50 seed topics we had provided to
BERTopic.
        </p>
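        <p>The layered setup described above could be assembled roughly as follows. This is a sketch under stated assumptions rather than our exact configuration: the seed words, the encoder name, and the values of <monospace>n_components</monospace> and <monospace>min_cluster_size</monospace> are placeholders, and the heavy dependencies (bertopic, hdbscan, sentence-transformers, scikit-learn) are imported only when the function is called.</p>
        <preformat>
```python
# Two hypothetical seed topics; the study used 50 topics of 10 words each.
SEED_TOPICS = [
    ["mer", "navire", "port", "vague", "marin"],
    ["guerre", "soldat", "armée", "bataille", "fusil"],
]


def build_topic_model(seed_topics, n_components=5, min_cluster_size=15):
    """Assemble a BERTopic model with an embedding, a PCA and an HDBSCAN layer.

    Imports are local so that this module loads without the heavy dependencies.
    """
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from sklearn.decomposition import PCA

    # Assumption: any CamemBERT-based sentence encoder; this model name is a placeholder.
    embedder = SentenceTransformer("dangvantuan/sentence-camembert-base")
    # PCA takes the place of BERTopic's default UMAP reduction step.
    reducer = PCA(n_components=n_components)
    # Density-based clustering of the reduced sentence embeddings.
    clusterer = HDBSCAN(min_cluster_size=min_cluster_size, prediction_data=True)
    return BERTopic(
        embedding_model=embedder,
        umap_model=reducer,
        hdbscan_model=clusterer,
        seed_topic_list=seed_topics,
    )
```
        </preformat>
        <p>With the dependencies installed, <monospace>build_topic_model(SEED_TOPICS).fit_transform(sentences)</monospace> would return one topic assignment per sentence, where <monospace>sentences</monospace> is a list of strings.</p>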
      </sec>
      <sec id="sec-7-2">
        <title>A.2. Fr-BookNLP evaluation</title>
        <p><bold>A.2.1. NER</bold></p>
        <p>NER evaluation of Fr-BookNLP on literary texts (columns: precision, recall; rows: PER, LOC, FAC, TIME, VEH).</p>
        <p>Higher precision than recall means
that when the model identifies a literary entity, the prediction is likely to be correct.
Precision measures the percentage of correctly predicted literary entities out of all predicted
entities. This benefits the analysis because the entities identified are
likely to be correct, even though some relevant entities may be missed (lower recall).
Prioritizing precision thus reduces false positives and improves the reliability
of the identified literary entities. Note that literary entity recognition is not
exactly the same task as standard NER in NLP: the specificities of literary texts make the detection of
this kind of entity more complicated. Therefore the results obtained, even if they may seem
far from NLP standards, are state-of-the-art for the specific processing of literary texts.</p>
        <p>Coreference resolution evaluation of Fr-BookNLP on literary texts. Metrics: 88.0, 69.2, 71.8</p>
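        <p>For readers less familiar with the two NER measures discussed above, the sketch below computes precision and recall over entity spans. It assumes exact matching on (start, end, label) triples; the function name and the annotations are invented for illustration and do not come from our evaluation data.</p>
        <preformat>
```python
def precision_recall(predicted, gold):
    """Compute precision and recall for collections of (start, end, label) spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted.intersection(gold))
    # Precision: share of predicted spans that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold spans that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical annotations: 3 predicted spans, 2 of them correct, against 4 gold spans.
gold = {(0, 2, "PER"), (5, 6, "LOC"), (9, 11, "FAC"), (14, 15, "PER")}
predicted = {(0, 2, "PER"), (5, 6, "LOC"), (20, 21, "VEH")}
precision, recall = precision_recall(predicted, gold)
```
        </preformat>
        <p>Here precision (two of three predictions correct) exceeds recall (two of four gold entities found), the same configuration as the one favored in our analysis.</p>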
        <p>The issue of duplication arises when the model detects the same character multiple times
within the analyzed text. In some cases, the top five literary entities identified by the model
may contain two or more entries that refer to the same character
in terms of name or attributes. While this duplication might seem problematic at first glance,
it is essential to understand the context and purpose of the analysis. In this study,
the primary objective was not to identify unique and distinct characters but rather to retrieve
a proxy for characterization as a whole. We aimed to capture the prevalence and significance
of certain characters across different texts and literary works. Therefore, the focus is more on
character representation and the overall impact of these characters on the literary landscape,
rather than on identifying completely separate and non-repeating characters.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Algee-Hewitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Allison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gemma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Heuser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Walser</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Moretti</surname>
          </string-name>
          . “
          <article-title>Canon/Archive. Large-scale Dynamics in the Literary Field”</article-title>
          . In:
          <source>Pamphlets of the Stanford Literary Lab</source>
          <volume>11</volume>
          (
          <year>2016</year>
          ). url: https://litlab.stanford.edu/LiteraryLabPamphlet11.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Altieri</surname>
          </string-name>
          . “
          <article-title>An Idea and Ideal of a Literary Canon”</article-title>
          .
          In:
          <source>Critical Inquiry</source>
          <volume>1</volume>
          (
          <issue>Sept</issue>
          .
          <year>1983</year>
          ), pp.
          <fpage>37</fpage>
          -
          <lpage>60</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1086/448236</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <collab>ANR Chapitres</collab>
          .
          <source>Corpus Chapitres</source>
          . Version v1.0.0.
          <year>2022</year>
          . doi:
          <pub-id pub-id-type="doi">10.5281/zenodo.7446728</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bakhtin</surname>
          </string-name>
          .
          <article-title>The dialogic imagination: four essays</article-title>
          .
          <source>Slavic series 1</source>
          . Austin, Tex: University of Texas Press,
          <year>2011</year>
          . 443 pp.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bamman</surname>
          </string-name>
          .
          <source>BookNLP</source>
          .
          <year>2021</year>
          . url: https://github.com/booknlp/booknlp.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bamman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Popat</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Shen</surname>
          </string-name>
          . “
          <article-title>An annotated dataset of literary entities”</article-title>
          . In:
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)</source>
          . Minneapolis, Minnesota: Association for Computational Linguistics,
          <year>2019</year>
          , pp.
          <fpage>2138</fpage>
          -
          <lpage>2144</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.18653/v1/N19-1220</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bamman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Underwood</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          . “
          <article-title>A Bayesian Mixed Effects Model of Literary Character”</article-title>
          . In:
          <source>Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics</source>
          . ACL 2014. Baltimore, Maryland: Association for Computational Linguistics,
          <year>2014</year>
          , pp.
          <fpage>370</fpage>
          -
          <lpage>379</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.3115/v1/P14-1035</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Barré</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cabrera Ramírez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mélanie</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Galleron</surname>
          </string-name>
          . “
          <article-title>Pour une détection automatique de l'espace textuel des personnages romanesques”</article-title>
          . In:
          <source>Humanistica 2023</source>
          . Corpus. Association francophone des humanités numériques. Genève, Switzerland,
          <year>2023</year>
          , pp.
          <fpage>56</fpage>
          -
          <lpage>61</lpage>
          . url: https://hal.science/hal-04105537.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Barré</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Camps</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Poibeau</surname>
          </string-name>
          . “
          <article-title>Operationalizing Canonicity. A Quantitative Study of French 19th and 20th Century Literature”</article-title>
          . In:
          <source>Journal of Cultural Analytics</source>
          (
          <year>2023</year>
          ). doi:
          <pub-id pub-id-type="doi">10.22148/001c.88113</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A. T. J.</given-names>
            <surname>Barron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Spang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>DeDeo</surname>
          </string-name>
          . “
          <article-title>Individuals, institutions, and innovation in the debates of the French Revolution”</article-title>
          . In:
          <source>Proceedings of the National Academy of Sciences</source>
          <volume>115</volume>
          .
          <issue>18</issue>
          (
          <year>2018</year>
          ), pp.
          <fpage>4607</fpage>
          -
          <lpage>4612</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1073/pnas.1717729115</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>H.</given-names>
            <surname>Béhar</surname>
          </string-name>
          .
          <article-title>La littérature et son golem</article-title>
          . Vol.
          <volume>1</volume>
          : Travaux de linguistique quantitative 58. Paris: H. Champion,
          <year>1996</year>
          . 2 pp.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bentz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Alikaniotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cysouw</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Ferrer-i-Cancho</surname>
          </string-name>
          . “
          <article-title>The Entropy of Words - Learnability and Expressivity across More than 1000 Languages”</article-title>
          . In:
          <source>Entropy</source>
          <volume>19</volume>
          .
          <issue>6</issue>
          (
          <year>2017</year>
          ), p.
          <fpage>275</fpage>
          . doi:
          <pub-id pub-id-type="doi">10.3390/e19060275</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bourdieu</surname>
          </string-name>
          .
          <article-title>Les règles de l'art. Genèse et structure du champ littéraire</article-title>
          . Paris: Éditions du Seuil,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Brottrager</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Arslan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Brandes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Weitin</surname>
          </string-name>
          . “
          <article-title>Modeling and Predicting Literary Reception”</article-title>
          .
          In:
          <source>Journal of Computational Literary Studies</source>
          (
          <year>2022</year>
          ). doi:
          <pub-id pub-id-type="doi">10.48694/jcls.95</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>van Cranenburgh</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Koolen</surname>
          </string-name>
          . “
          <article-title>Identifying Literary Texts with Bigrams”</article-title>
          . In:
          <source>Proceedings of the Fourth Workshop on Computational Linguistics for Literature</source>
          . Denver, Colorado, USA: Association for Computational Linguistics,
          <year>2015</year>
          , pp.
          <fpage>58</fpage>
          -
          <lpage>67</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.3115/v1/W15-0707</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <year>2022</year>
          . arXiv: 2203.05794.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          of Minnesota Press,
          <year>1982</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Kestemont</surname>
          </string-name>
          . “
          <article-title>Function Words in Authorship Attribution. From Black Magic to Theory?”</article-title>
          In:
          <source>Proceedings of the 3rd Workshop on Computational Linguistics for Literature (CLFL)</source>
          . Gothenburg, Sweden: Association for Computational Linguistics,
          <year>2014</year>
          , pp.
          <fpage>59</fpage>
          -
          <lpage>66</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.3115/v1/W14-0908</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>L.</given-names>
            <surname>Kohlmeyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Repke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Krestel</surname>
          </string-name>
          . “
          <article-title>Novel Views on Novels: Embedding Multiple Facets of Long Texts”</article-title>
          . In: 2021 Association for Computing Machinery. (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <source>Distributed Representations of Sentences and Documents</source>
          .
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          arXiv: 1405.4053.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>D.</given-names>
            <surname>Liddle</surname>
          </string-name>
          . “
          <article-title>Could Fiction Have an Information History? Statistical Probability and the Rise of the Novel”</article-title>
          .
          In:
          <source>Journal of Cultural Analytics</source>
          (
          <year>2019</year>
          ). doi:
          <pub-id pub-id-type="doi">10.22148/16.033</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>L.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Muller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J. O.</given-names>
            <surname>Suárez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dupont</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Romary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>É. V.</given-names>
            <surname>de la Clergerie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Seddah</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Sagot</surname>
          </string-name>
          . “
          <article-title>CamemBERT: a Tasty French Language Model”</article-title>
          . In:
          <source>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>F.</given-names>
            <surname>Moretti</surname>
          </string-name>
          . “
          <article-title>Conjectures on world literature”</article-title>
          . In:
          <source>New Left Review</source>
          (
          <year>2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>F.</given-names>
            <surname>Moretti</surname>
          </string-name>
          .
          <article-title>Graphs, maps, trees: abstract models for literary history</article-title>
          . London New York: Verso,
          <year>2007</year>
          . 119 pp.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          .
          <article-title>Canonicity</article-title>
          .
          <year>2017</year>
          . doi:
          <pub-id pub-id-type="doi">10.1093/obo/9780190221911-0054</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          .
          <article-title>Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks</article-title>
          .
          <year>2019</year>
          . arXiv: 1908.10084.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Pennebaker</surname>
          </string-name>
          .
          <article-title>The secret life of pronouns: what our words say about us</article-title>
          . New York: Bloomsbury Press,
          <year>2011</year>
          . 352 pp.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>G.</given-names>
            <surname>Pollock</surname>
          </string-name>
          .
          <article-title>Differencing the canon: feminist desire and the writing of art's histories</article-title>
          . Re visions. London; New York: Routledge,
          <year>1999</year>
          . 345 pp.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>O.</given-names>
            <surname>Seminck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gambette</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Legallois</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Poibeau</surname>
          </string-name>
          . “
          <article-title>The Evolution of the Idiolect over the Lifetime: A Quantitative and Qualitative Study of French 19th Century Literature”</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          In:
          <source>Journal of Cultural Analytics</source>
          <volume>7</volume>
          .
          <issue>3</issue>
          (
          <year>2022</year>
          ). doi:
          <pub-id pub-id-type="doi">10.22148/001c.37588</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tynianov</surname>
          </string-name>
          . “
          <article-title>On Literary Evolution (1927)”</article-title>
          . In:
          <source>Permanent Evolution</source>
          . Boston, USA: Academic Studies Press,
          <year>2019</year>
          , pp.
          <fpage>267</fpage>
          -
          <lpage>282</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1515/9781644690635-015</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>T.</given-names>
            <surname>Underwood</surname>
          </string-name>
          .
          <article-title>Distant horizons: digital evidence and literary change</article-title>
          . Chicago: The University of Chicago Press,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>33</lpage>
          . 206 pp.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>T.</given-names>
            <surname>Underwood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kiley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Vaisey</surname>
          </string-name>
          . “
          <article-title>Cohort Succession Explains Most Change in Literary Culture</article-title>
          ”. In:
          <source>Sociological Science</source>
          <volume>9</volume>
          (
          <year>2022</year>
          ), pp.
          <fpage>184</fpage>
          -
          <lpage>205</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.15195/v9.a8</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>T.</given-names>
            <surname>Underwood</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Sellers</surname>
          </string-name>
          . “
          <article-title>The ‘Longue Durée’ of Literary Prestige</article-title>
          ”. In:
          <source>Modern Language Quarterly</source>
          <volume>77</volume>.<issue>3</issue>
          (
          <year>2016</year>
          ), pp.
          <fpage>321</fpage>
          -
          <lpage>344</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1215/00267929-3570634</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          A.2.2. Coreference resolution evaluation: 76.4
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>