<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Make Love or War? Monitoring the Thematic Evolution of Medieval French Narratives</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jean-Baptiste Camps</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicolas Baumard</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierre-Carl Langlais</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olivier Morin</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thibault Clérice</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jade Norindr</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ALMAnaCH - Inria</institution>
          ,
          <addr-line>2 Rue Simone IFF, 75012 Paris</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>École nationale des chartes - Université PSL</institution>
          ,
          <addr-line>65 rue de Richelieu, Paris, 75012</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>OpSci</institution>
          ,
          <addr-line>3 rue de Milan, Paris, 75009</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <fpage>734</fpage>
      <lpage>756</lpage>
      <abstract>
        <p>In this paper, we test a famous conjecture in literary history put forward by Seignobos and de Rougemont, according to which the French central medieval period (12th-13th centuries) is characterized by an important increase in the cultural importance of love. To do that, we focus on the large and culturally important body of manuscripts containing medieval French long narrative fictions, in particular epics (chansons de geste, of the Matter of France) and romances (chiefly romans on the Matters of Britain and of Rome), both in verse and in prose, from the 12th to the 15th century. We introduce the largest available corpus of these texts, the Corpus of Medieval French Epics and Romances, composed of digitised manuscripts drawn from Gallica, and processed through layout analysis and handwritten text recognition. We then use semantic representations based on embeddings to monitor the place given to love and violence in this corpus, through time. We observe that themes (such as the relation between love and death) and emblematic works well identified by literary history do indeed play a central part in the representation of love in the corpus, but our modelling also points to the characteristic nature of more overlooked works. Variation in time seems to show that there is indeed a phase of expansion of love in these fictions, in the 13th and early 14th century, followed by a period of contraction that seems to correlate with the Crisis of the Late Middle Ages.</p>
      </abstract>
      <kwd-group>
        <kwd>Medieval French Literature</kwd>
        <kwd>Cultural Evolution</kwd>
        <kwd>History of Emotions</kwd>
        <kwd>Document Analysis and Recognition</kwd>
        <kwd>HTR</kwd>
        <kwd>Word and Document Embedding</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction: love and war in medieval French narratives</title>
      <sec id="sec-1-1">
        <title>1.1. Love, a ‘medieval invention’…</title>
        <p>
          Love – or more precisely love in literature – is sometimes depicted as “a medieval invention”,
or rather, to quote the exact formulation of this phrase that goes back to the historian Charles
Seignobos, “Love dates from the 12th century” [35]<sup>1</sup>. What Seignobos meant is not that
Antiquity had no conception of love; rather, he differentiates the antique notion of Eros,
interpreted as sexual desire, at least for males (he admits the idea of love-as-respectful-devotion in
Antique women), from the modern (and in his mind, Western only) notion of reciprocal love,
that he defines as “a new feeling of respect and reciprocal admiration, supposing equality
between the two sexes” [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ]. This conception would find its origins in the 12th century “courtly
love”. The cultural movement of courtly love, the fin’amor, started in Southern France in the
lyrical poetry of the troubadours around the start of the 12th century as far as we can judge, and
underwent a spectacular expansion spanning Western Europe and fecundating other types of
literary productions, such as the new form of the romans (romance), and even epic narrative forms,
until then more preoccupied with violence, lineage and feudal values, such as the chansons de
geste, eventually blurring the frontier between the two genres.
        </p>
        <p>
          In his masterwork, L’Amour et l’Occident (Love and the Western World) [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ], Denis de
Rougemont makes of the myth of Tristan et Iseut the archetype and the most emblematic early
representative of love-as-passion in Western culture, of which some significant features are
the link between love and death, the unsatisfied, frustrated or fatal outcome of a yet reciprocal
sentiment, the transgression of moral norms or social duties and, in fine, the adulterous nature
of the relationship.
        </p>
        <p>In addition, de Rougemont also draws a correspondence between two apparently
antagonistic themes: love and war. Noting the use of military vocabulary in the depiction of the conquest
of the loved lady, to which the lover must lay siege, after he has been struck by the arrows of
Love, he argues that both (courtly) love and (gallant) war are realised in the same chivalric
ideals (“La chevalerie, loi de l’amour et de la guerre”: chivalry, law of love and war).</p>
        <p>
          De Rougemont’s work has received some criticism, because there is no unique definition of
love in the Middle Ages, and because the refined love (fin’amor) of the lyrical poet differs
substantially from the passionate ‘crazy’ love of Tristan and Iseut (fol’amor) [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], and no medieval
story fits all aspects of courtly love as a deliberate choice (hence, adulterous in nature) and
a reciprocal sentiment: the troubadours most often complain of the disdain or excessive pride
of the lady they love (hence, not always reciprocal), while Tristan and Iseut’s
magical-filter-induced love does not perfectly fit the idea of a deliberate choice. Despite these reservations,
and as far as the surviving documentation allows us to see, the 12th and 13th centuries saw an
explosion of fictional love stories, both by known writers such as Beroul, Chrétien de Troyes or
Marie de France, and in the many anonymous works of this period, such as Floire and
Blancheflore or Aucassin and Nicolette [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. If these were written first in Occitan and French – be
it Continental French, Anglo-French, or Franco-Italian –, equivalents in other Western European
languages were soon to appear, perhaps with the exception of Spain, where the first examples of
symmetric and passionate narrative love stories arrive later, in the 14th century [2].
<fn><label>1</label><p>The more catchy “Love is a modern invention”, Seignobos explains, was a corrupted version of what he said to a
lady, who told the journalist Gustave Téry, who told a colleague, Henri Bellamy, who in turn published it in the
Quotidien, spurring Seignobos’s reply.</p></fn>
For instance,
in addition to the French versions of the Tristan and Iseut story by Beroul, Thomas of Britain,
Marie de France and Chrétien de Troyes, we also see German (Gottfried von Strassburg), Italian
(Tristano Riccardiano, Tristano Veneto and Tristano Corsiniano), English (Sir Tristrem) and Czech
(Tristam and Izalda) versions, and slightly later in Spanish from the late 14th c. to the 16th [2].
As Morris notes: “As far as our surviving evidence takes us, there was an enormous explosion
of interest in the subject shortly before 1100. An almost complete silence was followed by the
beginning of love literature which challenged in quality and surpassed in volume that of any
earlier civilization” [29
          <xref ref-type="bibr" rid="ref2">, 2</xref>
          ].
        </p>
        <p>The 14th and 15th centuries constitute a period somewhat less explored with respect to
Medieval French literature, despite – or probably because of – a large body of prose works, very
often of considerable dimensions, and sometimes with abundant surviving manuscript traditions.
Many of these works were new versions of previous texts, such as new versions of Tristan
and Iseut or Floris and Blancheflore, and they are accompanied by many seemingly new
creations such as Ponthus et Sidoine or Cleriadus et Meliadice, while, in other European languages,
the works of Chaucer or the Middle Dutch plays of Esmoreit or Gloriant feature an important
dimension of reciprocal love as passion [2]. Yet, to some extent, it remains to be seen whether, in
Medieval narratives, the importance of the theme of love and passion actually increases or
decreases in the literature of the Late Middle Ages.</p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. … in broader perspective</title>
        <p>
          If the importance of the development of love in medieval Western culture from the 12th
century onwards cannot be denied, research now tends to put it back in a context where similar
increases happened elsewhere in Eurasia, for instance in the Arab world, India, Persia, China
and Japan, as well as in the West in other periods of time, such as the Greece of the first to
third centuries AD (that saw the production of ‘novels’ such as Leucippe and Clitophon) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <p>The medievalist Georges Duby was perhaps the first to hypothesize that economic
development might be the main driver in the rise of love in Western Europe [14]. Recently,
Baumard, Huillery, Hyafil, and Safra [3] argued that a ‘higher level of economic development’
(approached through measures such as GDP per capita, urbanisation rate, size of the largest city,
…) ‘is strongly associated with a greater incidence of love in narrative fiction’, in the Eurasian
space, both in Antiquity and during the Middle Ages and Early Modern period [3].</p>
        <p>
          In line with these works, we thus test whether there are indeed shared trends between love
and economic development in literary fictions, on the specific corpus that played a central
cultural role in medieval Europe and spurred de Rougemont’s analysis: medieval French long
narrative fictions. We compare it to measures of economic development, based on
available data for GDP in medieval France [32
          <xref ref-type="bibr" rid="ref5">, 5</xref>
          ]. It is to be noted that, if both appear correlated,
this will not necessarily mean that an increase in GDP causes an increase in the taste for love in fiction, as
both could be influenced by external factors that are not yet included in our analysis, such as,
for instance, political stability, or the absence of major shocks (wars, pandemics, …).
        </p>
        <p>To go beyond the received (and relatively cramped) literary canon, in terms of works,
authors, genres and periods, and to proceed on the material basis of the reception and popularity
of the works, we build a large corpus of manuscripts of chansons de geste as well as verse
and prose romans, from the 12th to the 15th century. By working on the basis of the
surviving manuscripts, we hope to circumvent some biases due to both the romantic and the scholarly
reception of medieval works in the 19th-21st centuries, such as the overvaluation of works
fitting contemporary aesthetic criteria (rather than works popular during the Middle Ages), but we
remain subject to biases in the differential preservation and destruction of these works [22, 8].</p>
        <p>By including both epics (chansons de geste) and romans, we will also be able to go beyond
traditional genre definitions to monitor the importance of the theme of love, and its ability to
cross generic boundaries.</p>
        <p>Finally, to give context to the evolution of the theme of love, and to test the hypothesis of
de Rougemont of a link between them, we will also follow the chivalric theme of violence and
war.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Materials and Methods</title>
      <sec id="sec-2-1">
        <title>2.1. Global design and justification</title>
        <p>Our global experimental design is as follows:</p>
        <list list-type="order">
          <list-item><p>gather a corpus as large as possible of manuscripts of medieval French epics and
romances, through the harvesting of digitised manuscripts and their subsequent processing
through a dedicated workflow using computer vision and natural language processing;</p></list-item>
          <list-item><p>build a semantic representation of words and documents, based on a joint embedding,
using doc2vec, and estimate its quality using literary knowledge;</p></list-item>
          <list-item><p>compute scores for documents, based on cosine similarity in the joint embedding
between them and the vectors of words for love, and for violence;</p></list-item>
          <list-item><p>monitor their variation during the period;</p></list-item>
          <list-item><p>compare the variation with historical knowledge on economic development: do they
converge, diverge or seem unrelated?</p></list-item>
          <list-item><p>(appendix) use top2vec, based on the doc2vec embedding, to look at the topics with high
love or violence scores, to check that they are indeed related to courtly love, and to
violence, by interpreting them in light of literary history.</p></list-item>
        </list>
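        <p>The scoring in step 3 can be illustrated with a minimal sketch. It assumes that a theme vector is the mean of the word vectors in its set and that each passage already has a vector in the same space; the toy three-dimensional vectors and the names <monospace>mean_vector</monospace>, <monospace>cosine_similarity</monospace> and <monospace>theme_score</monospace> are hypothetical and only serve the illustration, they are not taken from the pipeline described here.</p>
        <preformat>
```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def theme_score(doc_vector, theme_word_vectors):
    """Score a document against a theme (assumption: theme = mean of its word vectors)."""
    return cosine_similarity(doc_vector, mean_vector(theme_word_vectors))


# Toy vectors, invented for the example (real embeddings have many more dimensions).
love_words = [[1.0, 0.0, 0.2], [0.8, 0.1, 0.0]]
passages = {"passage_a": [0.9, 0.1, 0.1], "passage_b": [0.0, 1.0, 0.3]}

love_scores = {name: theme_score(vec, love_words) for name, vec in passages.items()}
```
        </preformat>
        <p>In this toy example, the passage whose vector points in roughly the same direction as the love words receives the higher score; the same function would be called with the violence word set to obtain a violence score.</p>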
        <p>As there is no unequivocal definition of what constitutes the theme of love or of violence,
we choose to constitute them by concatenating the vectorised word representations of:</p>
        <list list-type="bullet">
          <list-item><p>the lexemes aimer and amour, and their inflectional and spelling variants that we could
identify<fn><label>2</label><p>We do not specifically target physical love and sexuality (and important texts in that regard, such as the Fabliaux,
are indeed not included in our corpus), though these themes might still have appeared, as well as forms of non
romantic love (e.g., divine, familial, etc.). Yet, the results below show that our vector captures almost exclusively
courtly love themes.</p></fn>;</p></list-item>
          <list-item><p>the lexeme ferir (frapper, hit) and its inflectional and spelling variants that we could
identify. ferir was chosen because it is probably the most ubiquitous verb in
descriptions of fights [19, p. 149].</p></list-item>
        </list>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Corpus</title>
        <p><bold>2.2.1. Scope</bold></p>
        <p>The Corpus of Medieval French Epics and Romances introduced in this paper is, to our knowledge,
the largest corpus of Medieval French created to date. Though still in an early version and with
only partial coverage of its final scope, it currently comprises 265 manuscripts and 410
text witnesses, for a total of 38.5 million word tokens. The deep-learning-based workflow for
text acquisition from the digitised manuscript images, as well as the subsequent ground-truth-free
quality evaluation of the results, are described in Appendix A.</p>
        <p>The goal of the corpus is to encompass every manuscript of medieval French long narrative
works that fall broadly in the category of chansons de geste (epics) and chivalric romans
(romances), chiefly but not exclusively from the Matters of Rome or Britain, in verse, along with
their mises en prose and native prose versions, as long as they are available in digitised form.</p>
        <p>For this paper, the scope was limited to digitisations available as part of the Gallica digital
library of the Bibliothèque nationale de France. In the near future, it will be expanded to include
other sources (such as the Bibliothèque Virtuelle des Manuscrits Médiévaux)<sup>3</sup>.</p>
        <p>
          The inventory of works, texts and manuscripts (still ongoing) was made by collating a list of
epics made by one author, data from the OpenStemmata repository [7], with the list published
by Kestemont et al. [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. Verifications were made by going back to the digital catalog of the
BnF [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], and the online databases Jonas and Arlima [
          <xref ref-type="bibr" rid="ref20 ref6">20, 6</xref>
          ]. In particular, data was enriched with
links to available digitisations.
        </p>
        <sec id="sec-2-2-1">
          <title>2.2.2. Corpus facts and figures</title>
          <p>Main statistics about the corpus are presented in Table 1. Due to the unequal availability of
digitised manuscripts (and of the underlying sources), as well as selection on the basis of
handwritten text recognition (HTR) quality, the corpus is not chronologically balanced, and no
regularisation was performed on this aspect (which supposedly also reflects, in part, variations
in manuscript production and preservation).
<fn><label>3</label><p>The version used for this paper is limited to a subset of 258 manuscripts, due to time and computing resources
constraints. It will be expanded in the course of the following months, to encompass all digitised manuscripts
from our list of 800 manuscripts containing relevant texts and kept in the Bibliothèque nationale de France. In the
context of the preparation of this paper, we have focused on increasing the diversity of texts rather than including,
say, all the very numerous manuscripts of the prose Arthurian Vulgate cycle or the Guiron le courtois.</p></fn></p>
          <p>The chronological distribution of the cmfer-select corpus (Figure 1, left) shows that
substantial data is available for the whole period under consideration. It also shows an almost continuous increase
during the 13th century, followed by a very important decrease, reaching its lowest point in the
years following the Black Death pandemic (associated with a significant population drop). It
then increases again in the 15th century, with another notable drop during one of the worst
periods of the Hundred Years’ War, roughly the Armagnac–Burgundian Civil War (1407-1435).
Even though biases in the availability of sources and choices made for the corpus are likely to
be present, the correspondence with important historic events might be an argument for a form
of representativeness of the corpus with respect to the medieval production, or perhaps with
the inheritance of the Royal Library, established by Charles V the Wise in 1367 (the ancestor
of the BnF, from which digitised manuscripts were obtained).</p>
          <p>It is to be noted that the unequal distribution of tokens in time is not necessarily in itself
a problem to estimate the average importance of love, as long as enough material is present
throughout the period.</p>
          <p>The distribution of the number of tokens by work (Figure 1, right) is, as expected, very
unequal: works are most often found in a single witness, but a handful of texts
have an abundant tradition, reflecting their enduring success. Some of the latter are very long
or cyclical works, and can ultimately account for up to two orders of magnitude more tokens
than most works: this is in particular the case of the prose Tristan (23 witnesses and 10M tokens,
more than a fourth of the corpus), Guiron le courtois (9 witnesses and 4M tokens), L’Estoire del
Saint Graal (20 witnesses and 3M tokens), or Garin le Loherain (11 witnesses and 1M tokens).
We took the decision not to restrict the number of witnesses for a given text, because the aim
of our analysis is to reflect the reception of the texts. In a context where books are expensive
objects, commissioning the copy of a voluminous work is a significant choice.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Semantic representation of the words and documents</title>
        <sec id="sec-2-3-1">
          <title>2.3.1. Model training</title>
          <p>
            Given the level of lexical, spelling and abbreviative variation in the corpus, as well as the noise
induced by the HTR process, and the current absence of subsequent normalisation such as
lemmatisation, we are faced with a large number of variant forms. To deal with this, we
choose a method that is supposed to increase robustness to this type of variation, by creating a
shared embedding of words, using word2vec [27], and documents, with doc2vec [
            <xref ref-type="bibr" rid="ref25">25</xref>
            ]. In
addition, this allows us to use top2vec to extract topic vectors, to investigate the contexts in which
our queried word vectors are used, to ascertain that they do, in fact, represent occurrences of
courtly love or violence (see Appendix D).
          </p>
          <p>
            Given the nature of our corpus, we are chiefly interested in several of the main claimed
features of these embeddings, in particular the fact that they supposedly do not need stemming or
lemmatisation, nor lists of stop words. In addition, some benchmarks have also found doc2vec
to be more efficient than encoder models, such as the Universal Sentence Encoder or BERT
Sentence Transformer [
            <xref ref-type="bibr" rid="ref21">21</xref>
            ], when used in contexts such as topic discovery. Last but not least,
since our main goal is to interrogate the documents based on the importance of the semantic
content related to the forms of the lexemes aimer/amour and ferir, the advantage of using a
combination of word2vec, doc2vec and top2vec is that it allows us to manipulate and interrogate
shared representations of word, document and topic vectors.
          </p>
          <p>
            Given the large size of the texts, they were sampled in 15-line fragments (resulting in 334 060
fragments). The doc2vec model was trained with mostly default hyperparameters, with
additional adjustments based on existing benchmarks and the specifics of our corpus (Table 2).
In particular, regarding the number of training epochs, previous studies on doc2vec found the
optimal number of epochs for a fairly large corpus (in terms of document length and number
of documents) to be relatively low: for a 4.5 million words corpus, Lau and Baldwin [24] found
the optimal number of epochs to be 20, as opposed to 400 for a 0.5 million words corpus, and
the minimum frequency of a word in the corpus for inclusion to be 5 instead of 1. Curiskis et
al. [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ] showed that for a dataset of approximately 7 000 documents of a mean length of 140
words, the optimal number of training epochs was 50. Since our corpus is closer to 40 million
words and 300 000 samples, respectively one and two orders of magnitude larger, we retain the
option of training for five epochs, with a minimum count of 25. In addition, we chose to use
negative sampling instead of a hierarchical softmax step at the output layer because it proved
both more efficient and yielded better quality vectors in existing benchmarks [28
            <xref ref-type="bibr" rid="ref24">, 24</xref>
            ], and
chose a vocabulary based only on word 1-grams.
          </p>
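          <p>The sampling of texts into 15-line fragments mentioned above can be sketched as follows. This is only an illustration: the function name <monospace>split_into_fragments</monospace> is hypothetical, and keeping a shorter final fragment (rather than dropping or merging it) is an assumption, since the treatment of the last lines of a text is not specified here.</p>
          <preformat>
```python
def split_into_fragments(lines, fragment_size=15):
    """Group consecutive lines into fragments of at most fragment_size lines.

    Assumption of this sketch: a trailing fragment shorter than
    fragment_size is kept as it is.
    """
    return [
        lines[start:start + fragment_size]
        for start in range(0, len(lines), fragment_size)
    ]


# Toy witness of 40 transcribed lines, invented for the example.
witness_lines = [f"line {i}" for i in range(1, 41)]
fragments = split_into_fragments(witness_lines)
fragment_lengths = [len(fragment) for fragment in fragments]
```
          </preformat>
          <p>Each fragment would then be treated as one document when training the document embedding.</p>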
          <p>Training was performed on a dedicated server, using 8 parallel workers.</p>
        </sec>
        <sec id="sec-2-3-2">
          <title>2.3.2. Interrogation of the resulting vectors</title>
          <p>Topics, texts and passages were then interrogated using the following methodology:</p>
          <list list-type="order">
            <list-item><p>word vectors were interrogated on the basis of the lemmas ‘amer’ (to love) and ‘amor’
(subst. love), on the one hand, and ‘ferir’ (to hit) on the other, to retrieve the most similar words.
Other forms (flexional, spelling, segmentation variants, or variant forms due to HTR
noise) of the lemmas were then identified, and added to the request (e.g., ‘amour’, ‘lamor’,
‘lamour’, ‘amors’, ‘amours’, ‘samor’, ‘damors’, ‘amo’, ‘amoit’, ‘lamoit’, ‘amee’, etc.),
iteratively, until the most similar words stopped yielding forms of the lemmas (Appendix C);</p></list-item>
            <list-item><p>those sets of words and their corresponding vectors were used to examine their direct
environment, in terms of word vectors closest to them, as well as in terms of document
vectors closest to them (both in cosine similarity), in order to establish the semantic
contents and the nature of the works that they would retrieve, and to verify that they were
concerned with courtly love and chivalric violence. In addition to this verification of the
quality of the embedding, topic modelling was used more secondarily to look directly at
the closest associated themes (Appendix D);</p></list-item>
            <list-item><p>finally, those sets of word vectors were used to compute a love and a violence score (based
on cosine similarity) for each document, and monitor the variation of this score through
time. For this, the score for love and violence of all passages was retrieved, in order
to calculate a yearly mean. This required distributing the passages chronologically
based on their date or approximate dating: for instance, a passage in a manuscript dated
to 1245 was assigned to the year 1245 with a weight of one; a passage dated to the last
quarter of the 13th century was assigned to the years 1276 to 1300 with a weight of
<inline-formula><tex-math>\frac{1}{1300-1275} = 0.04</tex-math></inline-formula> for each year. The mean scores were then computed and plotted as a
time series, using local regression with the LOESS method (locally estimated scatterplot
smoothing), with a smoothing coefficient of 0.15.</p></list-item>
          </list>
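          <p>The chronological weighting of passages can be illustrated with a short sketch. It assumes that each passage comes with a score and an inclusive range of possible years, that its weight is spread uniformly over that range, and that the yearly mean is a weighted mean; the function names and the toy scores are hypothetical, and the LOESS smoothing step is not reproduced.</p>
          <preformat>
```python
from collections import defaultdict


def year_weights(first_year, last_year):
    """Spread a total weight of 1 uniformly over an inclusive range of years."""
    years = range(first_year, last_year + 1)
    weight = 1.0 / len(years)
    return {year: weight for year in years}


def yearly_weighted_mean(passages):
    """Weighted mean score per year.

    passages: iterable of (score, first_year, last_year) tuples.
    Assumption of this sketch: the yearly mean weights each passage by
    the share of its dating range that falls on that year.
    """
    weighted_sum = defaultdict(float)
    total_weight = defaultdict(float)
    for score, first_year, last_year in passages:
        for year, weight in year_weights(first_year, last_year).items():
            weighted_sum[year] += weight * score
            total_weight[year] += weight
    return {year: weighted_sum[year] / total_weight[year] for year in sorted(total_weight)}


# Toy passages, invented for the example.
toy_passages = [
    (0.30, 1245, 1245),  # dated manuscript: weight 1 on 1245
    (0.50, 1276, 1300),  # last quarter of the 13th century: 0.04 per year
    (0.10, 1300, 1300),  # dated manuscript: weight 1 on 1300
]
yearly_means = yearly_weighted_mean(toy_passages)
```
          </preformat>
          <p>With these toy values, a year covered only by the loosely dated passage takes that passage's score, while in 1300 the precisely dated passage dominates the mean because its weight is 1 against 0.04.</p>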
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <sec id="sec-3-1">
        <title>3.1. Semantic environment of love and violence</title>
        <p>In order to inspect the validity of the vectors of love and violence, and to establish to what
specific kind of love or violence they referred, we looked at the contexts, through the interrogation
of most similar word vectors, based on cosine similarity between our love and violence vectors
(mean of the word vectors that compose them) and the vectors for each word in the model (Table 3).
We completed it through topic modelling (Appendix D).</p>
        <p>The words closest to the love vector exhibit a catalogue of courtly love vocabulary: in
the designation of the lovers, in the expectations, languishing troubles and (metaphoric or not)
death from love (as well as potential love quarrels); the use of feudal vocabulary (loyalty, feudal
possession), the expression of feelings and its traditional metaphoric elements (fire, heart, …),
as well as love promises, desire and kisses. We also find courtly qualities (beauty, goodness,
high social extraction), and their traditional incarnated opposites, be they jealous or simply not
possessed of courtly qualities (the villein and their supposed vileness or boorishness).</p>
        <p>The words closest to the ferir (hit) vector form an even more compact vocabulary: it is about
hitting one’s opponent with offensive weapons in the teeth, chest or shield, breaking pieces of
armour, slashing, cleaving, slicing, piercing through, throwing him off his horse, and ultimately
killing him.</p>
        <p>Given these results, we are satisfied that the word embedding offers a relevant representation
of (courtly) love and violence (especially, chivalric combats). We then move on to examine the
document embedding, on the basis of these word vectors.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Document scores for love and violence</title>
        <p>Document-level scores for love and violence were computed for each textual witness, by taking
the mean score of all passages extracted from them. If we rank them accordingly (Table 4), we
notice in both lists the importance of manuscripts dating to the 13th and the early 14th century.
The list of witnesses closest to the love vector shows the importance of courtly love stories, in
a mix of works whose literary importance is often known and sometimes less so: the very
famous Lai de l’ombre, for instance, is an archetypal courtly tale by Jean Renart, in which a
knight seduces a woman who was refusing him, by gifting a ring to her reflection in a fountain.
The list also contains several adventure and love romances centred on a couple, such as Amadas
et Ydoine, Floire et Blancheflor (here in its ‘aristocratic’ version), Cristal et Clarie. Several of these
works share an Ovidian inspiration, and narrative patterns typical of courtly love (such as the
gift or exchange of rings). The works of Adenet le Roi also feature prominently, be it the
courtly adventure romance of Cléomadès, or the Berte aus grans piés, in which he mixes epic
sources with a fine description of the feelings and troubles of its chief female character. Some
lesser known texts fit quite well in this list: the highest scoring one is the Roman de la poire,
a text in-between romance and lyrical poetry, in which “the themes of courtesy are present,
with sophistication and refinement pushed to the extreme” [34, our translation], that centers
around the initially non-reciprocated love between the narrator and a lady, communicating
through lyrical poems, and benefiting from the mediation of allegories of Love, courtly virtues
(Loyalty, Subtle Thought, Gentle Gaze…) and characters borrowed from famous texts (Tristan
and Iseut, Pyrame and Thisbé). The 14th century Dame à la licorne et beau chevalier au lion is a
comparatively late example of a romance mixed with lyrical poems, in a manuscript that was
likely gifted to the princess Blanche de Navarre at the time of her wedding with the king of
France. It is also an archetypal courtly story, in which a married young lady, accompanied by
a unicorn, falls in love with a knight accompanied by a lion, and lives a story filled with tropes
such as the rumors of death of her lover, the slanders of the jealous against the couple, etc.</p>
        <p>The presence of the Song of Saint Alexis is seemingly a discrepancy in comparison to the
rest of the list, yet it might be explained by the centrality in the tale of the marriage of Alexis,
from which he flees, up to the end of the narrative, that finishes with the lamentations of his
wife before his dead body (that creates a discordant echo to the courtly ‘death from love’).</p>
        <p>On the other hand, the witnesses closest to the violence vector are chiefly epics (chansons
de geste). For instance, Aliscans and the Chevalerie Vivien, that appear several times in the list,
are centred around the eponymous battle in Aliscans between the Saracen king Deramé of
Cordoba and the Frankish knight Vivien, who swore never to back down before the pagans, and
endures a heroic death precisely because of his vow. It is interesting to notice the presence in
this list, among more fictional texts, of the Conquest of Jerusalem, that draws on the events of
the First Crusade (in particular, the siege of Jerusalem in 1099).</p>
        <p>The nature of the documents closest to the love and violence vectors, when compared with
existing literary knowledge, confirms the quality of the document embedding, and the ability
of our method to recognise the importance of courtly love or chivalric violence contents in the
texts.</p>
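        <p>The aggregation used in this section, where a witness receives the mean score of its passages and witnesses are then ranked, can be sketched as follows. The witness names and scores are invented for the example, and the helper names <monospace>witness_scores</monospace> and <monospace>rank_witnesses</monospace> are hypothetical.</p>
        <preformat>
```python
from statistics import mean


def witness_scores(passage_scores):
    """Mean passage score per witness.

    passage_scores: dict mapping a witness name to its list of passage scores.
    """
    return {witness: mean(scores) for witness, scores in passage_scores.items()}


def rank_witnesses(passage_scores, top_n=10):
    """Witnesses sorted by decreasing mean score, truncated to top_n entries."""
    scores = witness_scores(passage_scores)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Toy love scores for three witnesses, invented for the example.
toy_love_scores = {
    "witness_a": [0.42, 0.38, 0.40],
    "witness_b": [0.10, 0.15],
    "witness_c": [0.25, 0.35],
}
ranking = rank_witnesses(toy_love_scores, top_n=2)
ranked_names = [name for name, _ in ranking]
```
        </preformat>
        <p>Because every passage counts equally within a witness, a long witness is not favoured over a short one; the number of witnesses per work still affects the corpus-level trends, which is consistent with the choice to reflect reception.</p>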
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Median scores variation in time</title>
        <p>
          Examining the variation through time of the semantic contents of the documents, year by
year, by plotting the average yearly similarity of the sampled documents with the vectors of
the love and violence sets of keywords, seems to reveal a strong increase in the presence of
love until roughly the years 1330-1340, followed by a downward trend until the end of
the Middle Ages, roughly coinciding with the Crisis of the Late Middle Ages (or the Medieval
Great Depression), though not completely. A comparison with reconstructed economic data
[
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] shows that the important crises of the beginning of the 14th century, in particular the Great
Famine of 1315–1317, which coincides with a very large drop in estimated GDP per capita, do
not seem to affect the importance of love in fiction, though they might have contributed to the
very significant drop in available manuscripts observed above (Figure 1). Although there seem to
be shared trends, up to a point, in the long-term variation of economic development and love
score (increase in the 13th century, decrease during the Crisis of the Late Middle Ages), the
two curves do not match perfectly, and the love score perhaps lags behind by a
couple of decades. This might hint at a form of cultural inertia, especially in a
context where textual transmission (the copying and circulation of texts) is a lengthy process, and
by no means as fluid as in later periods. It should be noted that some of the increases in
GDP per capita are not necessarily due to an increase in GDP, but instead to sudden decreases in
population, such as the one caused by the Black Death in 1347-1351.
        </p>
        <p>Violence, on the other hand, seems to start its decrease earlier, around the middle of the
thirteenth century. This could coincide with the slow loss of favour of the genre of epics
(chansons de geste), victim of the competition of the more recent genre of romans, as well as the
irruption in late chansons de geste of themes other than war: for instance individual adventure,
love or wonder.</p>
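        <p>The yearly averaging described in this section can be sketched as follows. This is a minimal illustration, not the scripts used for the study: it assumes that the vector of a keyword set is the mean of the vectors of its word forms and that similarity is the cosine similarity. The function names and the toy two-dimensional vectors are hypothetical.</p>
        <preformat><![CDATA[
```python
import math
from collections import defaultdict


def mean_vector(vectors):
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def yearly_mean_similarity(samples, keyword_vectors):
    """Average, per year, the similarity of each sample to a theme vector.

    samples: iterable of (year, document_vector) pairs.
    keyword_vectors: vectors of the word forms defining the theme
    (assumption: the theme vector is their mean).
    """
    theme = mean_vector(keyword_vectors)
    by_year = defaultdict(list)
    for year, doc_vector in samples:
        by_year[year].append(cosine(doc_vector, theme))
    return {year: sum(s) / len(s) for year, s in sorted(by_year.items())}


# Toy data, for illustration only (not taken from the corpus).
love_forms = [[1.0, 0.0], [1.0, 0.0]]
samples = [(1250, [1.0, 0.0]), (1250, [0.0, 1.0]), (1340, [3.0, 4.0])]
scores = yearly_mean_similarity(samples, love_forms)
print(scores)
```
]]></preformat>
        <p>With these toy values the sketch gives 0.5 for 1250 and 0.6 for 1340. Replacing the mean in the last line of <monospace>yearly_mean_similarity</monospace> with a median would yield yearly median scores instead.</p>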
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion and future work</title>
      <p>In building the corpus used for this study, we remain dependent on the biases of the unequal
preservation of documents through time, and of large- and small-scale historical events, from the Great
Plague to the ups and downs of the Royal Library, whose collections are the ancestor of those
of the BnF that we used (cf. Figure 1). In addition, since manuscripts (especially those
preserved) were expensive objects reserved for a certain elite, their contents cannot be claimed to
represent the taste of society as a whole, but rather that of a relatively wealthy and educated
class (aristocratic or otherwise).</p>
      <p>Yet, within the limits of these sources, we observe that we are able to build and query a
semantic representation of the words and documents that exhibits many of the tropes of this
literature that researchers have studied through close reading. In particular, the semantic
environment of the love word vector, in terms of both close words and close documents, corroborates
and sometimes enriches literary knowledge on the tropes of courtly love and the associated
works. These results align with several of de Rougemont’s ideas about the importance of the lyric
tradition, as well as the strong link between the themes of love, love-induced suffering and
death.</p>
      <p>The variation in time of the importance of love and violence shows initially opposite trends
that, after c. 1340, seem to align more closely. In terms of literary history, this could correspond
to the traditional epic chansons de geste, focused on collective war against Saracens and feudal
conflicts, slowly going out of fashion, and progressively aligning their content with the more
modern genre of the roman, including individual adventures and love stories. Once epics have
merged with chivalric romances, both seem to behave in similar ways through time.</p>
      <p>Finally, the variation in time of the mean importance of love in fiction seems to show a
phase of expansion until the early 14th century, followed by a downward trend during
the period of the Crisis of the Late Middle Ages (with a time lag of roughly 20 years, the decline
in love starting around 1330-1340, while the Great Famine of 1315-1322 traditionally marks the
beginning of the Crisis). Further research is needed to explore this issue in greater depth, and
to test the correlation with economic development as well as other factors.</p>
    </sec>
    <sec id="sec-5">
      <title>Data and materials availability</title>
      <p>Data and scripts used for topic modelling are available in a Zenodo
repository: 10.5281/zenodo.10011791. The CMFER is also available on
GitHub, https://github.com/Jean-BaptisteCamps/CMFER.</p>
    </sec>
    <sec id="sec-6">
      <title>A. Acquisition workflow and evaluation of the corpus</title>
      <sec id="sec-6-1">
        <title>A.1. Workflow</title>
        <p>The workflow for text acquisition is depicted in fig. 3.</p>
        <p>
          Manuscript images are harvested using the International Image Interoperability Framework
(IIIF), based on their manifest, then processed through layout analysis, using the
YALTAi [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] object detection approach and the Gallicorpora model [31], with the SegmOnto ontology for the
semantic typing of zones [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], in combination with a Kraken [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] model for the
identification of lines. The resulting pairs of ALTO (Analyzed Layout and Text Object) files and page images are
then passed to handwritten text recognition, using the deep learning approach of the Kraken
software and the CREMMA Medieval Generic model [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. This model produces a version of
the text that encodes abbreviations as such, and follows the graphematic conventions recently
elaborated at the École des chartes, in a seminar led by A. Pinche, J.-B. Camps and F. Duval
[
          <xref ref-type="bibr" rid="ref30">30</xref>
          ].
        </p>
        <p>The resulting ALTO files (one per page) are then processed through a dedicated script, to
create a single raw text file per witness (i.e., an instance of a given work in a given manuscript),
with the relevant metadata in an accompanying TSV file.</p>
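        <p>As an illustration of this last step, the following minimal sketch concatenates the lines of successive ALTO pages into one raw text. It is not the dedicated script mentioned above: it assumes that each line is a TextLine element whose String children carry the transcription in their CONTENT attribute, as in the ALTO schema, and it leaves the metadata file aside. The function names and the sample page are hypothetical.</p>
        <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET


def alto_page_lines(alto_xml):
    """Return the transcribed lines of one ALTO page, in document order."""
    root = ET.fromstring(alto_xml)
    lines = []
    # "{*}" matches any namespace, so ALTO v2, v3 and v4 files are all read.
    for text_line in root.findall(".//{*}TextLine"):
        words = [s.get("CONTENT", "") for s in text_line.findall("{*}String")]
        lines.append(" ".join(w for w in words if w))
    return lines


def witness_text(pages):
    """Concatenate the lines of successive pages into one raw text."""
    return "\n".join(line for page in pages for line in alto_page_lines(page))


# Hypothetical two-line page, for illustration only.
PAGE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="b iax sire"/></TextLine>
    <TextLine><String CONTENT="deffendes"/><String CONTENT="cel"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(witness_text([PAGE]))
```
]]></preformat>
        <p>The namespace wildcard requires Python 3.8 or later. Reading order is taken to be the order of the TextLine elements in the file, which is a simplification.</p>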
      </sec>
      <sec id="sec-6-2">
        <title>A.2. Quality evaluation</title>
        <p>We follow the approach recently described by Clérice [<xref ref-type="bibr" rid="ref9">9</xref>] for the ground-truth-free evaluation of
handwritten text recognition (HTR) of Old French. This approach is based on natural language
processing, and aims to evaluate the apparent linguistic consistency of a text, rather than its
match with the original line image. It treats the evaluation as a classification task, where a model
is trained to classify transcribed lines into categories that are meant to approximate a level
of character error rate: Good ([0, 10)%), Acceptable ([10, 25)%), Bad ([25, 50)%), and Very Bad
(≥ 50%). For this, it uses a model with an embedding, sentence encoder and linear classifier
structure. Its output is a classification of each line into one of the aforementioned
categories (fig. 4). We reuse the model provided by Clérice with the original paper.</p>
        <p>To provide an estimate for each textual witness, we count the total number of lines in each
category, and compute a ratio, both for each category and for good and acceptable vs bad
and very bad (fig. 5). The median ratio of good lines is 65% and the median ratio of good+acceptable
lines is 94% (min: 9%; 1st quartile: 89%; 3rd quartile: 97%; max: 100%). Typical examples of
results for the maximum, median, first quartile and minimum values of good+acceptable lines are
given in appendix B.</p>
        <p>The distribution of quality estimations by century shows that our model reaches comparable
levels of quality for the 13th and 14th centuries, with most manuscripts above 80% of good and
acceptable lines, and a few outliers below. On the other hand, there is a decrease in quality for the 15th
century, together with a less compact distribution. This can be explained by the significant number
of 15th-century manuscripts written in cursive scripts, often with less formal execution, that
differ significantly from the Gothic Textualis that otherwise dominates the corpus.</p>
        <p>Outliers with a large number of bad or very bad lines exist, and they were removed from the
corpus before further analysis. The threshold was set at 1.5 interquartile ranges below the 1st
quartile (ratio of good+acceptable lines ≥ 0.78), resulting in 370 texts selected for further analysis
(Table 1).</p>
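        <p>The per-witness ratio and the outlier threshold can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the study: the function names, witness names and line labels are hypothetical, and the quartiles passed in are the rounded values reported above. With those rounded values the threshold comes out at about 0.77, close to the 0.78 reported, which was presumably computed from unrounded quartiles.</p>
        <preformat><![CDATA[
```python
from collections import Counter

GOOD_LABELS = {"Good", "Acceptable"}  # category names as given in the text


def good_acceptable_ratio(line_labels):
    """Share of lines labelled Good or Acceptable for one witness."""
    counts = Counter(line_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("witness has no classified lines")
    return sum(counts[label] for label in GOOD_LABELS) / total


def lower_fence(q1, q3, k=1.5):
    """Lower outlier fence: k interquartile ranges below the first quartile."""
    return q1 - k * (q3 - q1)


def select_witnesses(ratios, q1, q3):
    """Keep witnesses whose ratio is at or above the lower fence."""
    fence = lower_fence(q1, q3)
    return {name: r for name, r in ratios.items() if r >= fence}


# Hypothetical witnesses and labels, for illustration only.
labels = {
    "witness_a": ["Good"] * 7 + ["Acceptable"] * 2 + ["Bad"],
    "witness_b": ["Good"] * 2 + ["Bad"] * 5 + ["Very Bad"] * 3,
}
ratios = {name: good_acceptable_ratio(ls) for name, ls in labels.items()}
kept = select_witnesses(ratios, q1=0.89, q3=0.97)
print(sorted(kept))
```
]]></preformat>
        <p>In this toy example only the first witness is kept, since its ratio of 0.9 lies above the fence while the second witness, at 0.2, falls below it.</p>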
      </sec>
    </sec>
    <sec id="sec-7">
      <title>B. Example of processing results for the different quality levels</title>
      <p>M edeffendes uꝰ dex de cest fu ci deuant
⁊ ausi ꝯtu ses ca cort ai cest cmͦãt
b iax sire deffendes cel ch̾ r enfant
Qͥ por moi se ꝯbar acel serf mal faisant
l ors chiet ariel la dame de peor uait dͤ mblãt
ꝯtt ut .i. dꝰ len relieue ꝯ apeloit climãt
s ili dist doloe dame neuos esmaies tant
U es encor elyas le m̾ chi deu niuant
A usi est ore endeu ꝯ al mienchemãt
⁊ la dame se taist qͥ ot le cuer dolant
b ien furenit .iii.e. franc a iestor ꝯmẽc̾
T uit cistenuont ensenblep᷑ .pa. domag̾
c eiorineissie dance lances froisier
⁊ noz ienge . srap. sor. sarr̾ . aid'.
a destre ⁊ a sencstre acrrãc iesrãs cera
a mot parmices hiaumes seur ⁊ chaploier
c eschieticecusaires ocirre ⁊ de crãch̃ ͤ
Seu luirent encenble .iiii.e. charpẽtier
Qi trestuit quipentataẽ ẜ chistei rericẽ
Ne seissenta mie teluoise ⁊ teitenpier
O u ses non ie uos rans ke pais
C ar ie nel puis auser uegarãtir
a uns men nai com uns autres cheas
C il sunt dolant cont lapaiole out
N ra celui qui ne fust abais.</p>
      <p>O une plorast del biaux iex de sonuis
i apostos les senest enpies leues
Cememẽt plore sa sagent apeles
S ignor clergie qͥl ꝯseil me douues
I lẽ bien orois qͥ del uostre uueces</p>
    </sec>
    <sec id="sec-8">
      <title>C. Composition of the love and violence vectors</title>
      <sec id="sec-8-1">
        <title>C.1. Love</title>
        <p>All forms of verb ‘aimer’ (to love) and noun ‘amour’ (love) that were found were used. They
are the following (forms with HTR errors are marked with an asterisk):
aime, aimme, ama, amast, ame, amee, amer, amerai, ameroit, ames, *amo, amoit,
amor, amors, amour, amours, *anier, *anne, *camoi, damor, damors, damour,
damours, desamour, iaim, iaime, iamoie, laim, laime, lamasse, lamerai, lamoit,
lamor, lamour, *laune, maime, mamast, *mor, *mour, *mours, naim, naime,
namerai, quamours, samor, samors, samour, *sanie, taime.</p>
      </sec>
      <sec id="sec-8-2">
        <title>C.2. Violence</title>
        <p>All forms of verb ‘ferir’ (to hit) that were found were used. They are the following (forms with
HTR errors are marked with an asterisk):</p>
        <p>anfiert, efiert, enfiert, feri, ferir, feru, ferus, fiert, leferi, referi, refiert, *uaferir</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>D. Topic modelling</title>
      <sec id="sec-9-1">
        <title>D.1. top2vec model</title>
        <p>
          In addition to word and document embeddings, we investigated the texts using top2vec [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
Recent studies have shown top2vec to yield qualitatively better results, and more coherent and
human-readable topics, than other topic modelling methods, such as the classic LDA [26,
          <xref ref-type="bibr" rid="ref15 ref21">15,
21</xref>
          ]. It has already been used in large-scale topic modelling of literary corpora [36].
        </p>
        <p>In addition, top2vec automatically finds the relevant number of topics, which facilitates
the handling of this large corpus by relieving us of long and computationally intensive
benchmarks over arbitrary numbers of topics.</p>
        <p>top2vec was trained reusing the doc2vec model described in the main text, with otherwise
default hyperparameters. The parameter for the DBSCAN clustering of topics was set to 0.1 in
cosine distance (i.e., topic vectors with a smaller cosine distance will be merged).</p>
        <p>In the absence of benchmarks dedicated to variation-rich historical corpora similar to ours, we
still conducted some experimentation on the variation of these parameters (e.g., using a
longer n-gram vocabulary, adjusting the clustering parameter, using the top2vec ‘fast-learn’ or ‘deep-learn’ alternatives, etc.),
but not in a systematic fashion, due to the long training time for each model (up to 10 hours).
These experiments resulted in apparently lower-quality topics, with either an excessively small number
of topics (e.g., 5 topics) or less significant topics with a predominance of function words.</p>
        <p>The training with the chosen parameters yielded 276 topics.</p>
        <p>
          We chose top2vec over BERTopic [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], due to the unavailability of a pretrained BERT model
compatible with the specifics of our data, in terms of language and writing conventions, e.g.,
abbreviations (see next subsection).
        </p>
      </sec>
      <sec id="sec-9-2">
        <title>D.2. Experiments with BERTopic</title>
        <p>BERTopic was another option that shares many of the strengths of top2vec and performs
especially “well on most aspects of the topic modeling domain” [<xref ref-type="bibr" rid="ref15">15</xref>, p. 12]. BERTopic can run on any
pretrained BERT model but is commonly associated with a multilingual pre-trained embedding
model trained on Reddit and StackExchange, paraphrase-multilingual-MiniLM-L12-v2.
Preliminary tests showed that the model is affected by historical drift, due to the increasing distance
between older states of written French and the contemporary standard: BERTopic
ran correctly on a set of 17th-century French novels and, to a lesser extent, on a large sample
of 15th-century texts from our corpus. Before the 15th century, the results were totally
inconclusive, with one topic containing nearly all the corpus. The semantic map in fig. 10 suggests
that sentence embeddings have deteriorated to such an extent that it is no longer possible to
recover regular clusters of topics.</p>
        <p>
          We plan to pursue experiments with this method, to be able to compare its results with
those presented here, once a pre-trained embedding model fitting our data is made available
or is trained by us. Indeed, there exists an Old French BERT, BERTrade [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], but it is based on
a corpus significantly smaller than the data we gathered (10 million words, as opposed to 40),
and more importantly uses text editions with a higher level of normalisation (abbreviations
expanded, in particular).
        </p>
        <p>This calls for a methodological remark: the rapid development of masked language models
following BERT has created a new range of issues for historical studies. Most models are trained
on very recent corpora and data. They are unlikely to cover past linguistic forms and writing
conventions, and using pre-trained models alone, it would not yet have been possible to conduct this study.</p>
      </sec>
      <sec id="sec-9-3">
        <title>D.3. Results of top2vec</title>
        <p>The six topics that scored highest for semantic similarity with the vector of ‘love’ word
forms are shown in fig. 11. Interestingly, the two highest-scoring topics both relate to the
lyrical register of the complaint (plainte) for the pains and injuries caused by love and the act
of (metaphorically) dying of love / being killed by love, as they are found abundantly in our
corpus inside the Tristan en prose and its many lyrical poetry inserts. Death is again found
in the fifth topic, which concerns the songs of love in general, but also very particularly the Lais
of the Tristan en prose, especially the lay mortel (Deadly lai), the last love song sung just before
dying of that same sentiment.</p>
        <p>In apparent strong contrast, the third topic appears dedicated to the pleasures of love, its
sweetness and the comfort it brings, though it can be closely intertwined with the previous one,
as is demonstrated in one of the highest-scoring passages, taken from a lyrical (and possibly
parodic) part of the Romans de Fauvel, of which we give here an excerpt with minor corrections
to the HTR:</p>
        <p>Q ue ie muir par tres bien amer
E n ce que urai martir serai
D ame en mourant me reconforte
[My lady please remember that] I am dying because of loving very well, that I will be
a true martyr of love, lady, this brings me comfort while I die.</p>
        <p>Other passages where this topic is most represented are found in a variety of sources, from
Amadas et Ydoine to the Roman des Sept Sages de Rome (Seven Wise Masters) and its
continuations.</p>
        <p>Finally, the fourth and sixth topics are related to specific works, the Romans d’Eneas and a
somewhat less known and perhaps overlooked work, Blancandin et l’Orgueilleuse d’Amour.</p>
        <p>Topics most related to violence concern, unsurprisingly, descriptions of battles
and fights between a knight and his enemies. They are relatively straightforward to interpret
and concern different but related aspects of knightly sword- and spear-fighting (fig. 12).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Angelov</surname>
          </string-name>
          . “Top2vec:
          <article-title>Distributed representations of topics”</article-title>
          .
          <source>In: arXiv preprint arXiv:2008.09470</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Baumard</surname>
          </string-name>
          . “
          <article-title>The Ancient Literary Fictions Values Survey”</article-title>
          .
          <source>In: OSF</source>
          (
          <year>2021</year>
          ). url: https://osf.io/mvybs.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Baumard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Huillery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hyafil</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Safra</surname>
          </string-name>
          . “
          <article-title>The cultural evolution of love in literary history”</article-title>
          .
          <source>In: Nature Human Behaviour 6.4</source>
          (
          <year>2022</year>
          ), pp.
          <fpage>506</fpage>
          -
          <lpage>522</lpage>
          . doi: 10.1038/s41562-022-01292-z.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] Bibliothèque nationale de France.
          <article-title>Catalogue BnF Archives et manuscrits</article-title>
          . Paris,
          <year>2023</year>
          . url: https://archivesetmanuscrits.bnf.fr/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bolt</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Van Zanden</surname>
          </string-name>
          . “
          <article-title>Maddison style estimates of the evolution of the world economy. A new 2020 update”</article-title>
          .
          <source>In: Maddison-Project Working Paper WP-15</source>
          , University of Groningen (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Brun</surname>
          </string-name>
          , ed. Arlima - Archives de littérature du Moyen Âge. Ottawa,
          <year>2005</year>
          . url: https://www.arlima.net/.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Camps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gabay</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. F.</given-names>
            <surname>Riva</surname>
          </string-name>
          . “Open Stemmata:
          <article-title>A Digital Collection of Textual Genealogies”</article-title>
          .
          <source>In: EADH2021: Interdisciplinary Perspectives on Data, 2nd International Conference of the European Association for Digital Humanities. Krasnoyarsk</source>
          ,
          <year>2021</year>
          . url: https://halshs.archives-ouvertes.fr/halshs-03260086.
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Camps</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Randon-Furling</surname>
          </string-name>
          .
          <article-title>“Lost Manuscripts and Extinct Texts: A Dynamic Model of Cultural Transmission”</article-title>
          .
          <source>In: Proceedings of the Computational Humanities Research Conference 2022 Antwerp, Belgium, December 12-14</source>
          ,
          <year>2022</year>
          . CEUR Workshop Proceedings.
          <year>2022</year>
          , pp.
          <fpage>198</fpage>
          -
          <lpage>214</lpage>
          . url: https://ceur-ws.org/Vol-3290/long_paper3261.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Clérice</surname>
          </string-name>
          . “
          <article-title>Ground-truth Free Evaluation of HTR on Old French and Latin Medieval Literary Manuscripts”</article-title>
          .
          <source>In: Proceedings of the Computational Humanities Research Conference 2022 Antwerp, Belgium, December 12-14</source>
          ,
          <year>2022</year>
          . Ed. by
          <string-name>
            <given-names>F.</given-names>
            <surname>Karsdorp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lassche</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Nielbo</surname>
          </string-name>
          . Vol.
          <volume>1613</volume>
          . CEUR Workshop Proceedings. Antwerp,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          . url: https://ceur-ws.org/Vol-3290/long_paper2081.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Clérice</surname>
          </string-name>
          . “
          <article-title>You Actually Look Twice At it (YALTAi): using an object detection approach instead of region segmentation within the Kraken engine”</article-title>
          .
          <source>In: arXiv preprint arXiv:2207.11230</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Clérice</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pinche</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Vlachou-Efstathiou</surname>
          </string-name>
          .
          <article-title>“Generic CREMMA Model for Medieval Manuscripts (Latin and Old French), 8-15th century”</article-title>
          . In: (
          <year>2023</year>
          ).
          doi: 10.5281/zenodo.7631619.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Corbellari</surname>
          </string-name>
          . “
          <article-title>Retour sur l'amour courtois”</article-title>
          .
          <source>In: Cahiers de recherches médiévales - Journal of medieval studies 17</source>
          (
          <year>2009</year>
          ), pp.
          <fpage>375</fpage>
          -
          <lpage>385</lpage>
          . doi: 10.4000/crm.11542.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Curiskis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Drake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Osborn</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Kennedy</surname>
          </string-name>
          . “
          <article-title>An evaluation of document clustering and topic modelling in two online social networks: Twitter and Reddit”</article-title>
          .
          <source>In: Information Processing &amp; Management 57.2</source>
          (
          <year>2020</year>
          ), p.
          <fpage>102034</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Duby</surname>
          </string-name>
          .
          <source>Mâle Moyen Âge: de l'amour et autres essais</source>
          . Paris, France: Flammarion,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>Egger</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <article-title>“A topic modeling comparison between LDA, NMF, Top2Vec, and BERTopic to demystify Twitter posts”</article-title>
          .
          <source>In: Frontiers in Sociology 7</source>
          (
          <year>2022</year>
          ), p.
          <fpage>886498</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gabay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Camps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pinche</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Jahan</surname>
          </string-name>
          . “
          <article-title>SegmOnto: common vocabulary and practices for analysing the layout of manuscripts (and more)”</article-title>
          .
          <source>In: 1st International Workshop on Computational Paleography (IWCP ICDAR 2021)</source>
          .
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L.</given-names>
            <surname>Grobol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Regnault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. O.</given-names>
            <surname>Suarez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sagot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Romary</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Crabbé</surname>
          </string-name>
          . “
          <article-title>BERTrade: Using Contextual Embeddings to Parse Old French”</article-title>
          .
          <source>In: 13th Language Resources and Evaluation Conference</source>
          .
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Grootendorst</surname>
          </string-name>
          . “BERTopic:
          <article-title>Neural topic modeling with a class-based TF-IDF procedure”</article-title>
          .
          <source>In: arXiv preprint arXiv:2203.05794</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ing</surname>
          </string-name>
          . “
          <article-title>L'obsolescence lexicale en français médiéval: Philologie et linguistique computationnelles sur le Lancelot en prose”</article-title>
          .
          <source>PhD thesis</source>
          .
          <source>Université Paris Sciences et Lettres</source>
          ,
          <year>2023</year>
          . url: https://www.theses.fr/s221114.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>IRHT</surname>
          </string-name>
          . Jonas:
          <article-title>Répertoire des textes et des manuscrits médiévaux d'oc et d'oïl</article-title>
          . Paris et Orléans,
          <year>2023</year>
          . url: http://jonas.irht.cnrs.fr/.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>B.</given-names>
            <surname>Karas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Qu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhu</surname>
          </string-name>
          . “
          <article-title>Experiments with LDA and Top2Vec for embedded topic discovery on social media data-A case study of cystic fibrosis”</article-title>
          .
          <source>In: Frontiers in Artificial Intelligence</source>
          <volume>5</volume>
          (
          <year>2022</year>
          ), p.
          <fpage>948313</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kestemont</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Karsdorp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>de Bruijn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Driscoll</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Kapitan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ó Macháin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sawyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sleiderink</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Chao</surname>
          </string-name>
          . “
          <article-title>Forgotten books: The application of unseen species models to the survival of culture”</article-title>
          .
          <source>In: Science 375.6582</source>
          (
          <year>2022</year>
          ), pp.
          <fpage>765</fpage>
          -
          <lpage>769</lpage>
          . doi: 10.1126/science.abl7655.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kiessling</surname>
          </string-name>
          . “
          <article-title>Kraken - an Universal Text Recognizer for the Humanities”</article-title>
          .
          <source>In: Digital Humanities Conference 2019, Complexities, Utrecht (DH2019)</source>
          .
          <year>2019</year>
          . url: https://web.archive.org/web/20210719115330/https://dev.clariah.nl/files/dh2019/boa/0673.html.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Lau</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Baldwin</surname>
          </string-name>
          . “
          <article-title>An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation”</article-title>
          .
          <source>In: Proceedings of the 1st Workshop on Representation Learning for NLP</source>
          .
          <year>2016</year>
          , pp.
          <fpage>78</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Le</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          . “
          <article-title>Distributed representations of sentences and documents”</article-title>
          .
          <source>In: International conference on machine learning. PMLR</source>
          .
          <year>2014</year>
          , pp.
          <fpage>1188</fpage>
          -
          <lpage>1196</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zeng-Treitler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Nelson</surname>
          </string-name>
          . “
          <article-title>Use of two topic modeling methods to investigate covid vaccine hesitancy”</article-title>
          .
          <source>In: Int. Conf. ICT Soc. Hum. Beings</source>
          . Vol.
          <volume>384</volume>
          .
          <year>2021</year>
          , pp.
          <fpage>221</fpage>
          -
          <lpage>226</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Corrado</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          . “
          <article-title>Efficient estimation of word representations in vector space”</article-title>
          .
          <source>In: arXiv preprint arXiv:1301.3781</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Corrado</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          . “
          <article-title>Distributed representations of words and phrases and their compositionality”</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          <volume>26</volume>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>C.</given-names>
            <surname>Morris</surname>
          </string-name>
          .
          <article-title>The discovery of the individual, 1050-1200</article-title>
          . Vol.
          <volume>5</volume>
          . Toronto: University of Toronto Press,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pinche</surname>
          </string-name>
          , ed.
          <article-title>Guide de transcription pour les manuscrits du Xe au XVe siècle</article-title>
          . Paris,
          <year>2022</year>
          . url: https://hal.science/hal-03697382/.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pinche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Christensen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Gabay</surname>
          </string-name>
          . “
          <article-title>Between automatic and manual encoding”</article-title>
          . In:
          <article-title>TEI 2022 conference: Text as data</article-title>
          .
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ridolfi</surname>
          </string-name>
          . “
          <article-title>The French economy in the longue durée: A study on real wages, working days and economic performance from Louis IX to the Revolution (1250-1789)”</article-title>
          .
          <source>PhD thesis</source>
          .
          <source>IMT School for Advanced Studies, Lucca</source>
          ,
          <year>2016</year>
          . url: http://e-theses.imtlucca.it/211/1/Ridolfi%5C%5Fphdthesis.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>D.</given-names>
            <surname>de Rougemont</surname>
          </string-name>
          .
          <article-title>L'Amour et l'Occident</article-title>
          . Republ.
          <source>Online in Rougemont 2.0 (Genève)</source>
          . Paris,
          <year>1939</year>
          . url: https://www.unige.ch/rougemont/livres/ddr1939ao.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>C.</given-names>
            <surname>Ruby</surname>
          </string-name>
          . “
          <article-title>Thibaut”</article-title>
          . In:
          <article-title>Dictionnaire des lettres françaises: Le Moyen Âge</article-title>
          . Paris,
          <year>1992</year>
          , pp.
          <fpage>1422</fpage>
          -
          <lpage>1423</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>C.</given-names>
            <surname>Seignobos</surname>
          </string-name>
          . “
          <article-title>L'Amour est-il une invention moderne ?”</article-title>
          .
          <source>In: Le Quotidien</source>
          (
          <year>1925</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>J.</given-names>
            <surname>Van Zundert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Koolen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Neugarten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Boot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Van Hage</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Mussmann</surname>
          </string-name>
          . “
          <article-title>What Do We Talk About When We Talk About Topic?”</article-title>
          .
          <source>In: Proceedings of the Computational Humanities Research Conference 2022, Antwerp, Belgium, December 12-14</source>
          ,
          <year>2022</year>
          . CEUR Workshop Proceedings. Antwerp,
          <year>2022</year>
          , pp.
          <fpage>398</fpage>
          -
          <lpage>410</lpage>
          . url: https://ceur-ws.org/Vol-3290/short%5C%5Fpaper5533.pdf.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>