<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Workshop on Computational Humanities Research, November</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A Statistical Foray into Contextual Aspects of Intertextuality</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Enrique Manjavacas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Folgert Karsdorp</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mike Kestemont</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Literature, University of Antwerp</institution>
          ,
          <addr-line>Antwerp</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Royal Netherlands Academy of Arts and Sciences, Meertens Institute</institution>
          ,
          <addr-line>Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>1</volume>
      <fpage>8</fpage>
      <lpage>20</lpage>
      <abstract>
        <p>Intertextuality is a highly productive concept in literary theory. The pervasiveness of intertextuality in literary texts has led both to a proliferation of applications, often with divergent interpretations of the concept, and to a recurrent interest in studying it from a computational point of view. Despite the potential of data-driven, bottom-up approaches, most computational research into intertextuality has focused on text reuse detection, exploiting surface-level properties to improve the performance of retrieval systems. In the present study, we use the Patrologia Latina, a substantial collection of religious texts spanning over a millennium of Latin writing (3rd to 13th centuries), to provide a large-scale systematic study of biblical intertexts. On the basis of multi-level statistical models, we investigate two axes of intertexts: the degree of lexical similarity, and the degree to which intertexts are thematically embedded in their context. Furthermore, we investigate the extent to which the following contextual sources of variation help explain the distribution of intertexts along these axes. First, we analyze the effect of authorship: do authors differ in the way they compose their intertexts? Second, we inspect factors related to the source collection (i.e. the Bible) to elucidate whether the authority and tradition of particular books exert an influence on the observed intertexts: do certain books trigger a more allusive or a more quotational type of intertext? Finally, we take into account the dominant topic surrounding the intertext location and examine associations between the distribution of dominant topics and intertext types. On the one hand, our analysis indicates that the two axes (lexical similarity and thematic embedding) play partially complementary roles in our computational account of intertextual types. On the other hand, we find that biblical books and, more strongly, dominant topics constitute important factors of variation, while the authorial signal remains comparatively weak.</p>
      </abstract>
      <kwd-group>
        <kwd>Intertextuality</kwd>
        <kwd>Text Reuse</kwd>
        <kwd>Multi-level Modeling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Intertextuality is a well-known concept from literary studies that is commonly applied to texts
across various periods and languages [
        <xref ref-type="bibr" rid="ref2 ref37">34, 2</xref>
        ]. Originally proposed by the post-structuralist literary
theorist Julia Kristeva [
        <xref ref-type="bibr" rid="ref27">24</xref>
        ], intertextuality models literature as an intricate network of textual
nodes that are interconnected by the ‘intertexts’ that they share. Texts can refer to one another,
for instance, through the literal integration of quotes from other works or through the inclusion
of more subtle allusions to other texts. There is widespread agreement in literary studies that
the intertextual approach has considerable merit, as it sheds light on how texts participate
in the discursive space of a culture [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. In computational literary studies, intertextuality has
also received ample attention, and the vast scope at which intertextuality can be studied has
rendered the application of computational techniques very attractive from early on.
      </p>
      <p>
        In spite of the considerable popularity of intertextuality in literary studies, there exists
no straightforward definition of it [
        <xref ref-type="bibr" rid="ref36">33</xref>
        ]. Instead, a more fruitful discussion of intertextuality
can be obtained by focusing on the aspects of intertextuality that scholars have exploited to
generate new readings and interpretations of literary works. These aspects range from abstract
structuring roles, in which an original text serves as the organizing principle in the creation
of another (e.g., the role of the Odyssey in Virgil’s Aeneid or Joyce’s Ulysses, cases of what
Genette terms “hypertextuality” [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]), to more localized phenomena such as motifs or allusions,
in which the link is established from and to specific passages.
      </p>
      <p>
        In order to situate computational approaches to intertextuality within this spectrum, Forstall
and Scheirer [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] introduced a useful distinction between large-scale effects and local effects
of intertextuality, assigning the latter to the scope of what they call “quantitative
intertextuality”. These localized intertextual links, or “loci similes” in more traditional terms, have
been categorized along different axes, such as intentionality [
        <xref ref-type="bibr" rid="ref15 ref9">15, 23, 9</xref>
        ], function (parodic
vs. satirical and non-satirical [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]), or “literality” (quotation vs. mention or allusion). This
taxonomic activity has led to a considerable number of intertext typologies, highlighting the
complexity of the underlying phenomena.
      </p>
      <p>
        Still, when considering such “loci similes”, the bulk of computational studies so far has
adopted a fairly narrow conception of the phenomenon, focusing on the issue of “text reuse
detection” and relying on techniques that exploit string similarity [
        <xref ref-type="bibr" rid="ref28 ref46 ref6 ref8">6, 25, 8, 43</xref>
        ].1 However, a
variety of contextual factors can plausibly be thought of as conditioning the location, source and
type of an intertextual link.
      </p>
      <p>Without aiming to be exhaustive, we might hypothesize that certain themes (e.g. “war” or
“love”) are more likely than others to “trigger” references, perhaps because the author’s
conceptualization of that theme is indebted to a particular source. In that sense, the location
of an intertext would be conditioned by its embedding in the triggering theme.</p>
      <p>
        Moreover, writers may show preferences for borrowing from particular authors, books or
fragments of books. On the one hand, the influence of a particular source on a community of
authors can explain the frequency of references to that source, due to, for instance,
social biases, such as ‘conformist’ or ‘anti-conformist’ biases towards or against popular writers
(see, for instance, recent literature from the field of Cultural Evolution [
        <xref ref-type="bibr" rid="ref1 ref11 ref34">31, 1, 11</xref>
        ]). On the
other hand, the distribution of intertext types, considering, for instance, an axis of “literality”
ranging from literal quotation to allusive reference, may be affected by this influence: a
particular source may exert an authoritative pressure towards a more literal style.
      </p>
      <p>
        Furthermore, the type of reference that can be expected in a particular text may be a
feature of authorial style. In this respect, we could expect to observe trends towards more or
less allusive referencing as a marker of authorial preference. Besides the degree of “literality”,
which is easily quantifiable in terms of lexical overlap, we need to consider a further aspect
of referential style which is easily overlooked: the extent to which an intertextual unit is
“prepared” by the textual context. If the textual contexts around the borrowing and the
borrowed passage deal with similar themes, the intertextual link could be explained as
having been facilitated by this thematic similarity. A possible hypothesis in this regard is that
shorter and more subtle allusions would require a higher degree of contextual similarity
to the source passage in order to be effective, because in the absence of such topical preparation,
the audience would be more likely to miss the link. However, such a hypothesis relies on
the problematic assumption that intertextual linking must be a conscious act of the writer
to be perceived as such by the reader. Instead of top-down approaches to intertextuality, such as
the one implied in the previous hypothesis, we would like to systematically investigate, in a bottom-up
fashion, the factors of variation that influence the type of intertext, considering both
axes: the degree of “literality” (quotational vs. allusive) and the embedding in the thematic
context.
      </p>
      <p>
        1There are certainly exceptions. For example, Bamman and Crane [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] exploit syntactic information
(dependency paths and word order) to extract allusions in classical Latin literature, Scheirer et al. [
        <xref ref-type="bibr" rid="ref43">40</xref>
        ] use Latent
Semantic Indexing [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] to extract parallels in Latin epic, Lund et al. [
        <xref ref-type="bibr" rid="ref30">27</xref>
        ] use local topical information
extracted from anchor-based topic models to extract intra-biblical references, and Manjavacas et al. [
        <xref ref-type="bibr" rid="ref32">29</xref>
        ] examine
the application of distributional semantics to help improve the detection of allusions.
      </p>
      <p>Thus, in the current study we take a step back from the problem of retrieving local
intertexts and present a quantitative analysis of the role that contextual factors play in the placement of
intertexts: authorship, the impact of the source (or referenced) collection, and the contextual theme.
We make use of the Patrologia Latina (henceforth: Patrology), a large-scale corpus
comprising a large number of authors and books, and known to abound in intertextual
links. Two facts about the Patrology are worth noting up front (the corpus will be thoroughly
introduced in Section 2). On the one hand, the majority of authors belong to the same writing
tradition, sharing themes, concerns and theoretical background, which makes them
commensurable from a statistical point of view. On the other hand, they share the main source of
reference, the Bible. These two aspects will allow us to approach the questions raised above
from a data-driven perspective.</p>
      <sec id="sec-1-1">
        <title>Research Questions</title>
        <p>The research questions that we pursue in the present study are as follows:
1. Besides lexical similarity, does the thematic embedding of intertexts into their context
represent an additional axis of meaningful variation?
2. As intertextual links vary along a continuum from more to less literal, as well as in the
degree to which they are thematically embedded in the topical context, do we observe
systematic variation across authors?
3. What is the effect of tradition or authority on the referencing style of the considered
authors? More specifically, do certain books of the Bible trigger particular types of
reference? Does the structure of the source collection (i.e. the Bible in the present case)
help explain such variation?
4. Besides authorship, do specific topics help further explain the type of reference and its
topical embedding?</p>
        <p>Outline of the paper. The remainder of the present paper is structured as follows. First,
Section 2 describes the data sources underlying the study, as well as the
preprocessing applied in order to produce text amenable to quantitative analysis. Next, Section 3
describes the computational approach used to operationalize the theoretical categories that the
study targets: the type of reference along the quotation-allusion axis and the thematic similarity
with respect to the source passage. Section 4 then describes the statistical models used
to address the research questions. Finally, in Section 5, we discuss the insights that can be
drawn from the models and the answers that they deliver.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Dataset</title>
      <sec id="sec-2-1">
        <title>2.1. Sources</title>
        <p>
          The main dataset used in the present paper has been compiled on the basis of the Patrology, an
extensive collection of editions of Latin writings, attributed to the so-called “Church Fathers”
in the Christian tradition, as well as a number of other influential ecclesiastical authors. This
monumental endeavour was initially undertaken by J.P. Migne between 1841 and 1855 [
          <xref ref-type="bibr" rid="ref35">32</xref>
          ]. The
diachrony of this collection covers a reasonably balanced sample of more than a millennium
of written text production, ranging from the oeuvre of Tertullian (late 2nd to early 3rd century AD) to that
of Pope Innocent III (13th century AD). Moreover, this resource continues to be relevant in
literary scholarship, not least because for many of the included works Migne’s edition is still the
most recent one.
        </p>
        <p>
          Despite the diverse origins of its source materials, the Patrology can be argued to represent
a coherent corpus of religious Latin writings, mainly covering the period from late antiquity
until the high medieval period. This period coincides with the rise of Christianity, which
would become the dominant religion throughout Europe by the reign of Charlemagne. The
dissemination of the Bible (or rather: of its individual books, which often still circulated
separately) played a major supporting role in these developments. Biblical intertextuality [
          <xref ref-type="bibr" rid="ref36">33</xref>
          ],
in particular, pervades the Patrology’s texts. This is partly due to the considerable number of
sermons included (which departed from or even revolved around specific biblical quotations),
but also because various aspects of medieval exegesis crucially depended on intertextual
phenomena. One of the standard ways to understand the medieval Bible, for instance, was through
an analogical understanding of the parallels between the Old and New Testament, also at the
textual level. It therefore comes as no surprise that we are not the first to use these
data to study intertextuality by computational means [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Curation</title>
        <p>
          The digital version of the Patrology was extracted from the Corpus Corporum collection [
          <xref ref-type="bibr" rid="ref41">38</xref>
          ],
which offers high-quality OCR of Migne’s 1853 edition in a convenient XML format. As for the
source of the references, the Bible, we used the version of the Vulgate provided
by the Perseus Digital Library [
          <xref ref-type="bibr" rid="ref10 ref26">10</xref>
          ]. We kept the original division of the Vulgate into book, chapter and verse as metadata,
and added to each verse a tag indicating whether it belongs to the Old or the New Testament.
        </p>
        <sec id="sec-2-2-1">
          <title>2.2.1. Gold Standard</title>
          <p>While the OCR’d documents from the Corpus Corporum do not include the biblical references
as part of the XML markup, they have been kept in their original inline form, as shown in Listing 1,
and can be extracted automatically through customary data-wrangling techniques.2</p>
          <p>1 &lt;p&gt;Simili modo et tu, si bona
2 quae habes forti cautela custodire non negligis,
3 &lt;pb n="0773B"/&gt;
4 circa tabernaculum tuum, et ea quae intra illud
5 sunt tentoria suspendis. Nihil enim omnino tibi proderit
6 bona in te spiritualia congregasse, nisi diligenti
7 ea et sollicita circumspectione custodias. Hinc
8 in sacra Scriptura legimus, quia &lt;i&gt;posuit Deus hominem
9 in paradiso, ut operaretur, et custodiret illum (Gen. II, 15)&lt;/i&gt;.
10 In paradiso quippe Deus hominem
11 ponit, quando delectabilem tibi spiritualium gratiarum
12 copiam gratuito largiens, in sancta et tranquilla
13 conscientia suaviter te pausare facit.&lt;/p&gt;</p>
          <p>Listing 1: Example of an XML source file snippet from Adam Scotus, corresponding to De
tripartito tabernaculo, showcasing a passage containing an annotation of a biblical
reference (Gen. II, 15) in line 9.</p>
          <p>2In particular, we apply regular expressions to match parentheses formatted in the manner shown in
Listing 1 and check whether the book abbreviation is in a manually curated list. In the case of a positive match,
we then try to parse the chapter and verse numbers. Finally, the parsed reference is checked against the Vulgate
to see whether it corresponds to an existing verse.</p>
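          <p>The extraction procedure summarized in footnote 2 can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the abbreviation table and the set of valid verses are tiny hypothetical stand-ins for the manually curated list and the Vulgate index, and the pattern only covers references formatted like the one in Listing 1 (abbreviation, Roman chapter number, Arabic verse number).</p>
          <p>
```python
import re

# Hypothetical, heavily truncated stand-in for the manually curated list
# of book abbreviations used in the study.
BOOK_ABBREVIATIONS = {"Gen": "Genesis", "Exod": "Exodus", "Psal": "Psalms", "Matth": "Matthew"}

# Hypothetical stand-in for the Vulgate index: (book, chapter, verse) triples.
VULGATE_VERSES = {("Genesis", 2, 15), ("Matthew", 5, 3)}

# Matches inline references formatted like "(Gen. II, 15)": an abbreviation
# followed by a period, a Roman chapter number and an Arabic verse number.
REFERENCE_PATTERN = re.compile(r"\(([A-Z][a-z]+)\.\s*([IVXLC]+),\s*(\d+)\)")

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100}
SUBTRACTIVE_PAIRS = (("IV", "IIII"), ("IX", "VIIII"), ("XL", "XXXX"), ("XC", "LXXXX"))


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral (up to C) by expanding subtractive pairs, then summing."""
    for pair, expansion in SUBTRACTIVE_PAIRS:
        numeral = numeral.replace(pair, expansion)
    return sum(ROMAN_VALUES[ch] for ch in numeral)


def extract_references(text: str) -> list[tuple[str, int, int]]:
    """Return (book, chapter, verse) triples that pass both checks of footnote 2."""
    references = []
    for abbreviation, chapter, verse in REFERENCE_PATTERN.findall(text):
        book = BOOK_ABBREVIATIONS.get(abbreviation)
        if book is None:
            continue  # abbreviation not in the curated list
        candidate = (book, roman_to_int(chapter), int(verse))
        if candidate in VULGATE_VERSES:  # must correspond to an existing verse
            references.append(candidate)
    return references


print(extract_references("ut operaretur, et custodiret illum (Gen. II, 15)."))
# [('Genesis', 2, 15)]
```
          </p>
          <p>An unknown abbreviation, or a chapter and verse absent from the index, yields no match. Expanding subtractive pairs (IV, IX, XL, XC) before summing is one simple way to parse the Roman chapter numbers used in the edition.</p>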
          <p>
            The automatic extraction of the manually coded references resulted in a dataset of 210,022
references, which facilitates large-scale computational analyses of biblical intertextuality. While
the OCR is not perfect, and the annotation cannot be deemed exhaustive, a manual inspection
of a representative sample indicates that the automatic procedure parses editorial
annotations with high precision. More concretely, we isolated a sample of 100 instances which
showed a low alignment score according to the Smith-Waterman algorithm [
            <xref ref-type="bibr" rid="ref47">44</xref>
            ], and carefully
checked each of them for the alleged reference.3 References with low alignment scores amounted
to 35.5% of all references. In the manually checked subset, 82% of the references could be
clearly found, 11% were located in the nearby context rather than at the expected position (due to OCR mistakes
pertaining to the recognition of digits), and 7% were missed. The analysis thus suggests that
about 2.5% (i.e. 7% of the 35.5% of low-scoring references) of all references are wrong, an amount that,
while not negligible, was deemed unproblematic.
          </p>
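          <p>To make the alignment-based selection of the manual sample more concrete, the following sketch computes a token-level Smith-Waterman local alignment score. The scoring parameters (match = 2, mismatch = −1, linear gap penalty of 1) are assumptions chosen for illustration; as footnote 3 notes, the threshold used in the study depends on parameters that are not reported, so scores from this sketch are not comparable with it.</p>
          <p>
```python
def smith_waterman_score(a: list[str], b: list[str],
                         match: int = 2, mismatch: int = -1, gap: int = 1) -> int:
    """Best local alignment score between two token sequences (linear gap penalty)."""
    # prev holds row i-1 of the dynamic-programming matrix H; row 0 is all zeros
    prev = [0] * (len(b) + 1)
    best = 0
    for token_a in a:
        curr = [0] * (len(b) + 1)
        for j, token_b in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if token_a == token_b else mismatch)
            # Local alignment: scores are floored at zero
            curr[j] = max(0, diagonal, prev[j] - gap, curr[j - 1] - gap)
            best = max(best, curr[j])
        prev = curr
    return best


verse = "posuit deus hominem in paradiso".split()
passage = "et posuit deus hominem in paradiso".split()
print(smith_waterman_score(verse, passage))  # 10
```
          </p>
          <p>A reference whose verse appears verbatim near its annotation would score high, whereas a mislocated or missed reference would score close to zero, which is what makes low scores a reasonable filter for manual inspection.</p>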
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2.2. Preprocessing</title>
          <p>
            We apply the same preprocessing pipeline to the Patrology and the Vulgate. First, the text is
tokenized and POS-tagged using TreeTagger [
            <xref ref-type="bibr" rid="ref44">41</xref>
            ]. For lemmatization, we use a neural-network-based
lemmatizer trained with PIE [
            <xref ref-type="bibr" rid="ref31">28</xref>
            ] on a corpus of medieval Latin (Capitularia), which has
served as the basis for a number of Latin lemmatization studies [
            <xref ref-type="bibr" rid="ref14 ref23 ref5">14, 5, 22</xref>
            ]. Unlike
TreeTagger’s lemmatizer, the neural lemmatizer can analyze previously
unseen types and disambiguate between alternative analyses, which, as
shown in the Appendix, results in more coherent topics.
          </p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Sampling</title>
        <p>The analysis focuses on a subset of authors who are particularly prolific and thus provide a
fruitful test-bed for statistical analysis. From the entire Patrology, we sample authors who have
contributed a total of at least 100K tokens, and from their writings we sample books with at
least 100 references to the Bible, making sure that at least two books per author are held out for
development purposes. From this subset, we further remove commentaries, which, due to their
exegetical nature, refer to the Bible very copiously and in a manner that is less interesting from the
point of view of intertextuality research. (In total, commentaries amounted to 8 documents.)
The resulting subset (which amounts to 2,921,142 tokens or 2.7% of the collection) is further
processed to extract passages containing references to the Bible as described in Section 2.2.2.
In total, we collected 15,195 biblical references across 24 authors.</p>
        <p>3After a first inspection of the distribution of scores, we observed that scores higher than 4
consistently yielded true references; these were therefore excluded from the sample to be manually checked. Note
that the exact value of this threshold depends on the algorithm parameters and cannot be interpreted
independently.</p>
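        <p>The sampling steps described above can be sketched as a small filtering function. The record fields (author, tokens, n_refs, genre) are hypothetical, and choosing the held-out books at random is an assumption, since the text does not specify how they were selected.</p>
        <p>
```python
import random
from collections import defaultdict


def sample_works(works, min_author_tokens=100_000, min_refs=100, n_heldout=2, seed=0):
    """Split works into (analysis, development) lists following Section 2.3.

    Each work is a dict with the hypothetical keys "author", "tokens",
    "n_refs" (number of biblical references) and "genre".
    """
    tokens_per_author = defaultdict(int)
    for work in works:
        tokens_per_author[work["author"]] += work["tokens"]

    # Keep prolific authors, and only their works with enough biblical references
    candidates = defaultdict(list)
    for work in works:
        if tokens_per_author[work["author"]] >= min_author_tokens and work["n_refs"] >= min_refs:
            candidates[work["author"]].append(work)

    rng = random.Random(seed)
    analysis, development = [], []
    for author in sorted(candidates):
        author_works = list(candidates[author])
        rng.shuffle(author_works)  # assumption: held-out books are chosen at random
        development.extend(author_works[:n_heldout])
        # Commentaries are removed from the analysis subset
        analysis.extend(w for w in author_works[n_heldout:] if w["genre"] != "commentary")
    return analysis, development
```
        </p>
        <p>Separating the development books before dropping commentaries follows the order in which the steps are described; other orderings would be equally compatible with the description.</p>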
        <p>The remaining documents of the Patrology are set apart and used to train a topic model
that automatically captures the theme of a given passage.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Topic Modeling</title>
        <p>
          An LDA topic model [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] was trained on the lemmatized version of the remaining dataset,
comprising 103,687,454 tokens. The topic model was trained using the gensim package [
          <xref ref-type="bibr" rid="ref39">36</xref>
          ],
which provides an implementation of Online LDA [
          <xref ref-type="bibr" rid="ref20 ref50">20</xref>
          ]. We fit the LDA model after removing
all words that were not strictly alphanumeric, any word that was not identified as an adjective,
adverb, noun or verb, and all words that appear in a specifically designed stopword list.4 The
hyper-parameters of the LDA algorithm were further selected on the basis of a validation study
that used grid search with the objective of maximizing topic coherence [
          <xref ref-type="bibr" rid="ref40">37</xref>
          ] on the held-out
dataset. The results of the validation study are reported in the Appendix. The resulting topic
model has 200 topics and a vocabulary truncated to the 20,000 most frequent words, and was fit
on document snippets of 1,500 words.
        </p>
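        <p>The input filtering and snippet construction for the topic model can be sketched as below. The part-of-speech labels are placeholders (TreeTagger's Latin tagset uses different tags), and applying the 1,500-word segmentation after filtering rather than before is an assumption.</p>
        <p>
```python
from collections import Counter

# Placeholder tag names; TreeTagger's Latin tagset uses different labels.
CONTENT_TAGS = {"ADJ", "ADV", "NOUN", "VERB"}


def make_snippets(tagged_lemmas, stopwords, snippet_len=1500):
    """Filter (lemma, tag) pairs, then cut the kept lemmas into fixed-size snippets."""
    kept = [lemma for lemma, tag in tagged_lemmas
            if lemma.isalnum() and tag in CONTENT_TAGS and lemma not in stopwords]
    return [kept[i:i + snippet_len] for i in range(0, len(kept), snippet_len)]


def build_vocabulary(snippets, max_size=20_000):
    """Keep the max_size most frequent lemmas across all snippets."""
    counts = Counter(lemma for snippet in snippets for lemma in snippet)
    return {lemma for lemma, _ in counts.most_common(max_size)}
```
        </p>
        <p>The resulting snippets could then be handed to gensim, for example through gensim.corpora.Dictionary and gensim.models.LdaModel with num_topics=200. Note that gensim's Dictionary.filter_extremes also accepts a keep_n argument for truncating the vocabulary, but its default no_below and no_above thresholds remove additional words unless overridden.</p>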
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>In order to model the thematic embedding of intertextual references, we need to operationalize
a notion of similarity of both a purely lexical and a thematic type. While lexical similarity
can be easily approximated by means of set-based similarity metrics typically used in
text-reuse applications, the operationalization of thematic similarity in terms of similarity between
the topic distributions inferred by a topic model requires some preprocessing. Since topic
models essentially model word co-occurrence patterns across documents, the presence
of an intertextual link will bias the respective inferred topic distributions in a common and
expected direction, especially if the intertextual link is based on high lexical overlap. In order
to disentangle topical from lexical similarity, the topic distributions are inferred on the original
document after removal of the lexical overlap with respect to the linked document. In such a
case, a strong match between the respective inferred topic distributions can be interpreted as an
indication of strong thematic embedding in the context, on the basis that the lexical choices
made in the context point at terms that co-occur in similar topics as the terms that establish
the link.</p>
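      <p>A minimal sketch of the overlap-removal step, assuming that overlap is defined at the level of lemma types: every lemma of the referencing context that also occurs in the linked passage is dropped before topic inference. The text does not say whether n-grams or word positions are taken into account, so this is only one plausible reading; the Latin lemmas in the example are illustrative.</p>
      <p>
```python
def remove_overlap(context_lemmas, linked_lemmas):
    """Drop every lemma of the referencing context that also occurs in the linked passage.

    Assumption: overlap is defined over lemma types, ignoring order and n-grams.
    """
    shared = set(linked_lemmas)
    return [lemma for lemma in context_lemmas if lemma not in shared]


context = ["hinc", "sacer", "scriptura", "lego", "pono", "deus", "homo", "paradisus"]
verse = ["et", "pono", "deus", "homo", "in", "paradisus"]
print(remove_overlap(context, verse))  # ['hinc', 'sacer', 'scriptura', 'lego']
```
      </p>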
      <sec id="sec-3-1">
        <title>3.1. Lexical Similarity</title>
        <p>
          In order to extract lexical similarity, we resort to traditional methods from the text re-use
literature – see, for instance [
          <xref ref-type="bibr" rid="ref45">42</xref>
          ]. We focus on the Jaccard coefficient, which is defined as
the number of word types shared by two documents divided by the total number of word types in their union.
        </p>
        <p>In the present study, we compute the weighted version of the Jaccard coefficient, shown in
Equation 1, which gives a more accurate value by taking the frequency of the
words into account:</p>
        <p>$$J(D_i, D_j) = \frac{\sum_{w \in D_i \cup D_j} \min[c(w, D_i), c(w, D_j)]}{\sum_{w \in D_i \cup D_j} \max[c(w, D_i), c(w, D_j)]} \qquad (1)$$</p>
        <p>In Equation 1, c(w, Di) refers to the number of times word w appears in document Di. In
order to give more weight to more literal borrowings, we represent the documents
not just at the level of words but include word bi-grams and tri-grams as well. Finally, we do
not consider the actual words but their lemmas, and apply a stopword list.5</p>
        <p>4This wordlist includes terms such as “dominus”, “deus”, etc. that were deemed irrelevant for composing
topic-term distributions due to their high frequency. The wordlist together with all relevant code will be
released upon publication.</p>
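        <p>The weighted Jaccard coefficient (the sum of minimum counts divided by the sum of maximum counts) computed over lemma uni-, bi- and tri-grams can be sketched as follows. Lemmatization and stopword removal are assumed to have been applied beforehand, and pooling all n-gram orders into a single multiset of features is our reading of the description above.</p>
        <p>
```python
from collections import Counter


def ngram_counts(lemmas, max_n=3):
    """Count uni-, bi- and tri-grams (as tuples) of a lemma sequence."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(lemmas) - n + 1):
            counts[tuple(lemmas[i:i + n])] += 1
    return counts


def weighted_jaccard(doc_i, doc_j, max_n=3):
    """Sum of minimum counts over sum of maximum counts, across all n-gram features."""
    ci, cj = ngram_counts(doc_i, max_n), ngram_counts(doc_j, max_n)
    features = ci.keys() | cj.keys()  # union of features; missing counts default to 0
    numerator = sum(min(ci[f], cj[f]) for f in features)
    denominator = sum(max(ci[f], cj[f]) for f in features)
    return numerator / denominator if denominator else 0.0
```
        </p>
        <p>Because every shared bi-gram and tri-gram contributes in addition to its component words, a verbatim quotation scores higher than the same words in a different order: ["a", "b"] against itself yields 1.0, but against ["b", "a"] only 0.5.</p>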
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Topical Similarity</title>
        <p>Given the inferred topic distributions of the source and target documents, we resort to
information-theoretic measures based on the entropies of these distributions to estimate the topic
similarity of the underlying documents. In particular, we use the Jensen-Shannon divergence,
shown in Equation 2:</p>
        <p>$$JSD(\theta_{D_i}, \theta_{D_j}) = \frac{1}{2} D_{KL}(\theta_{D_i} \,\|\, M) + \frac{1}{2} D_{KL}(\theta_{D_j} \,\|\, M), \quad M = \frac{1}{2}(\theta_{D_i} + \theta_{D_j}) \qquad (2)$$
which corresponds to the arithmetic mean of the Kullback-Leibler divergences of the
topic distribution of the ith document, θDi, and of the topic distribution of the jth
document, θDj, from their mixture M. By measuring both distributions against M, JSD turns the
asymmetric KL divergence into a symmetric measure that, with base-2 logarithms, is bounded between 0 and 1.
Since JSD is a divergence, we transform it into a similarity by
subtracting it from one: 1 − JSD(θDi, θDj).</p>
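        <p>The standard Jensen-Shannon divergence, i.e. the average KL divergence of each distribution from their mixture M = ½(θDi + θDj), and the derived similarity can be computed as below. Base-2 logarithms are assumed so that the divergence lies between 0 and 1, which keeps the similarity 1 − JSD in the same range.</p>
        <p>
```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_k == 0 contribute nothing."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def jensen_shannon(p, q):
    """Average KL divergence of p and q from their mixture m."""
    m = [(pk + qk) / 2 for pk, qk in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def topic_similarity(theta_i, theta_j):
    """Similarity between two topic distributions: 1 minus their JSD."""
    return 1 - jensen_shannon(theta_i, theta_j)
```
        </p>
        <p>If SciPy is available, scipy.spatial.distance.jensenshannon(p, q, base=2) returns the square root of this quantity (the Jensen-Shannon distance), so its result would need to be squared before being used in this way.</p>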
        <p>In order to obtain rich topic representations, we consider left and right contexts of a
given reference totalling 500 words for the referencing documents, and the entire
surrounding chapter for the biblical text.</p>
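        <p>The context extraction can be sketched as follows. Splitting the 500 words evenly between the left and right context, keeping the reference span itself, and indexing verses by hypothetical (book, chapter, verse) keys are all assumptions made for illustration.</p>
        <p>
```python
def context_window(tokens, start, end, total=500):
    """Tokens around the reference span tokens[start:end].

    Assumptions: total // 2 tokens on each side (fewer at document edges),
    and the reference span itself is kept.
    """
    half = total // 2
    left = tokens[max(0, start - half):start]
    right = tokens[end:end + half]
    return left + tokens[start:end] + right


def chapter_context(verses, book, chapter):
    """Concatenate, in verse order, all verses of one biblical chapter.

    `verses` maps hypothetical (book, chapter, verse) keys to token lists.
    """
    keys = sorted(key for key in verses if key[:2] == (book, chapter))
    return [token for key in keys for token in verses[key]]
```
        </p>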
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Topical Context</title>
        <p>In order to approach RQ4, we need to capture the theme surrounding the particular intertexts.
In the present study, we utilize the topic-model from Section 2.4 to identify the most dominant
topic in the inferred topic distribution of a given passage. Thus, a given document Di is
assigned the index of the topic t with the highest probability in the topic distribution inferred
for document Di:</p>
        <p>$$t^{*}(D_i) = \arg\max_{t} \, \theta_{D_i, t}$$</p>
        <p>By taking such a summary of the distribution we certainly ignore important
information about the composition of topics in the document (especially in high-entropy topic
distributions), and we also prevent the subsequent modeling from exploiting correlations in the
distribution of topics across documents, since some topics tend to co-occur with each other.
However, it simplifies the statistical modeling considerably, while still capturing a substantial
amount of topic information.</p>
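        <p>Assigning the dominant topic is a single argmax over the inferred distribution. The sketch below also counts how often each topic ends up dominant, which is how the number of realized levels of the topic factor (fewer than the number of fitted topics) arises. Resolving ties to the lowest topic index is an assumption.</p>
        <p>
```python
from collections import Counter


def dominant_topic(theta):
    """Index of the most probable topic; ties go to the lowest index."""
    return max(range(len(theta)), key=theta.__getitem__)


def dominant_topic_counts(thetas):
    """How often each topic is the dominant one across passages."""
    return Counter(dominant_topic(theta) for theta in thetas)


thetas = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.6, 0.2]]
counts = dominant_topic_counts(thetas)
print(counts[1], len(counts))  # 2 2
```
        </p>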
        <p>5Note that this stopword list differs slightly from the one applied in the topic model pipeline, since the
nature of the task is different.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Data Analysis</title>
      <p>We approach the research questions by means of a multivariate, multi-level, intercept-only
model using lexical similarity (lex) from Section 3.1 and topic similarity (topic) from
Section 3.2 as outcomes. In order to analyze the effects of authorship and contextual theme, as well
as any effects at the level of the source collection, on the type of intertext, we specified a series of multi-level
models including random intercepts for each of the levels in each of the grouping factors. The
number of levels per grouping factor was as follows: author (A: 24 levels), biblical
book (B: 52 levels) and dominant topic (T: 129 levels).6</p>
      <p>
        We conducted all analyses in R (3.6.3) [
        <xref ref-type="bibr" rid="ref22">21</xref>
        ] using the brms package [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] for model fitting.
We used the weakly informative default priors of the brms package, unless otherwise
specified.7 Throughout the experiment, model convergence was checked on the basis of R-hat
values, the number of effective samples and trace plots.
      </p>
      <p>Since we were not particularly interested in the absolute magnitude of the effects, we did not operate
on the raw outcome variables, but instead standardized them to zero mean and unit
standard deviation. This transformation also facilitates
model fitting and makes the coefficients easier to interpret, especially when
comparing variables measured on different scales.</p>
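      <p>The standardization amounts to a z-score transformation. A minimal sketch, assuming the sample standard deviation (n − 1 denominator, the convention also used by R's scale()):</p>
      <p>
```python
import statistics


def standardize(values):
    """Center to zero mean and scale to unit sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


print(standardize([1, 2, 3]))  # [-1.0, 0.0, 1.0]
```
      </p>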
      <sec id="sec-4-1">
        <title>4.1. Model definition</title>
        <p>The general model including all grouping factors is defined by Equation 3. It is
a bivariate model that includes no predictors8 and groups observations according
to three different criteria: author (A), biblical book (B) and dominant topic (T). Observations are modeled as draws from a bivariate normal distribution. The
means are decomposed into grand means, $a_l$ and $a_t$, and group-specific deviations from the
mean, $a_{lK}$ and $a_{tK}$. The latter are in turn modeled hierarchically as draws
from a second multivariate normal centered around zero. Finally, following Gelman
and Hill (2006, Chapter 13), the covariance matrices $\Sigma$ and $\Sigma_K$ are each decomposed into a diagonal matrix
of standard deviations, which models lexical and topical variation individually, and a correlation
matrix, which additionally captures the correlation between both response variables.</p>
        <p>6Note that this number corresponds to the actual number of dominant topics that appear in the dataset
and therefore diverges from the total number of topics fitted. This situation arises because not all of the 200 estimated
topics are realized as a dominant topic in the target dataset.</p>
        <p>7At the time of running the experiments, these priors were Student-t distributions with 3 degrees of freedom,
location 0 and scale 2.5.</p>
        <p>8Note, however, that nothing inherent to the research design prevents the inclusion of predictors.
For instance, future work may want to improve the model by considering the influence of time, genre or the
density of references in the surrounding passage.</p>
        <p>$$
\begin{gathered}
\begin{bmatrix} y_l \\ y_t \end{bmatrix} \sim \operatorname{MVNormal}\left(\begin{bmatrix} \mu_l \\ \mu_t \end{bmatrix}, \Sigma\right),
\qquad
\begin{bmatrix} \mu_l \\ \mu_t \end{bmatrix} = \begin{bmatrix} a_l \\ a_t \end{bmatrix} + \begin{bmatrix} a_{lA} \\ a_{tA} \end{bmatrix} + \begin{bmatrix} a_{lB} \\ a_{tB} \end{bmatrix} + \begin{bmatrix} a_{lT} \\ a_{tT} \end{bmatrix},
\\
\Sigma = \operatorname{diag}(\sigma_l, \sigma_t)\, R \,\operatorname{diag}(\sigma_l, \sigma_t),
\\
\begin{bmatrix} a_{lK} \\ a_{tK} \end{bmatrix} \sim \operatorname{MVNormal}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \Sigma_K\right),
\qquad
\Sigma_K = \operatorname{diag}(\sigma_{lK}, \sigma_{tK})\, R_K \,\operatorname{diag}(\sigma_{lK}, \sigma_{tK}),
\qquad K \in \{A, B, T\}
\end{gathered}
\qquad (3)
$$</p>
        <p>In Equation 3, $y_l$ and $y_t$ refer to the lexical and topical outcome variables, and $a_{lK}$ and $a_{tK}$ refer
to the varying intercepts for lexical and topical similarity under factor $K$. Finally, we set the
priors of all $\sigma$ terms to Student-t priors and the correlation term $R$ to a flat LKJ prior [
          <xref ref-type="bibr" rid="ref29">26</xref>
          ].
        </p>
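        <p>To make the generative structure of Equation 3 concrete, the following Python sketch simulates synthetic data from it. It is an illustration under stated assumptions, not the authors' fitting code: the function names, all parameter values and the numbers of levels are hypothetical, and for brevity every grouping factor shares one covariance matrix $\Sigma_K$, whereas Equation 3 allows a separate one per factor.</p>
        <preformat>
```python
import numpy as np


def cov_from_sd_corr(sd: np.ndarray, corr: np.ndarray) -> np.ndarray:
    """Covariance as diag(sd) @ R @ diag(sd), the decomposition used in Equation 3."""
    d = np.diag(sd)
    return d @ corr @ d


def simulate_eq3(n_obs: int, n_levels: dict, rng: np.random.Generator,
                 grand_mean=(0.0, 0.0), sigma=(1.0, 1.0), rho=0.3,
                 sigma_k=(0.5, 0.5), rho_k=0.6):
    """Simulate (y_l, y_t) pairs from a crossed bivariate varying-intercept model.

    n_levels maps each grouping factor (e.g. "A", "B", "T") to its number of levels.
    Every default value is an illustrative assumption, not an estimate from the paper.
    For brevity, all factors share the same Sigma_K here.
    """
    Sigma = cov_from_sd_corr(np.array(sigma), np.array([[1.0, rho], [rho, 1.0]]))
    Sigma_k = cov_from_sd_corr(np.array(sigma_k), np.array([[1.0, rho_k], [rho_k, 1.0]]))

    mu = np.tile(np.array(grand_mean, dtype=float), (n_obs, 1))  # (a_l, a_t) per row
    levels = {}
    for factor, n_k in n_levels.items():
        # (a_lK, a_tK) ~ MVNormal(0, Sigma_K): one pair of deviations per level of K
        effects = rng.multivariate_normal(np.zeros(2), Sigma_k, size=n_k)
        idx = rng.integers(0, n_k, size=n_obs)  # level of factor K for each observation
        levels[factor] = idx
        mu += effects[idx]
    # (y_l, y_t) ~ MVNormal((mu_l, mu_t), Sigma)
    y = mu + rng.multivariate_normal(np.zeros(2), Sigma, size=n_obs)
    return y, levels


y, levels = simulate_eq3(1000, {"A": 20, "B": 30, "T": 50}, np.random.default_rng(0))
```
        </preformat>
        <p>Fitting the model to observed data, rather than simulating from it, requires an MCMC backend (for example, an R package such as brms); the sketch only shows how the grand means, the crossed group deviations and the correlated residuals combine.</p>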
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Model comparison</title>
        <p>We first assess the importance of the different factors for the outcome distribution by means of
information criteria. As is common in Bayesian model comparison, we use the expected
log predictive density (ELPD) as the test measure, which estimates the predictive
accuracy of a model on new datasets (out-of-sample data). ELPD estimates can be
obtained efficiently, i.e. without having to refit multiple models on different data partitions,
through approximate leave-one-out cross-validation (LOO). In particular, we use the Pareto-smoothed
importance sampling (PSIS-LOO) method; see Vehtari et al. (2017) for a description of the method
and Vehtari et al. (2018) for an implementation in the R programming language.</p>
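        <p>The core of the LOO estimate can be sketched in a few lines. The sketch below assumes a matrix log_lik of pointwise log-likelihoods, with one row per posterior draw and one column per observation, and computes the plain importance-sampling LOO estimate. It deliberately omits the step that distinguishes PSIS-LOO, namely fitting a generalized Pareto distribution to the largest importance ratios to stabilize them and to diagnose unreliable estimates, so it is only the unsmoothed precursor of the method used here; the function names are hypothetical.</p>
        <preformat>
```python
import numpy as np


def logmeanexp(x: np.ndarray, axis: int = 0) -> np.ndarray:
    """Numerically stable log(mean(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    out = m + np.log(np.mean(np.exp(x - m), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)


def is_loo(log_lik: np.ndarray):
    """Unsmoothed importance-sampling LOO from an (S draws, N observations) matrix.

    log_lik[s, i] = log p(y_i | theta_s). Returns (elpd_loo, standard error, pointwise).
    The raw importance ratios 1 / p(y_i | theta_s) are NOT Pareto-smoothed here.
    """
    n_obs = log_lik.shape[1]
    # p(y_i | y_-i) is approximated by 1 / mean_s[1 / p(y_i | theta_s)], computed in log space
    elpd_i = -logmeanexp(-log_lik, axis=0)
    elpd = float(elpd_i.sum())
    se = float(np.sqrt(n_obs * np.var(elpd_i, ddof=1)))
    return elpd, se, elpd_i
```
        </preformat>
        <p>In practice, the smoothed estimate and its diagnostics are available in the loo R package and, in Python, in ArviZ's az.loo function.</p>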
        <p>Table 1 shows the results of the comparison. The model using all grouping
factors ($M_{A \cup B \cup T}$) is expected to have much better predictive performance than any of the
single-grouping models. Among the single-factor models, topic-level grouping
has stronger explanatory power than author-level or book-level grouping, whose
ELPD estimates lie within error of each other.</p>
        <p>To better grasp the respective contributions of the book-level and author-level groupings
to the model’s predictive performance, we fitted $M_{A \cup T}$ and $M_{B \cup T}$ and compared them to the
general model $M_{A \cup B \cup T}$. The results of this comparison are shown in the last two rows of Table 1.
$M_{B \cup T}$ achieves much better ELPD estimates than $M_{A \cup T}$, which indicates that
grouping by reference book yields a model with more explanatory power than
grouping by author.</p>
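        <p>Comparisons such as $M_{B \cup T}$ versus $M_{A \cup T}$ are usually judged by the difference in ELPD together with the standard error of that difference, computed from the paired pointwise values rather than from the two separate standard errors. A minimal sketch, assuming the pointwise ELPD values of both models are available for the same observations in the same order; the function name is hypothetical.</p>
        <preformat>
```python
import numpy as np


def elpd_difference(elpd_i_a: np.ndarray, elpd_i_b: np.ndarray) -> tuple:
    """ELPD difference (model a minus model b) and its standard error.

    Both inputs hold pointwise ELPD values for the same N observations in the same order.
    Pairing the observations accounts for the strong correlation between the two models'
    pointwise values, which combining the separate standard errors would ignore.
    """
    diff_i = elpd_i_a - elpd_i_b
    n_obs = diff_i.shape[0]
    return float(diff_i.sum()), float(np.sqrt(n_obs * np.var(diff_i, ddof=1)))
```
        </preformat>
        <p>A positive difference favours the first model. To our knowledge, this is the quantity that the loo package's loo_compare() reports as elpd_diff and se_diff, up to its sign convention.</p>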
        <p>Finally, we can gain further insight into the modelling power of the different groupings by
inspecting estimates of explained variance. For generality, our estimates are computed by
subtracting a reference variance from the variance in the samples drawn from the posterior
predictive distribution of the general model ($M_{A \cup B \cup T}$) when considering different combinations
of groupings. The reference variance corresponds to the variance in samples drawn from the
posterior predictive distribution when ignoring all groupings.</p>
        <p>[Figure 1a: density of explained variance estimates, panels ∼A + B + T, ∼A, ∼B and ∼T. Caption: (a) Explained variance estimates for different grouping factors with respect to the reference model with no grouping, using model $M_{A \cup B \cup T}$. ∼A + B + T refers to the model including all random effects; ∼K refers to the model ignoring all random effects except K.]</p>
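        <p>Under our reading of the procedure described above, the explained-variance estimate for one outcome is a per-draw difference of variances. The sketch below assumes two arrays of posterior predictive draws (draws by observations) for the same outcome: one generated with a given combination of groupings and one with all groupings ignored. In brms, for instance, posterior_predict() accepts a re_formula argument that can produce both (re_formula = NA drops all group-level effects), but how the draws were generated for this study is not specified here, and the function name is hypothetical.</p>
        <preformat>
```python
import numpy as np


def explained_variance_draws(yrep_grouped: np.ndarray, yrep_reference: np.ndarray) -> np.ndarray:
    """Per-draw explained variance for one outcome variable.

    yrep_grouped:   (S, N) posterior predictive draws including a given set of groupings.
    yrep_reference: (S, N) posterior predictive draws ignoring all groupings.
    Returns S values: the variance across observations within each grouped draw minus
    the variance within the matching reference draw.
    """
    if yrep_grouped.shape != yrep_reference.shape:
        raise ValueError("both arrays must have shape (draws, observations)")
    return yrep_grouped.var(axis=1, ddof=1) - yrep_reference.var(axis=1, ddof=1)
```
        </preformat>
        <p>Computed separately for the lexical and topical outcomes, a density of the returned values appears to correspond to what each panel of Figure 1a displays.</p>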
        <p>Figure 1a shows the results decomposed into the two outcome variables, considering
all groupings (∼A + B + T) and the individual groupings (∼A, ∼B and ∼T). While both
book-level and topic-level groupings have approximately equal estimates of
explained variance for the lexical and topical outcomes, author-level grouping seems to explain
a larger share of the variance in the lexical outcome than in the topical outcome. This result suggests that lexical similarity
is better at discerning between the referencing styles of authors. Still, since the author
grouping yielded the lowest out-of-sample predictive performance estimates, we can only
posit a mild authorship signal.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Inspection of groupings</title>
        <p>Having inspected the relative contributions of the different grouping factors, we now consider
the posterior estimates of the outcome variables for each grouping factor. As discussed
in Section 1, our analysis of local intertextuality posits two material aspects of intertextual
links. Besides the degree of “literality” of an intertext, we add to the analysis its thematic
embedding in the context, which we operationalize following the discussion in Section 3.
By inspecting the statistical relationships between the posterior estimates of
both outcome variables across groupings, we aim to gain insight into how these two aspects
of intertextuality complement each other.</p>
        <p>Author grouping The left plot in Figure 2 shows the mean posterior estimates for authors,
averaging over books and topics. Overall, we observe considerable correlation between topical
and lexical similarity. For reference, Figure 1b shows the posterior estimates of the correlation
across outcomes for each of the groupings.</p>
        <p>It is important to note that the observed correlation is amplified by the effect of
multilevel shrinkage. As shown in the right plot in Figure 2, author estimates are pushed
towards the diagonal when book and topic groupings are taken into account, with no author mean estimate
remaining within the upper-left quadrant.</p>
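        <p>One mechanism behind such a pull towards the diagonal can be seen in a deliberately simplified setting: a single grouping factor, known covariances and a known grand mean, where the posterior mean of a group's bivariate deviation has a closed form. With a positive correlation in $R_K$, deviations along the diagonal are shrunk less than deviations that push lexical and topical similarity in opposite directions. This is an illustrative conjugate-normal calculation under those assumptions, not the crossed model actually fitted, and the function name and covariance values are hypothetical.</p>
        <preformat>
```python
import numpy as np


def shrunken_effect(raw_dev: np.ndarray, n_obs: int,
                    sigma_k: np.ndarray, sigma_resid: np.ndarray) -> np.ndarray:
    """Posterior mean of a group's (a_lK, a_tK) under a normal-normal model.

    raw_dev:     the group's observed mean deviation from the grand mean, shape (2,)
    n_obs:       number of observations in the group
    sigma_k:     prior covariance Sigma_K of the group effects, shape (2, 2)
    sigma_resid: residual covariance Sigma of single observations, shape (2, 2)
    """
    prec_prior = np.linalg.inv(sigma_k)             # precision of the group-level prior
    prec_data = n_obs * np.linalg.inv(sigma_resid)  # precision of the group mean
    post_cov = np.linalg.inv(prec_prior + prec_data)
    return post_cov @ (prec_data @ raw_dev)


sigma_k = np.array([[1.0, 0.8], [0.8, 1.0]])  # strongly correlated group effects (assumed)
sigma = np.eye(2)                              # uncorrelated unit residuals (assumed)
on_diag = shrunken_effect(np.array([1.0, 1.0]), 1, sigma_k, sigma)
off_diag = shrunken_effect(np.array([1.0, -1.0]), 1, sigma_k, sigma)
```
        </preformat>
        <p>Here on_diag is shrunk to about (0.64, 0.64), while off_diag is shrunk to about (0.17, -0.17): the off-diagonal group ends up much closer to the origin, so groups whose raw deviations fall in the upper-left or bottom-right quadrants are pulled towards the diagonal.</p>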
        <p>
          As a result of the correlation, both the upper-left and bottom-right sections of the plot
are considerably less populated. In combination with the analysis from Figure 1a, we can
interpret the high correlation as indicating that the lexical similarity axis suffices to explain
the variation observed between authors.9 Nevertheless, it is interesting to investigate
the relative position of outliers. For instance, Petrus Cellensis (P-C), an author known for
his allegorical style [
          <xref ref-type="bibr" rid="ref38">35</xref>
          ], appears in the bottom-right section, indicating a more allusive style
in which references are thematically embedded more than average. Bernardus Claraevallensis
(B-C), known for his constant biblical allusions [
          <xref ref-type="bibr" rid="ref33">30</xref>
          ], likewise appears to the right of Petrus
Cellensis. Finally, Augustinus Hipponensis (A-H) and Guibertus Mariae de Novigento
(G-SM-d-N) represent the extremes of, respectively, the upper-right section, characterized by
a highly embedded style, and the bottom-left section, leaning towards loosely connected references.
        </p>
        <p>9As a reminder from Section 3, the estimates of topic-level similarity were computed on documents after
removing the lexical overlap to avoid biases from lexical similarity.</p>
        <p>Book grouping The left plot in Figure 3 shows the mean posterior estimates for books.</p>
        <p>We now observe a less correlated distribution, with a clear pattern emerging from the
partition of the Bible into the Old and the New Testament. In general terms, biblical intertexts
linking to the New Testament tend towards a more quotational style. On the topical axis,
the trend is less clear, with a mild association of the New Testament with higher thematic
embedding.10 Again, inspecting the outliers can aid the interpretation of the distribution. At
the top of the plot we find Deuteronomy, a biblical book that contains a large body of laws,
blessings and curses, all of which are more likely to be quoted than alluded to. In contrast, in
the more allusive quadrant of the plane (i.e. the bottom-right), we find the Song of Songs, a
book that largely consists of love poems and displays a strongly allegorical style.</p>
        <p>Topic grouping Finally, we inspect the estimates for the topic-level grouping. Care should be
taken when drawing conclusions from the distribution of posterior means: the number of topics is
large, and, despite our efforts to optimize the coherence of the topic-term distributions,
topic-modeling algorithms do not guarantee the interpretability of the inferred topics.</p>
        <p>The right plot in Figure 3 displays the mean posterior estimates for topics. As with
the posterior means for authors, the distribution of topics shows a substantial
degree of correlation. In this case, however, there is considerable dispersion in the upper-right
section. While a thorough exploration of the topics is beyond the scope of the present paper,
we have singled out a number of topics for commentary. For illustration, the selected topics
have been highlighted in Figure 3, and the corresponding topic descriptors (the highest-probability terms
under the given topic) are shown in List 1.</p>
        <p>10After observing such a pattern, we fitted an additional model nesting the book levels within their
corresponding Testament. The resulting model, however, did not yield any considerable improvement in the LOO
estimates with respect to model $M_{A \cup B \cup T}$ and was therefore not considered further in the analyses.</p>
        <p>• Topic 11: “propheta” (prophet), “Isaiah”, “apostolus” (apostle), “Matthaeus” (Matthew),
“scriptura” (Bible)
• Topic 30: “anima” (soul), “ratio” (reason), “cogito” (to conceive), “sensus” (sense)
• Topic 36: “fides” (faith), “veritas” (truth), “pax” (peace), “credo” (to believe)
• Topic 66: “sara” (Sarah), “ancilla” (slave), “Abraham”, “angelus” (angel)
• Topic 76: “voluntas” (will), “necessitas” (necessity), “liber” (free), “arbitrium”
(judgement)</p>
        <p>List 1: Topic descriptors for a selection of illustrative topics</p>
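        <p>The topic descriptors in List 1 are the highest-probability terms under each topic, and the topic-level grouping assigns each passage its dominant topic. Both steps can be sketched as follows, assuming a fitted topic model exposes a passage-topic matrix and a topic-term matrix as NumPy arrays; the function names and the toy matrices are hypothetical. The toy example also shows why, as footnote 6 notes, fewer topics may surface as dominant topics than were fitted.</p>
        <preformat>
```python
import numpy as np


def dominant_topics(doc_topic: np.ndarray) -> np.ndarray:
    """Index of the most probable topic for each passage (one row per passage)."""
    return doc_topic.argmax(axis=1)


def topic_descriptors(topic_term: np.ndarray, vocab: list, k: int = 5) -> dict:
    """The k highest-probability terms of each topic (one row per topic)."""
    top = np.argsort(-topic_term, axis=1)[:, :k]
    return {topic: [vocab[j] for j in row] for topic, row in enumerate(top)}


# Toy example: 3 passages, 3 topics, 3 vocabulary items (all values made up).
doc_topic = np.array([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
topic_term = np.array([[0.1, 0.6, 0.3], [0.5, 0.2, 0.3], [0.2, 0.3, 0.5]])
vocab = ["anima", "fides", "ratio"]
realized = np.unique(dominant_topics(doc_topic))  # topic 2 is never dominant
descriptors = topic_descriptors(topic_term, vocab, k=2)
```
        </preformat>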
        <p>Topics 30, 36 and 76, which are located in the rather allusive quadrant of the panel, all
seem to refer to moral and philosophical terms as well as to concepts relating to the human
psyche. Topic 11, which triggers intertexts predominantly characterized
by high lexical overlap, seems to relate to writings of and about prophets, apostles, etc. This
trend could indicate that references to authoritative figures are more likely to appear regardless
of the thematic context. Finally, Topic 66, located towards the extreme upper-right corner and thus
indicating both high lexical and high topical similarity, groups terms related to events involving
an important biblical figure: Abraham.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>Having carried out the analyses, we now address how the statistical evidence
helps answer the research questions advanced in Section 1.</p>
      <p>With respect to RQ1, we examined to what extent the decomposition of the intertext type into
the aspects of lexical similarity and thematic embedding proved helpful for characterizing the
observed variation across the different grouping factors. A priori, the intersection of both axes
should produce four intertextual trends, depending on whether lexical and topical similarity
are below or above the mean. These trends correspond to the four quadrants shown in Figure 2
and Figure 3. However, our analyses generally showed a correlation between both aspects,
which resulted in sparsely populated bottom-right and, especially, upper-left quadrants. As a result,
we can conclude that, overall, allusive cases of intertextuality do not rely on proportionally
higher degrees of topical embedding to reinforce the intertextual link. Conversely, high
lexical similarity generally seems to go together with high topical embedding, even when
controlling for lexical overlap during the estimation of topical similarity. Despite this
correlation, however, we conclude that the inclusion of both axes provides a fuller picture
of local intertextuality, since (i) the correlation varied depending on the grouping factor and (ii) the
position of outliers with respect to the general trend highlights the particularities of individual
authors, books or topics that would otherwise be missed.</p>
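      <p>The four quadrants referred to above can be made explicit with a small helper that labels each group's posterior mean relative to the mean of each axis and reports the Pearson correlation between the two axes. The sketch assumes the orientation implied by the discussion of Figures 2 and 3 (lexical similarity on the vertical axis, topical similarity on the horizontal axis); the names are hypothetical.</p>
      <preformat>
```python
import numpy as np

# (above mean on the lexical axis, above mean on the topical axis) -> quadrant
QUADRANTS = {
    (True, True): "upper-right",
    (True, False): "upper-left",
    (False, True): "bottom-right",
    (False, False): "bottom-left",
}


def quadrant_summary(lexical: np.ndarray, topical: np.ndarray):
    """Quadrant label per group and the Pearson correlation between the two axes."""
    high_lex = lexical > lexical.mean()
    high_top = topical > topical.mean()
    labels = [QUADRANTS[(bool(lx), bool(tp))] for lx, tp in zip(high_lex, high_top)]
    corr = float(np.corrcoef(lexical, topical)[0, 1])
    return labels, corr
```
      </preformat>
      <p>A high correlation together with few upper-left and bottom-right labels corresponds to the pattern summarized in the paragraph above.</p>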
      <p>
        With respect to RQ2, we found mild evidence of an authorial signal in the type of intertext that
authors use when referring to the Bible. This signal was especially pronounced on the lexical
similarity axis. This result is broadly congruent with the state of the art in computational
authorship identification: depending on the topical diversity of a corpus, semantic features in
isolation rarely outperform more straightforwardly engineered surface features, such as word
choice [
        <xref ref-type="bibr" rid="ref42">39</xref>
        ]. With respect to the topical embedding of intertexts, author variation was less
important owing to its high correlation with lexical similarity, which was revealed by the shrinkage
induced through partial pooling. However, the outlier status of some authors with respect to
the general trend could still be interpreted stylistically (e.g. the discussed cases of Petrus
Cellensis and Bernardus Claraevallensis).
      </p>
      <p>With respect to RQ3, we observed a stable effect of the target collection, specifically the
biblical book from which the reference originated. Model comparison showed that this effect
plays a bigger role than authorial preferences in the distribution of the outcome variables. The
distinction between Old and New Testament was highly relevant, since it uncovered a pattern
according to which New Testament books tend to elicit higher lexical similarity. Although this
finding probably does not transfer to contexts in which no single source plays so
dominant a role as to exert authoritative pressure on the type of intertext, it nevertheless
highlights the importance of considering not just the borrowing and borrowed texts but also
structural aspects of the source collection when studying co-variates of intertextual links.</p>
      <p>Finally, with respect to RQ4, the statistically most important grouping factor turned out
to be the dominant topic in the borrowing passage. For this factor, the correlation between
lexical and topical similarity was estimated to be highest, though considerable dispersion was
observed in the upper-right quadrant. Manual inspection of topics whose posterior means lie
at salient positions showed that their placement can be made sense of on the basis
of the topic descriptors, although any general theorizing on the effect of topical trends on
the type of intertext must be left for future work.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Future Work</title>
      <p>In the present paper, we have conducted a systematic analysis of relevant factors of variation in
intertextual types from a quantitative and data-driven perspective. An implicit assumption of
our study, which technically underlies all computational approaches to intertextuality, is that
local intertextual links depend on an explicit textual form that can be more or less rigorously
identified. While in this study we exploited an already annotated collection of references,
replicating our analysis on other collections depends on the automatic extraction of intertextual
links. Such an analysis, however, would require text-reuse detection algorithms
that achieve both high precision and high recall for allusive cases. In order to expand the scope of
quantitative intertextuality research, future efforts should thus aim not only at improving
intertextual retrieval, but also at systematically evaluating the precision and recall
that can be expected. Moreover, since topic-level grouping turned out
to be highly explanatory of the distribution of intertextual links, we hypothesize that such
contextual interactions may also be relevant for intertext retrieval applications, which
could test how to incorporate them into their retrieval models.</p>
      <p>Finally, our work relied on LDA-based topic models and therefore on topics that are not
guaranteed to be interpretable. Acknowledging this limitation, we refrained from
an exhaustive qualitative exploration of the distributional patterns of intertext types at the
topic level. In the present paper, we provided only fragmentary evidence of such topic-intertext
relations: e.g. that the posterior means for lexical similarity and thematic embedding under
topics related to moral and philosophical terms are low. We believe, however, that future work
should investigate ways in which researchers can systematically explore such topic
spaces in order to elicit potentially fruitful hypotheses.</p>
    </sec>
    <sec id="sec-7">
      <title>A. Author Information</title>
      <p>Table 2 displays a list of the authors included in the present study together with the total
number of references in the dataset and the initials used in Figure 2 to identify the author.</p>
    </sec>
    <sec id="sec-8">
      <title>B. Topic Modeling</title>
      <p>[Figure: Num Topics = 200; Model: TT, PIE; Top-K: 5k, 10k, 20k]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Acerbi</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Bentley</surname>
          </string-name>
          . “
          <article-title>Biases in cultural transmission shape the turnover of popular traits”</article-title>
          .
          <source>In: Evolution and Human Behavior 35.3</source>
          (
          <issue>2014</issue>
          ), pp.
          <fpage>228</fpage>
          -
          <lpage>236</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Allen</surname>
          </string-name>
          .
          <source>Intertextuality</source>
          . Routledge, Mar.
          <year>2000</year>
          . isbn: 9780203131039. doi: 10.4324/9780203131039. url: https://www.taylorfrancis.com/books/9780203131039.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bamman</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Crane</surname>
          </string-name>
          . “
          <article-title>The logic and discovery of textual allusion”</article-title>
          .
          <source>In: Proceedings of the Second Workshop on Language Technology for Cultural Heritage Data</source>
          .
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. I.</given-names>
            <surname>Jordan</surname>
          </string-name>
          .
          “
          <article-title>Latent Dirichlet allocation”</article-title>
          .
          <source>In: Journal of Machine Learning Research</source>
          (
          <year>2003</year>
          ).
          issn: 15324435. doi: 10.1016/b978-0-12-411519-4.00006-9.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>vor der Brück</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Eger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Mehler</surname>
          </string-name>
          . “
          <article-title>Lexicon-assisted tagging and lemmatization in Latin: A comparison of six taggers and two lemmatization models”</article-title>
          .
          <source>In: Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)</source>
          . Stroudsburg, PA, USA: Association for Computational Linguistics,
          <year>2015</year>
          , pp.
          <fpage>105</fpage>
          -
          <lpage>113</lpage>
          . doi: 10.18653/v1/W15-3716. url: http://aclweb.org/anthology/W15-3716.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Büchler</surname>
          </string-name>
          et al. “
          <article-title>Towards a Historical Text Re-use Detection”</article-title>
          .
          <source>In: Text Mining: From Ontology Learning to Automated Text Processing Applications</source>
          . Ed. by
          <string-name>
            <given-names>C.</given-names>
            <surname>Biemann</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Mehler</surname>
          </string-name>
          . Cham: Springer International Publishing,
          <year>2014</year>
          , pp.
          <fpage>221</fpage>
          -
          <lpage>238</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P. C.</given-names>
            <surname>Bürkner</surname>
          </string-name>
          . “
          <article-title>Advanced Bayesian multilevel modeling with the R package brms”</article-title>
          .
          <source>In: R Journal</source>
          (
          <year>2018</year>
          ). issn: 20734859. doi: 10.32614/rj-2018-017. eprint: 1705.11123.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Coffee</surname>
          </string-name>
          et al. “
          <article-title>The Tesserae Project: intertextual analysis of Latin poetry”</article-title>
          .
          <source>In: Literary and Linguistic Computing 28.2 (July</source>
          <year>2012</year>
          ), pp.
          <fpage>221</fpage>
          -
          <lpage>228</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G. B.</given-names>
            <surname>Conte</surname>
          </string-name>
          . “
          <article-title>The Rhetoric of Imitation: Genre and Poetic Memory in Virgil and Other Latin Poets”</article-title>
          . In: The Classical World (
          <year>1988</year>
          ). issn: 00098418. doi: 10.2307/4350270.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G.</given-names>
            <surname>Crane</surname>
          </string-name>
          .
          “
          <article-title>Building a digital library: The Perseus Project as a case study in the humanities”</article-title>
          .
          <source>In: Proceedings of the First ACM International Conference on Digital Libraries. 1996</source>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E. R.</given-names>
            <surname>Crema</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kandler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Shennan</surname>
          </string-name>
          . “
          <article-title>Revealing patterns of cultural transmission from frequency data: equilibrium and non-equilibrium assumptions”</article-title>
          . In: Nature Publishing Group (Dec.
          <year>2016</year>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Culler</surname>
          </string-name>
          . “
          <article-title>Presupposition and Intertextuality”</article-title>
          .
          <source>In: MLN 91.6</source>
          (
          <issue>1976</issue>
          ), pp.
          <fpage>1380</fpage>
          -
          <lpage>1396</lpage>
          . issn: 00267910, 10806598. url: http://www.jstor.org/stable/2907142.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Deerwester</surname>
          </string-name>
          et al. “
          <article-title>Indexing by latent semantic analysis”</article-title>
          .
          <source>In: Journal of the American Society for Information Science</source>
          (
          <year>1990</year>
          ). issn: 10974571. doi: 10.1002/(SICI)1097-4571(199009)41:6&lt;391::AID-ASI1&gt;3.0.CO;2-9.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Eger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gleim</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Mehler</surname>
          </string-name>
          . “
          <article-title>Lemmatization and Morphological Tagging in German and Latin: A Comparison and a Survey of the State-of-the-art”</article-title>
          .
          <source>In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)</source>
          .
          <year>2016</year>
          , pp.
          <fpage>1507</fpage>
          -
          <lpage>1513</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Farrell</surname>
          </string-name>
          . “
          <article-title>Intention and intertext”</article-title>
          . In: Phoenix (
          <year>2005</year>
          ). issn: 00318299.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>C. W.</given-names>
            <surname>Forstall</surname>
          </string-name>
          and
          <string-name>
            <given-names>W. J.</given-names>
            <surname>Scheirer</surname>
          </string-name>
          .
          <source>Quantitative Intertextuality: Analyzing the Markers of Information Reuse</source>
          . Springer,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gelman</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Hill</surname>
          </string-name>
          .
          <article-title>Data Analysis Using Regression and Multilevel/Hierarchical Models</article-title>
          . Cambridge: Cambridge University Press,
          <year>2006</year>
          . isbn: 9780511790942. doi: 10.1017/CBO9780511790942. url: http://ebooks.cambridge.org/ref/id/CBO9780511790942.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>G.</given-names>
            <surname>Genette</surname>
          </string-name>
          .
          <source>Palimpsestes: La littérature au second degré</source>
          . Seuil
          ,
          <year>1982</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>I. C.</given-names>
            <surname>Ghiban</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ş.</given-names>
            <surname>Trăuşan-Matu</surname>
          </string-name>
          . “
          <article-title>Network Based Analysis of Intertextual Relations”</article-title>
          .
          <source>In: Advances in Information Systems and Technologies</source>
          . Ed. by Á. Rocha et al. Berlin, Heidelberg: Springer Berlin Heidelberg,
          <year>2013</year>
          , pp.
          <fpage>753</fpage>
          -
          <lpage>762</lpage>
          . isbn: 978-3-642-36981-0.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hoffman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. R.</given-names>
            <surname>Bach</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Blei</surname>
          </string-name>
          . “
          <article-title>Online learning for latent dirichlet allocation”.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <source>In: Advances in Neural Information Processing Systems</source>
          .
          <year>2010</year>
          , pp.
          <fpage>856</fpage>
          -
          <lpage>864</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ihaka</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Gentleman</surname>
          </string-name>
          . “R:
          <article-title>A Language for Data Analysis and Graphics”</article-title>
          .
          <source>In: Journal of Computational and Graphical Statistics</source>
          <volume>5</volume>
          .
          <issue>3</issue>
          (Sept.
          <year>1996</year>
          ), pp.
          <fpage>299</fpage>
          -
          <lpage>314</lpage>
          . issn: 1061-8600. doi: 10.1080/10618600.1996.10474713. url: http://www.tandfonline.com/doi/abs/10.1080/10618600.1996.10474713.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kestemont</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>De Gussem</surname>
          </string-name>
          . “
          <article-title>Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning”</article-title>
          .
          <source>In: J. Data Min. Digit. Humanit.</source>
          <volume>2017</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>url: https://jdmdh.episciences.org/3835.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>G. N.</given-names>
            <surname>Knauer</surname>
          </string-name>
          . “
          <article-title>Die Aeneis und Homer. Studien zur poetischen Technik Vergils mit Listen der Homerzitate in der Aeneis”</article-title>
          . In: The Classical World (
          <year>1965</year>
          ). issn: 00098418.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          doi: 10.2307/4345826.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kristeva</surname>
          </string-name>
          . “
          <article-title>Bakhtine, le mot, le dialogue et le roman”</article-title>
          . In: Critique (
          <year>1967</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [25]
          Yale DH Lab. Intertext. url: https://github.com/YaleDHLab/intertext.
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lewandowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kurowicka</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Joe</surname>
          </string-name>
          . “
          <article-title>Generating random correlation matrices based on vines and extended onion method”</article-title>
          .
          <source>In: Journal of Multivariate Analysis</source>
          (
          <year>2009</year>
          ).
          issn: 0047259X. doi: 10.1016/j.jmva.2009.04.008.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lund</surname>
          </string-name>
          et al. “
          <article-title>Cross-referencing Using Fine-grained Topic Modeling”</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers). Minneapolis, Minnesota: Association for Computational Linguistics,
          <year>June 2019</year>
          , pp.
          <fpage>3978</fpage>
          -
          <lpage>3987</lpage>
          . doi: 10.18653/v1/N19-1399. url: https://www.aclweb.org/anthology/N19-1399.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>E.</given-names>
            <surname>Manjavacas</surname>
          </string-name>
          , Á. Kádár, and
          <string-name>
            <given-names>M.</given-names>
            <surname>Kestemont</surname>
          </string-name>
          . “
          <article-title>Improving Lemmatization of Non-Standard Languages with Joint Learning”</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers). Minneapolis, Minnesota: Association for Computational Linguistics,
          <year>June 2019</year>
          , pp.
          <fpage>1493</fpage>
          -
          <lpage>1503</lpage>
          . url: https://www.aclweb.org/anthology/N19-1153.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>E.</given-names>
            <surname>Manjavacas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Long</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Kestemont</surname>
          </string-name>
          . “
          <article-title>On the Feasibility of Automated Detection of Allusive Text Reuse”</article-title>
          .
          <source>In: Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</source>
          . Minneapolis, USA: Association for Computational Linguistics,
          <year>June 2019</year>
          , pp.
          <fpage>104</fpage>
          -
          <lpage>114</lpage>
          . url: https://www.aclweb.org/anthology/W19-2514.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>B. P.</given-names>
            <surname>McGuire</surname>
          </string-name>
          . “
          <article-title>Bernard of Clairvaux”</article-title>
          . In:
          <source>A Companion to Philosophy in the Middle Ages</source>
          . Oxford, UK: Blackwell Publishing Ltd, Nov.
          <year>2007</year>
          , pp.
          <fpage>209</fpage>
          -
          <lpage>214</lpage>
          . isbn: 9780470996669. doi: 10.1002/9780470996669.ch28. url: http://doi.wiley.com/10.1002/9780470996669.ch28.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mesoudi</surname>
          </string-name>
          .
          <article-title>Cultural evolution: How Darwinian theory can explain human culture and synthesize the social sciences</article-title>
          . University of Chicago Press,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Migne</surname>
          </string-name>
          .
          <source>Patrologiae Cursus Completus. Series Latina</source>
          (217 + 4 vols.). Garnier frères, 1844-1855 (and 1862-1865).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>S.</given-names>
            <surname>Moyise</surname>
          </string-name>
          . “
          <article-title>Intertextuality and Biblical Studies: A Review”</article-title>
          .
          <source>In: Verbum et Ecclesia</source>
          <volume>23</volume>
          .
          <issue>2</issue>
          (
          <year>2002</year>
          ), pp.
          <fpage>418</fpage>
          -
          <lpage>431</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>M.</given-names>
            <surname>Orr</surname>
          </string-name>
          . “
          <article-title>Intertextuality”</article-title>
          .
          <source>In: The Encyclopedia of Literary and Cultural Theory</source>
          . Oxford, UK: John Wiley &amp; Sons, Ltd, Dec.
          <year>2010</year>
          . doi: 10.1002/9781444337839.wbelctv2i002. url: http://doi.wiley.com/10.1002/9781444337839.wbelctv2i002.
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          . “
          <article-title>Peter Cellensis”</article-title>
          .
          <source>In: The Catholic Encyclopedia. Robert Appleton Company</source>
          ,
          <year>1911</year>
          , Vol.
          <volume>11</volume>
          . url: http://www.newadvent.org/cathen/11762b.htm.
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rehurek</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Sojka</surname>
          </string-name>
          . “
          <article-title>Software Framework for Topic Modelling with Large Corpora”</article-title>
          .
          <source>In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks</source>
          (
          <year>2010</year>
          ). issn:
          <volume>2951740867</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>M.</given-names>
            <surname>Röder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Both</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Hinneburg</surname>
          </string-name>
          . “
          <article-title>Exploring the Space of Topic Coherence Measures”</article-title>
          .
          <source>In: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining - WSDM '15</source>
          . New York, New York, USA: ACM Press,
          <year>2015</year>
          , pp.
          <fpage>399</fpage>
          -
          <lpage>408</lpage>
          . isbn: 9781450333177. doi: 10.1145/2684822.2685324. url: http://dl.acm.org/citation.cfm?doid=2684822.2685324.
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>P.</given-names>
            <surname>Roelli</surname>
          </string-name>
          . “
          <article-title>The corpus corporum, a new open Latin text repository and tool”</article-title>
          . In: Bulletin du Cange - Archivum Latinitatis Medii Aevi (
          <year>2014</year>
          ). issn: 09948090. doi: 10.5167/uzh-171105.
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stevenson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          . “
          <article-title>Topic or Style? Exploring the Most Useful Features for Authorship Attribution”</article-title>
          .
          <source>In: Proceedings of the 27th International Conference on Computational Linguistics, COLING 2018</source>
          , Santa Fe, New Mexico, USA, August 20-26,
          <year>2018</year>
          . Ed. by
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Bender</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Derczynski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Isabelle</surname>
          </string-name>
          . Association for Computational Linguistics,
          <year>2018</year>
          , pp.
          <fpage>343</fpage>
          -
          <lpage>353</lpage>
          . url: https://www.aclweb.org/anthology/C18-1029/.
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>W.</given-names>
            <surname>Scheirer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Forstall</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Coffee</surname>
          </string-name>
          . “
          <article-title>The sense of a connection: Automatic tracing of intertextuality by meaning”</article-title>
          .
          <source>In: Digital Scholarship in the Humanities</source>
          (
          <year>2016</year>
          ).
          issn: 2055768X. doi: 10.1093/llc/fqu058.
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>H.</given-names>
            <surname>Schmid</surname>
          </string-name>
          . “
          <article-title>Probabilistic part-of-speech tagging using decision trees”</article-title>
          .
          <source>In: New methods in language processing</source>
          .
          <year>2013</year>
          , p.
          <fpage>154</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>J.</given-names>
            <surname>Seo</surname>
          </string-name>
          and
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          . “
          <article-title>Local text reuse detection”</article-title>
          .
          <source>In: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '08</source>
          . New York, New York, USA: ACM Press,
          <year>2008</year>
          , p.
          <fpage>571</fpage>
          . isbn: 9781605581644. doi: 10.1145/1390334.1390432. url: http://portal.acm.org/citation.cfm?doid=1390334.1390432.
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          et al. “
          <article-title>Detecting and modeling local text reuse”</article-title>
          .
          <source>In: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries</source>
          .
          <year>2014</year>
          , pp.
          <fpage>183</fpage>
          -
          <lpage>192</lpage>
          . isbn: 9781479955695. doi: 10.1109/JCDL.2014.6970166.
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>T. F.</given-names>
            <surname>Smith</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Waterman</surname>
          </string-name>
          . “
          <article-title>Identification of common molecular subsequences”</article-title>
          .
          <source>In: Journal of Molecular Biology</source>
          (
          <year>1981</year>
          ). issn: 00222836. doi: 10.1016/0022-2836(81)90087-5.
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vehtari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gelman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Gabry</surname>
          </string-name>
          .
          <article-title>“loo: Efficient leave-one-out cross-validation and WAIC for Bayesian models”</article-title>
          .
          <source>In: R package version 2.0</source>
          (
          <year>2018</year>
          ), p.
          <fpage>1003</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vehtari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gelman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Gabry</surname>
          </string-name>
          . “
          <article-title>Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC”</article-title>
          . In: Statistics and Computing (
          <year>2017</year>
          ).
          issn: 15731375. doi: 10.1007/s11222-016-9696-4. eprint: 1507.04544.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>