<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Workshop on Computational Humanities Research, November</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Estimating the Loss of Medieval Literature with an Unseen Species Model from Ecodiversity</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mike Kestemont</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Folgert Karsdorp</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Literature, University of Antwerp</institution>
          ,
          <addr-line>Antwerp</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Royal Netherlands Academy of Arts and Sciences, Meertens Institute</institution>
          ,
          <addr-line>Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>1</volume>
      <issue>4</issue>
      <fpage>8</fpage>
      <lpage>20</lpage>
      <abstract>
        <p>The centuries-long loss of documents is one of the major impediments to the study of historic literature. Here we focus on Middle Dutch chivalric epics (ca. 1200-1450), a genre for which few archival records exist that shed light on the survival rates of works and documents. We cast the quantitative estimation of these survival rates as a variant of the unseen species problem from ecodiversity. We apply an established non-parametric method (Chao1) and compare it to a number of common alternatives on simulated data. Finally, we discuss the implications of our results for conventional philology: our numbers suggest that the losses sustained on the level of works may be more dramatic than previously imagined, whereas those at the document level align surprisingly well with existing estimates in book history, although these were based on completely different data sources.</p>
      </abstract>
      <kwd-group>
        <kwd>medieval literature</kwd>
        <kwd>book history</kwd>
        <kwd>unknown species problem</kwd>
        <kwd>Middle Dutch</kwd>
        <kwd>ecodiversity</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        attested in some version [
        <xref ref-type="bibr" rid="ref35">36</xref>
        ]. Throughout the Middle Ages (ca. 500-1500 AD), handwritten
media, such as manuscripts or scrolls, were the primary physical medium for the sustainable
exchange of literary texts [
        <xref ref-type="bibr" rid="ref2 ref27">28, 2</xref>
        ]. Before the advent of printing, all witnesses of a text were
hand-copied from pre-existing exemplars, a practice that yielded textual traditions in which
intricate interdependencies exist between copies. The document tree resulting from this
process is known as the stemma codicum and such trees are nowadays studied in the field of
phylogenetics [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Medieval text traditions, however, rarely survive in full. Many documents have been
lost for a variety of historical reasons, including natural or infrastructural disasters (e.g.
library fires, such as the famous example of Alexandria) but also wilful destruction by
humans, such as the controlled disposal of duplicates by heritage institutions or collectors
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Moreover, many sources have only survived in fragments, and the severely damaged
remnants of the same book are nowadays often scattered across various locations.<sup>1</sup> This is
related to the fact that, in the premodern period, bookbinders regularly recycled parchment
codices into “maculature” that was used, for instance, to strengthen the spines of newer books,
which eventually ended up in different locales [
        <xref ref-type="bibr" rid="ref18">19</xref>
        ].
      </p>
      <p>
        We can assume that a large fraction of premodern documents, if not an absolute majority, is
nowadays unknown to us [
        <xref ref-type="bibr" rid="ref31">32</xref>
        ], either because the documents no longer exist, or because they
have not been recovered yet (e.g., due to cataloguing initiatives that are lagging behind) [
        <xref ref-type="bibr" rid="ref21">22</xref>
        ].
Consequently, a great many works are also unknown to us, in the obvious case where all the
documents representing a work are currently unknown [
        <xref ref-type="bibr" rid="ref23">40, 24</xref>
        ]. These assumptions are not only
justified by the many references in historic sources to works that we no longer know, but also
by the constant stream of new material finds, which has clearly intensified
in recent decades with the emergence of the internet and social media [
        <xref ref-type="bibr" rid="ref20">21</xref>
        ]. Understanding the
literary preferences of the past, and explaining historical shifts therein, is one of the core tasks
of cultural studies and a prerequisite for producing valid literary histories. Nevertheless, it is
clear that the situation of partial observability outlined above severely compromises this task:
the available data only constitute a very limited sample of an original population of literature
that was much larger and more diverse. In statistical terms, our present-day perspective is
by necessity biased towards the materials that actually survived. Understandably, scholars
agree that it is vital to correct these biased preconceptions and account for the
materials which are no longer known to us [
        <xref ref-type="bibr" rid="ref2 ref23">40, 24, 2</xref>
        ].
      </p>
      <p>
        Methodologically, it is important to separate the loss of documents from the loss of works
which it entails. As to the second matter, the loss of works, there has been very little empirical
work in the field of medieval studies, beyond the descriptive analysis of historic references
to lost works [
        <xref ref-type="bibr" rid="ref23">40, 24</xref>
        ]. There has been some empirical research into the first matter. Book-historical
studies (such as [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]) have mainly studied the survival rate of documents on the basis
of a limited set of medieval collections whose composition is exactly known at specific
points in time. This allows one to quantify the gradual, diachronic loss of documents from these
collections. While these estimates are currently among the best we have, they are
hard to extrapolate to other regions, languages or collection environments
(e.g. monastic vs. lay book possession), so that alternative approaches to complement this
methodology would be a valuable addition to the field.
        <fn id="fn1"><label>1</label><p>A dramatic example is the Beauvais Missal, of which the dismembered folios are currently being pieced together again in a virtual reconstruction. Updates on the Broken Books project by Lisa Fagin Davis can be followed here: https://web.archive.org/save/https://brokenbooks2.omeka.net/.</p></fn>
        Finally, it is worth mentioning the
polemic that followed the 2005 high-profile publication by John L. Cisne in Science [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. This
paper used methods from geology and population biology to estimate the rate of manuscript
loss for a set of early medieval text traditions, but was almost instantly met with severe,
yet well-founded criticism from a number of well-placed medievalists [
        <xref ref-type="bibr" rid="ref15 ref34">15, 35</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work: bibliometry and the unknown species problem</title>
      <p>
        In a pioneering contribution, Egghe &amp; Proot [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] have proposed a probabilistic model that
attempts to estimate the level of loss for historic works (multi-copy documents), based on the
frequency with which retrieved copies of such works survive. Their case study was based on
bibliometric data, drawn from a short-title catalogue of printed works from the Low Countries.
Follow-up work has confirmed the practical usefulness of their approach for printed works [
        <xref ref-type="bibr" rid="ref21 ref22 ref32">34,
23, 22, 33</xref>
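        ]. Their model can be formalized as follows:
        <disp-formula id="eq1"><label>(1)</label><tex-math><![CDATA[\hat{f}_0 = \left( \frac{1}{1 + \dfrac{2 f_2}{(a-1) f_1}} \right)^{a}]]></tex-math></disp-formula>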
In this formula, f<sub>1</sub> is the number of works in a given corpus that survive in exactly one copy and
f<sub>2</sub> the number of works that survive in exactly two copies; a is the number of copies that were
produced of each work, which is the so-called “run” for printed works (which they set to 500
copies). These coefficients are then used to estimate f̂<sub>0</sub>, the proportion of lost works in
the total, original population of works. In a sagacious response to Egghe &amp; Proot [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], Burrell
[<xref ref-type="bibr" rid="ref4">4</xref>] has noted that their task could be considered a variant of a much older problem, namely
the “unseen species problem”. This problem is studied in various fields, ranging from ecology
to genetics, where scholars have to estimate aspects of species diversity (e.g. biota richness)
in a specific assemblage on the basis of highly incomplete samples of the full population [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
This task has a rich tradition in biostatistics, reaching back to the 1940s, with the work of
Alexander Steven Corbet, who had been trapping and inventorying new butterfly species in
British Malaya for two years [
        <xref ref-type="bibr" rid="ref30">31</xref>
        ]. In collaboration with the statistician R.A. Fisher [
        <xref ref-type="bibr" rid="ref17">18</xref>
        ], he
formulated a model to estimate the number of new species he would discover, if he were to
continue his trapping efforts for another two years.
      </p>
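      <p>Eq. 1 can be illustrated with a minimal sketch. The function names below are hypothetical, and the conversion from a lost proportion to a total number of works (observed count divided by the surviving share) is an assumption made here for illustration, not a step spelled out by Egghe &amp; Proot. The inputs are the counts reported for the Middle Dutch corpus in Section 3 (f<sub>1</sub> = 45, f<sub>2</sub> = 13, 75 observed works) and the run of a = 500 mentioned above.</p>
      <preformat>
```python
def egghe_proot_lost_fraction(f1: int, f2: int, a: int = 500) -> float:
    """Estimated proportion of lost works according to Eq. 1."""
    if not (f1 > 0 and a > 1):
        raise ValueError("Eq. 1 needs f1 > 0 and a > 1")
    return (1.0 / (1.0 + (2.0 * f2) / ((a - 1) * f1))) ** a


def implied_total(observed: int, lost_fraction: float) -> float:
    """Assumption: the observed works are the non-lost share of the total."""
    if lost_fraction >= 1.0:
        raise ValueError("a lost fraction of 1 leaves the total undefined")
    return observed / (1.0 - lost_fraction)


p0 = egghe_proot_lost_fraction(f1=45, f2=13)   # a defaults to 500
print(round(p0, 3))                            # proportion of lost works
print(round(implied_total(75, p0), 1))         # implied original number of works
```
      </preformat>
      <p>Under these assumptions the sketch yields a lost proportion of about 0.56 and an implied total of roughly 171 works, which is in line with the EP row of Table 2.</p>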
      <p>
        Nowadays there exists a variety of statistical approaches to the unseen species problem
that can be borrowed from ecodiversity, an interdisciplinary domain where researchers study,
amongst other things, the biota richness in ecosystems. Monitoring the number of unique
species, for example, is a key task for various environmental reasons, for instance, to assess the
impact of natural disasters [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. These approaches are well established [
        <xref ref-type="bibr" rid="ref19 ref28 ref7">29, 7, 20</xref>
        ] but not all
of these are applicable to our kind of data. Applying the pioneering model by Egghe &amp; Proot
[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], for instance, is not without theoretical issues, because the concept of a print run is almost
meaningless in this context (cf. the a coefficient in Eq. 1), even though the authors show
that its effect is limited. The serial production of handwritten text carriers was extremely
uncommon throughout the Middle Ages, as books were still highly customized luxury objects
that were never mass-produced. It seems impossible to provide an estimate for this parameter,
also because the available evidence suggests that the number of original copies per work was
heavily skewed (dependent on factors such as genre, language or general prestige). It is likely
that the large majority of medieval works originally existed in very few copies only
(i.e. singletons or doubletons).
      </p>
      <p>
        To a reasonable extent, these methodological caveats are mitigated by the Chao1 estimator
[
        <xref ref-type="bibr" rid="ref5 ref6">6, 5</xref>
        ], a non-parametric method that is robust (even universally valid) in the face of unknown
species abundance distributions and enables the comparison of species richness across multiple
assemblages [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Previous, exploratory work in literary studies [
        <xref ref-type="bibr" rid="ref25">26</xref>
        ] has shown that this
estimator produces interesting results for handwritten sources. This method is especially attractive
for highly diverse, log-normally distributed assemblages, typical of human cultural production,
where many species are infrequent and thus hard to detect. In such cases, it is futile to try
and offer a precise point estimate; Chao1 therefore offers an accurate lower bound of
the number of undetected species in a sample instead. The estimator is given by [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]:
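        <disp-formula id="eq2"><label>(2)</label><tex-math><![CDATA[\hat{f}_0 = \begin{cases} \dfrac{n-1}{n} \, \dfrac{f_1^{2}}{2 f_2} & \text{if } f_2 > 0 \\[1ex] \dfrac{n-1}{n} \, \dfrac{f_1 (f_1 - 1)}{2} & \text{if } f_2 = 0 \end{cases}]]></tex-math></disp-formula>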
Here, f<sub>1</sub> is the number of species sighted exactly once in the sample (singletons), f<sub>2</sub> the
number of species that were sighted twice (doubletons), and n the observed, total sample size
(cf. Eq. 1). Finally, f̂<sub>0</sub> is the estimated lower bound for the number of species that do exist
in the assemblage, but which were sighted zero times, i.e. the number of undetected species.
To obtain a confidence interval, a simple bootstrap procedure can be applied, in which the
available data are iteratively resampled [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
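      <p>Eq. 2 is short enough to be spelled out in code. The following is a minimal sketch, not the reference implementation used for the paper; the function name is hypothetical. As input it takes the counts reported for the Middle Dutch corpus in Section 3 (n = 167 documents, f<sub>1</sub> = 45, f<sub>2</sub> = 13, 75 observed works).</p>
      <preformat>
```python
def chao1_unseen(n: int, f1: int, f2: int) -> float:
    """Lower bound for the number of undetected species (Eq. 2)."""
    correction = (n - 1) / n
    if f2 > 0:
        return correction * f1 ** 2 / (2 * f2)
    # Fallback of Eq. 2 for samples without doubletons.
    return correction * f1 * (f1 - 1) / 2


observed_works = 75
unseen = chao1_unseen(n=167, f1=45, f2=13)
print(round(observed_works + unseen, 2))  # 152.42
```
      </preformat>
      <p>The printed value is the observed richness plus the estimated number of undetected works; it corresponds to the Chao1 row of Table 2.</p>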
      <p>
        An attractive feature of this estimator is that it can be naturally extended to estimate the
number of lost documents (instead of the number of lost works) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Field workers tasked
with biodiversity sampling often do not observe a substantial fraction of the biota that live
in a certain assemblage. While Chao1 can estimate how many of these low-abundance species
have (minimally) gone undetected, it does not tell us how much additional effort would be
required to observe them, i.e. how many additional individuals m would have to be sampled to
observe all of the biota at least once. Put informally, with respect to the species accumulation
curve (cf. Fig. 2), we would like to find out where the curve starts to approach its asymptote.
      </p>
      <p>
        Using the same abundance data as above, this extension of Chao1 tries to estimate at
which point every species would have been observed at least as a singleton. The singletons
in the enlarged sample of size m + n (where n is still the number of previously observed
individuals in the sample) would fall into two distinct categories [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]: (1) singletons from
the original sample, for which no additional individuals are detected in the enlarged sample,
and (2) previously undetected species for which exactly one individual is observed during the
additional sampling. The estimator aims to calculate the proportion between (1) and (2) to
determine m on the basis of two functions. The first function, h(x) = 2f<sub>1</sub>(1 + x), is
linear in x, whereas the second function, v(x) = exp[x(2f<sub>2</sub>/f<sub>1</sub>)], increases
exponentially; v is bound to intersect h at a certain x<sup>*</sup> &gt; 0. The number of additional
individuals m that are theoretically required to observe the full richness of a population is
given by m = nx<sup>*</sup>. Here too, a bootstrapping procedure can be used to estimate a confidence
interval.
      </p>
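      <p>The intersection of h and v has no closed form, so x<sup>*</sup> has to be found numerically. The sketch below is one possible way to do this, assuming plain bisection on the logarithms of both functions (to avoid overflow in the exponential); the function name and the tolerance parameter are hypothetical, and other root finders would serve equally well.</p>
      <preformat>
```python
import math


def min_additional_sample(n: int, f1: int, f2: int, tol: float = 1e-10):
    """Return (x_star, m), where h(x_star) = v(x_star) and m = n * x_star."""
    if not (f1 > 0 and f2 > 0):
        raise ValueError("the intersection needs f1 > 0 and f2 > 0")
    rate = 2.0 * f2 / f1

    def gap(x: float) -> float:
        # log v(x) - log h(x); negative at x = 0, positive for large x.
        return rate * x - math.log(2.0 * f1 * (1.0 + x))

    lo, hi = 0.0, 1.0
    while not gap(hi) > 0:      # widen the bracket until v overtakes h
        hi *= 2.0
    while hi - lo > tol:        # bisection on the bracketed root
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            hi = mid
        else:
            lo = mid
    x_star = 0.5 * (lo + hi)
    return x_star, n * x_star


x_star, m = min_additional_sample(n=167, f1=45, f2=13)
print(round(x_star, 2), round(m))
```
      </preformat>
      <p>For the counts reported in Section 3 (n = 167, f<sub>1</sub> = 45, f<sub>2</sub> = 13), this sketch places x<sup>*</sup> at roughly 12.3, so that nx<sup>*</sup> is on the order of the Minsample figure reported in Table 2.</p>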
      <p>Regarding historic literature, the analogy in applying this method is straightforward: how
many additional documents would have to be rediscovered in the future to observe all works at
least once? While this estimate has very useful, practical implications for philologists scanning
archives for new fragments, the resulting number, m + n, also has theoretical relevance, since
it would be reasonably close to the actual size of the original population of documents. Thus,
m + n would allow us to estimate the historic loss of documents, based on a type of data
that is complementary to (and even completely independent from) the archival library records
mentioned above. Because of the log-normal distribution of literary works over documents, we
expect that most works were of an extremely low historic abundance, i.e. they were already
originally produced in very low numbers of copies. We can therefore assume that the majority
of works that are currently unknown will in the future only be detected in a low number of
documents. Once we have observed all works, we can therefore expect that we will also
have observed most documents. Thus, while the outcome of this method should not be treated
as a precise point estimate – like Chao1, it too estimates the minimal sampling effort required –
we argue that it offers a useful approximation of the historical loss of documents. Nevertheless,
we should emphasize that this method likely yields an underestimation of the original document
richness and that it does not account for specific aspects of the historic document mass, for
instance, cases where presently unknown works actually survive in more than one (so far
undetected) document.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Estimating the loss of Middle Dutch chivalric epics</title>
      <p>
        We have collected the surviving works and documents from the
genre of Middle Dutch chivalric epics (ridderepiek) as abundance
data, where we record in how many documents a particular work
has been “sighted”. This data is mainly drawn from Kienhorst’s
acclaimed repertory [
        <xref ref-type="bibr" rid="ref26">27</xref>
        ] but we have updated this information
with newer, and even very recent findings (situation as of 10 July
2020).<sup>2</sup> The main bibliographic information can be gleaned from
Table 1, showing, in the last row, how the 75 presently known
works are distributed over the 167 documents (= n) that have
been retrieved. 45 works are attested as unica in only a single
source (= f<sub>1</sub>); 13 works are doubletons (= f<sub>2</sub>).
      </p>
      <sec id="sec-3-1">
        <title>3.1. Loss of works</title>
        <p>
          We can plug these numbers into the equations presented above
and arrive at the estimates presented in Table 2. Here, we
additionally give the estimate of the so-called Jackknife procedure
(following the reference implementation in [
          <xref ref-type="bibr" rid="ref37">38</xref>
          ]), a historic
alternative to the more recent estimators that generally aim to
reduce the bias in estimators [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Importantly, this approach lacks theoretical justification [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]
but offers a surprisingly solid baseline in many practical applications [
          <xref ref-type="bibr" rid="ref28 ref7">7, 29</xref>
          ]. We mention in
passing that such techniques have already been applied in domains that border on the
Humanities, such as archaeology [
          <xref ref-type="bibr" rid="ref24">25, 17</xref>
          ]. We also present the confidence intervals (CI) obtained from
the bootstrap procedure: these are fairly wide but show considerable overlap, thus stressing
the relative agreement between the three estimators. The distribution of the bootstrap values
is shown in the rainplots (Fig. 1), except for the Jackknife (for which the CI is calculated
analytically). We observe that Chao1 gives the most conservative estimate for the loss of
works, which was to be expected, given the fact that it estimates a lower bound for the loss.
The Jackknife and EP procedure both estimate a higher loss rate (yet both in the same range).
          <fn id="fn2"><label>2</label><p>Data and code supporting this paper have been publicly archived: https://doi.org/10.5281/zenodo.4030681.</p></fn>
        </p>
        <p>[Figure 1: Diversity estimation for the number of works. Rainplots of the bootstrap values per estimator; x-axis: number of works, y-axis: estimator.]</p>
        <p>Crucially for the discussion below, all three estimators for the loss of works suggest that
only half (and potentially even less) of the original works that once existed are currently known
to us.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Loss of documents</title>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption><p>Diversity estimates for the Middle Dutch chivalric epics (with CI). The last row is the result for minimum additional sampling.</p></caption>
          <table>
            <thead>
              <tr><th>Method</th><th>Estimate</th><th>CI</th></tr>
            </thead>
            <tbody>
              <tr><td>Chao1</td><td>152.42</td><td>110.11 - 222.98</td></tr>
              <tr><td>Jackknife</td><td>177.00</td><td>127.81 - 226.19</td></tr>
              <tr><td>EP</td><td>170.71</td><td>116.77 - 268.49</td></tr>
              <tr><td>Minsample</td><td>2047.77</td><td>1064.19 - 4006.42</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>
          The final row in Table 2 gives the estimate (with CI) for the loss of documents. While
we should account for an extremely wide CI in this case, the numbers suggest a survival
rate of ≈8.15%, i.e. of an original population of 2047 documents, only 167 have
survived. We offer a final and joint visualization of the results in Fig. 2. This plot shows what
is known as a “species accumulation curve” [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. The blue line plots the number of
retrieved works as an (asymptotic) function of
the number of documents recovered in this assemblage. The full line indicates the situation for
the observed sample, whereas the dashed part concerns the hypothetical increase, in the case
where more “sightings” would occur in the future. The grey distribution shows the bootstrap
values resulting from the minimum sampling effort estimator, broadly indicating the region
where we expect the curve to hit the asymptote.
        </p>
        <p>[Figure 2: Species accumulation curve. Legend: species accumulation, minimum sampling, maculature.]</p>
        <p>The green distribution in Fig. 2 requires additional explanation. The available models for
estimating the population size (as opposed to species diversity) of an under-sampled
assemblage typically assume that we have capture-recapture information available, instead of mere
abundance data [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. We cannot extract such information from our data, because a
work-document pair can in principle only be “sighted” once and after that it is not released again
“into the wild”. Nevertheless, in the case of manuscripts that have been recycled into
maculature, the remnants of the same document have often reappeared in different locations; an
extreme example is the Roman der Lorreinen-codex of which 9 fragments resurfaced, scattered
across 7 different libraries [
          <xref ref-type="bibr" rid="ref26">27</xref>
          ]. We can apply Chao1 to the documents in our corpus that
survive fragmentarily and represent them as abundance data, on the basis of the number of
fragments of them that resurfaced. This yields an assemblage of 141 documents surviving in
181 fragments, with f<sub>1</sub> = 118 and f<sub>2</sub> = 14. The application of Chao1 yields the following
estimate: 635.54 (CI: 449.85 - 947.25) (cf. the green area in Fig. 2). Note that this number does
not estimate the total number of documents that once existed, but rather the size of the subset
of manuscripts that were recycled into maculature. In combination with the other estimate,
our analyses suggest that ≈31% of the original population of documents with chivalric Middle
Dutch epics was recycled into maculature.
        </p>
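        <p>As a check on the arithmetic, the maculature estimate can be worked out by hand from Eq. 2, taking the 181 fragments as the sample size n and the 141 fragmentary documents as the observed richness; the share of ≈31% then follows from dividing by the Minsample estimate of Table 2:
          <disp-formula><tex-math><![CDATA[141 + \frac{181 - 1}{181} \cdot \frac{118^{2}}{2 \cdot 14} \approx 141 + 494.54 = 635.54, \qquad \frac{635.54}{2047.77} \approx 0.31]]></tex-math></disp-formula>
        </p>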
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Simulations</title>
      <p>
        In this section, we compare the performance of the three estimators for species diversity using
simulated data. In the aforementioned seminal paper [
        <xref ref-type="bibr" rid="ref17">18</xref>
        ], Fisher proposed to model the
abundance of species in an assemblage as S<sub>n</sub> = αx<sup>n</sup>/n, where S<sub>n</sub> is the number of species with
an abundance of n, x is a positive constant (0 &lt; x &lt; 1) which generally approaches 1, and α is
the number of singleton species in the assemblage. This logseries is still in wide use and
can be used to define a discrete probability distribution, parameterized by two values: (i) the
number of singleton species in the population and (ii) the maximum abundance for a single
species (to put a practical cap on the distribution).
      </p>
      <p>[Figure 3: Violin plots of the estimates per estimator (Egghe &amp; Proot, Jackknife, Chao1).]</p>
      <p>In an iterative process, we have generated assemblages from a logseries distribution for 250
works, for a fixed f<sub>1</sub> = 75 and x = .99. Next, we mimicked a distribution of these works over
a variable number of documents (in a linear range [500, 2500]). We then modelled historic
document loss as a fully stochastic process, in which documents are randomly dropped at a
certain loss rate (in the linear range [0.05, 0.95]). We repeated each experiment 50 times with
different random seeds. We can then assess the performance of each estimator with respect
to the ground truth of 250 works. The violin plots in Fig. 3 show that Chao1 is the most
conservative estimator and that it generally realizes the smallest deviation from the ground truth (cf.
dashed grey line). Fig. 4 plots the absolute error per estimator as a function of the varying
loss rate. Here, we see that Chao1 is the most robust estimator throughout, except for extremely
small document keep rates (&lt; 0.1).</p>
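      <p>The general shape of such an experiment can be sketched as follows. This is a simplified illustration under stated assumptions, not the simulation code archived with the paper: abundances are drawn independently per work from a truncated logseries (rather than fixing f<sub>1</sub> = 75 and a document total), each copy survives independently with a given keep rate, and only Chao1 is evaluated. The function names and the parameters max_abundance, keep_rate and seed are hypothetical.</p>
      <preformat>
```python
import random
from collections import Counter


def chao1_richness(counts):
    """Observed richness plus the Chao1 lower bound for unseen works (Eq. 2)."""
    n = sum(counts)
    if n == 0:
        return 0.0
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]
    correction = (n - 1) / n
    if f2 > 0:
        unseen = correction * f1 ** 2 / (2 * f2)
    else:
        unseen = correction * f1 * (f1 - 1) / 2
    return len(counts) + unseen


def simulate(n_works=250, x=0.99, max_abundance=200, keep_rate=0.3, seed=0):
    """Return (Chao1 estimate, number of surviving works) for one run."""
    rng = random.Random(seed)
    ks = list(range(1, max_abundance + 1))
    weights = [x ** k / k for k in ks]           # truncated logseries weights
    copies = rng.choices(ks, weights=weights, k=n_works)
    # Fully stochastic loss: every copy survives with probability keep_rate.
    kept = [sum(1 for _ in range(c) if keep_rate > rng.random()) for c in copies]
    surviving = [k for k in kept if k > 0]       # works with no copy left vanish
    return chao1_richness(surviving), len(surviving)


estimate, observed = simulate()
print(observed, round(estimate, 1))              # compare with the true 250 works
```
      </preformat>
      <p>Repeating the call over a grid of keep rates and seeds, and recording the distance between the estimate and the true number of works, gives an error curve of the kind summarized in Fig. 4.</p>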
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>
        In his acclaimed 2006 history of Middle Dutch literature, Van Oostrom estimated that the
corpus of Middle Dutch chivalric epics must originally have comprised “at least 100 texts” [
        <xref ref-type="bibr" rid="ref36">37</xref>
        ].
All the estimators considered here agree that this is in all likelihood too low an estimate: it seems
likely that at the very least 152 texts once existed, and potentially even more, of which only
75 (≈49%) survive now, providing an even firmer basis for the claims from a previous study
[
        <xref ref-type="bibr" rid="ref25">26</xref>
        ]. Chao1 proved a more reliable estimator in our simulations than the other two methods
studied here, which tended to overshoot and thus overestimate the loss of works. Middle Dutch
studies might have overestimated the representativeness of the surviving corpus, and future
studies should attempt to account for this bias.
      </p>
      <p>
        Although there are few previous estimates regarding the loss of works, we are on more solid
grounds regarding the loss of documents. In book history, scholars have studied the loss rates
for medieval documents, based on data for the sparse set of manuscript collections of which the
historic composition is known, so that they can be compared to the books from these collections
that are still extant today [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Such studies have estimated a cumulative survival rate of 7%
for the sort of non-illustrated manuscripts in which Middle Dutch romances typically were
copied [
        <xref ref-type="bibr" rid="ref29 ref36">39, 37, 30</xref>
        ]. While we should present these results with extreme caution for now, it
is remarkable that our analysis suggests an estimate in a surprisingly similar range,
i.e. ≈8.15% (167/2047 documents), although with a very wide CI (1064-4006). This approach
might nevertheless present an exciting new research avenue that could complement the existing
insights on the basis of a kind of evidence that is fully independent of the data used so far. Finally,
our analyses suggest that of the original population of documents with chivalric Middle Dutch
epics, ≈31% was recycled into maculature (i.e. 635/2047). While more research is needed to
support this claim, it is the very first time to the best of our knowledge that this proportion has
been estimated in a quantitative manner. This proportion is surprisingly high, which is maybe
good news for the philologist, who is after all more likely to discover fragmentary sources than
intact sources.</p>
      <p>[Figure 4: Absolute reconstruction error (as a function of the per-simulation loss rate), per estimator (Egghe &amp; Proot, Jackknife, Chao1); x-axis: keep rate.]</p>
      <p>A number of issues remain with the application of these methods that require further
attention. Problematic, for instance, is our assumption that document loss has been a fully
stochastic process (which is the way in which we naively simulate this phenomenon here).
Although there certainly are random aspects to this process, we know from traditional book
history that some codices were less likely to be lost: texts in convolutes had higher survival
chances, for instance, and the same has been hypothesized for higher-end (e.g. illustrated)
manuscripts [39]. Future research should develop more principled, perhaps agent-based,
models to simulate document loss than the fully stochastic approach adopted here. Finally, it would
be interesting to extend this approach to a wider geographic and linguistic range, since these
methods allow for an interesting cross-cultural comparison regarding the survival of medieval
literature. This geographic variation will be a central component of our future work.</p>
      <sec id="sec-5-1">
        <title>5.1. Acknowledgments</title>
        <p>The authors would like to thank Elisabeth de Bruijn and Remco Sleiderink (University of
Antwerp, BE) for the stimulating discussions, in particular about maculature. Additionally,
we would like to acknowledge the helpful bibliographic input of the participants at the Dark
Archives 20/20 conference (https://web.archive.org/save/https://aevum.space/darkarchives)
where this work was previously presented.</p>
        <p>[17] M. Eren et al. “Estimating the Richness of a Population When the Maximum Number
of Classes Is Fixed: A Nonparametric Solution to an Archaeological Problem”. In: PLOS
ONE 7.5 (May 2012), pp. 1–11.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Barbrook</surname>
          </string-name>
          et al. “
          <article-title>The phylogeny of The Canterbury Tales”</article-title>
          .
          <source>In: Nature</source>
          <volume>394</volume>
          .6696 (
          <year>1998</year>
          ), p.
          <fpage>839</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Buringh</surname>
          </string-name>
          .
          <article-title>Medieval Manuscript Production in the Latin West, Explorations with a Global Database</article-title>
          . Brill,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Burnham</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Overton</surname>
          </string-name>
          . “
          <article-title>Robust Estimation of Population Size When Capture Probabilities Vary Among Animals”</article-title>
          .
          <source>In: Ecology 60.5</source>
          (
          <issue>1979</issue>
          ), pp.
          <fpage>927</fpage>
          -
          <lpage>936</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Burrell</surname>
          </string-name>
          . “
          <article-title>Some comments on “The estimation of lost multi-copy documents: A new type of informetrics theory” by Egghe and Proot”</article-title>
          .
          <source>In: Journal of Informetrics</source>
          <volume>2</volume>
          .1 (
          <year>2008</year>
          ), pp.
          <fpage>101</fpage>
          -
          <lpage>105</lpage>
          . issn: 1751-1577.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chao</surname>
          </string-name>
          . “
          <article-title>Estimating the Population Size for Capture-Recapture Data with Unequal Catchability”</article-title>
          .
          <source>In: Biometrics</source>
          <volume>43</volume>
          .4 (
          <year>1987</year>
          ), pp.
          <fpage>783</fpage>
          -
          <lpage>791</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chao</surname>
          </string-name>
          . “
          <article-title>Nonparametric Estimation of the Number of Classes in a Population”</article-title>
          .
          <source>In: Scandinavian Journal of Statistics</source>
          <volume>11</volume>
          .4 (
          <year>1984</year>
          ), pp.
          <fpage>265</fpage>
          -
          <lpage>270</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chao</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.-H.</given-names>
            <surname>Chiu</surname>
          </string-name>
          . “
          <article-title>Species Richness: Estimation and Comparison”</article-title>
          . In: Aug.
          <year>2016</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>26</lpage>
          . isbn: 9781118445112. doi: 10.1002/9781118445112.stat03432.pub2.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chao</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Jost</surname>
          </string-name>
          . “
          <article-title>Estimating diversity and entropy profiles via discovery rates of new species”</article-title>
          .
          <source>In: Methods in Ecology and Evolution</source>
          <volume>6</volume>
          .8 (
          <year>2015</year>
          ), pp.
          <fpage>873</fpage>
          -
          <lpage>882</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. T.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Jost</surname>
          </string-name>
          . “
          <article-title>Entropy and the species accumulation curve: a novel entropy estimator via discovery rates of new species”</article-title>
          .
          <source>In: Methods in Ecology and Evolution</source>
          <volume>4</volume>
          .11 (
          <year>2013</year>
          ), pp.
          <fpage>1091</fpage>
          -
          <lpage>1100</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chao</surname>
          </string-name>
          et al. “
          <article-title>Sufficient sampling for asymptotic minimum species richness estimators”</article-title>
          .
          <source>In: Ecology</source>
          <volume>90</volume>
          .4 (
          <year>2009</year>
          ), pp.
          <fpage>1125</fpage>
          -
          <lpage>1133</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Cisne</surname>
          </string-name>
          . “
          <article-title>How Science Survived: Medieval Manuscripts' “Demography” and Classic Texts' Extinction”</article-title>
          .
          <source>In: Science</source>
          <volume>307</volume>
          .5713 (
          <year>2005</year>
          ), pp.
          <fpage>1305</fpage>
          -
          <lpage>1307</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Cormack</surname>
          </string-name>
          . “
          <article-title>Log-Linear Models for Capture-Recapture”</article-title>
          .
          <source>In: Biometrics</source>
          <volume>45</volume>
          .2 (
          <year>1989</year>
          ), pp.
          <fpage>395</fpage>
          -
          <lpage>413</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Cuthbert</surname>
          </string-name>
          . “
          <article-title>Tipping the Iceberg: Missing Italian Polyphony from the Age of Schism”</article-title>
          .
          <source>In: Musica Disciplina</source>
          <volume>54</volume>
          (
          <year>2009</year>
          ), pp.
          <fpage>39</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Daly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Baetens</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>De Baets</surname>
          </string-name>
          . “
          <article-title>Ecological Diversity: Measuring the Unmeasurable”</article-title>
          .
          <source>In: Mathematics</source>
          <volume>6</volume>
          .7 (July
          <year>2018</year>
          ), p.
          <fpage>119</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>G.</given-names>
            <surname>Declercq</surname>
          </string-name>
          . “
          <article-title>Comment on “How Science Survived: Medieval Manuscripts' “Demography” and Classic Texts' Extinction””</article-title>
          .
          <source>In: Science</source>
          <volume>310</volume>
          .5754 (
          <year>2005</year>
          ), pp.
          <fpage>1618</fpage>
          -
          <lpage>1618</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L.</given-names>
            <surname>Egghe</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Proot</surname>
          </string-name>
          . “
          <article-title>The estimation of the number of lost multi-copy documents: A new type of informetrics theory”</article-title>
          .
          <source>In: Journal of Informetrics</source>
          <volume>1</volume>
          .4 (
          <year>2007</year>
          ), pp.
          <fpage>257</fpage>
          -
          <lpage>268</lpage>
          . issn: 1751-1577.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>R.</given-names>
            <surname>Fisher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Corbet</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Williams</surname>
          </string-name>
          . “
          <article-title>The Relation Between the Number of Species and the Number of Individuals in a Random Sample of an Animal Population”</article-title>
          .
          <source>In: The Journal of Animal Ecology</source>
          <volume>12</volume>
          .1 (
          <year>1943</year>
          ), pp.
          <fpage>42</fpage>
          -
          <lpage>58</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>D.</given-names>
            <surname>Geirnaert</surname>
          </string-name>
          . ““
          <article-title>Membra disiecta”: banden met het versneden verleden”</article-title>
          . In: Medioneerlandistiek. Een inleiding tot de Middelnederlandse letterkunde. Ed. by
          <string-name>
            <given-names>R.</given-names>
            <surname>Jansen-Sieben</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Janssens</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Willaert</surname>
          </string-name>
          . Verloren,
          <year>2000</year>
          , pp.
          <fpage>85</fpage>
          -
          <lpage>101</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>N. J.</given-names>
            <surname>Gotelli</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Chao</surname>
          </string-name>
          . “
          <article-title>Measuring and Estimating Species Richness, Species Diversity, and Biotic Similarity from Sampling Data”</article-title>
          . In: Encyclopedia of Biodiversity (Second Edition). Ed. by
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Levin</surname>
          </string-name>
          . Second Edition. Waltham: Academic Press,
          <year>2013</year>
          , pp.
          <fpage>195</fpage>
          -
          <lpage>211</lpage>
          . isbn: 978-0-12-384720-1.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Green</surname>
          </string-name>
          . “
          <article-title>Digital manuscripts as sites of touch: using social media for 'hands-on' engagement with medieval manuscript materiality”</article-title>
          .
          <source>In: Archive Journal</source>
          <volume>6</volume>
          (
          Sept.
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Green</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>McIntyre</surname>
          </string-name>
          . “
          <article-title>Lost Incunable Editions: Closing in on an Estimate”</article-title>
          .
          <source>In: Lost Books. Reconstructing the Print World of Pre-Industrial Europe</source>
          . Ed. by
          <string-name>
            <given-names>F.</given-names>
            <surname>Bruni</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Pettegree</surname>
          </string-name>
          . Brill,
          <year>2016</year>
          , pp.
          <fpage>55</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J.</given-names>
            <surname>Green</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>McIntyre</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Needham</surname>
          </string-name>
          . “
          <article-title>The Shape of Incunable Survival and Statistical Estimation of Lost Editions”</article-title>
          .
          <source>In: The Papers of the Bibliographical Society of America</source>
          <volume>105</volume>
          .2 (
          <year>2011</year>
          ), pp.
          <fpage>141</fpage>
          -
          <lpage>175</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>T.</given-names>
            <surname>Haye</surname>
          </string-name>
          .
          <article-title>Verlorenes Mittelalter: Ursachen und Muster der Nichtüberlieferung mittellateinischer Literatur</article-title>
          . Brill,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kaufman</surname>
          </string-name>
          . “
          <article-title>Measuring Archaeological Diversity: An Application of the Jackknife Technique”</article-title>
          .
          <source>In: American Antiquity</source>
          <volume>63</volume>
          .1 (
          <year>1998</year>
          ), pp.
          <fpage>73</fpage>
          -
          <lpage>85</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kestemont</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Karsdorp</surname>
          </string-name>
          . “
          <article-title>Het Atlantis van de Middelnederlandse ridderepiek. Een schatting van het tekstverlies met methodes uit de ecodiversiteit”</article-title>
          .
          <source>In: Spiegel der Letteren</source>
          <volume>61</volume>
          .3 (
          <year>2019</year>
          ), pp.
          <fpage>271</fpage>
          -
          <lpage>290</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kienhorst</surname>
          </string-name>
          .
          <article-title>De handschriften van de Middelnederlandse ridderepiek. Een codicologische beschrijving. Deel 1</article-title>
          . Deventer studieën 9. Sub Rosa,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>E.</given-names>
            <surname>Kwakkel</surname>
          </string-name>
          .
          <source>Books Before Print</source>
          . Amsterdam University Press,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>E.</given-names>
            <surname>Marcon</surname>
          </string-name>
          . “
          <article-title>Practical Estimation of Diversity from Abundance Data”</article-title>
          . Working paper or preprint. Oct.
          <year>2015</year>
          . url: https://hal-agroparistech.archives-ouvertes.fr/hal-01212435.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>U.</given-names>
            <surname>Neddermeyer</surname>
          </string-name>
          .
          <article-title>Von der Handschrift zum gedruckten Buch. Schriftlichkeit und Leseinteresse im Mittelalter und in der frühen Neuzeit. Quantitative und qualitative Aspekte</article-title>
          . Harrassowitz,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>A.</given-names>
            <surname>Orlitsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Suresh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          . “
          <article-title>Optimal prediction of the number of unseen species”</article-title>
          .
          <source>In: Proceedings of the National Academy of Sciences of the United States of America</source>
          <volume>113</volume>
          .47 (
          <year>2016</year>
          ), pp.
          <fpage>13283</fpage>
          -
          <lpage>13288</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pettegree</surname>
          </string-name>
          . “
          <article-title>The Legion of the Lost. Recovering the Lost Books of Early Modern Europe”</article-title>
          .
          <source>In: Lost Books. Reconstructing the Print World of Pre-Industrial Europe</source>
          . Ed. by
          <string-name>
            <given-names>F.</given-names>
            <surname>Bruni</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Pettegree</surname>
          </string-name>
          . Brill,
          <year>2016</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>G.</given-names>
            <surname>Proot</surname>
          </string-name>
          . “
          <article-title>Survival Factors of Seventeenth-Century Hand-Press Books Published in the Southern Netherlands: The Importance of Sheet Counts, Sammelbände and the Role of Institutional Collections”</article-title>
          .
          <source>In: Lost Books. Reconstructing the Print World of Pre-Industrial Europe</source>
          . Ed. by
          <string-name>
            <given-names>F.</given-names>
            <surname>Bruni</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Pettegree</surname>
          </string-name>
          . Brill,
          <year>2016</year>
          , pp.
          <fpage>160</fpage>
          -
          <lpage>201</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>G.</given-names>
            <surname>Proot</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Egghe</surname>
          </string-name>
          . “
          <article-title>Estimating Editions on the Basis of Survivals: Printed Programmes of Jesuit Plays in the Provincia Flandro-Belgica before 1773, with a Note on the “Book Historical Law””</article-title>
          .
          <source>In: The Papers of the Bibliographical Society of America</source>
          <volume>102</volume>
          .2 (
          <year>2008</year>
          ), pp.
          <fpage>149</fpage>
          -
          <lpage>174</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>N.</given-names>
            <surname>Pyenson</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Pyenson</surname>
          </string-name>
          . “
          <article-title>Treating Medieval Manuscripts as Fossils”</article-title>
          .
          <source>In: Science</source>
          <volume>309</volume>
          .5735 (
          <year>2005</year>
          ), pp.
          <fpage>698</fpage>
          -
          <lpage>701</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>P. L.</given-names>
            <surname>Shillingsburg</surname>
          </string-name>
          .
          <article-title>From Gutenberg to Google: Electronic Representations of Literary Texts</article-title>
          . Cambridge University Press,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>F.</given-names>
            <surname>Van Oostrom</surname>
          </string-name>
          .
          <article-title>Stemmen op schrift. Geschiedenis van de Nederlandse literatuur van het begin tot 1300</article-title>
          . Prometheus,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Wang</surname>
          </string-name>
          . “
          <article-title>SPECIES: An R Package for Species Richness Estimation”</article-title>
          .
          <source>In: Journal of Statistical Software</source>
          <volume>40</volume>
          .9 (
          <year>2011</year>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>