Estimating the Loss of Medieval Literature with an
Unseen Species Model from Ecodiversity
Mike Kestemont (a), Folgert Karsdorp (b)
(a) Department of Literature, University of Antwerp, Antwerp, Belgium
(b) Royal Netherlands Academy of Arts and Sciences, Meertens Institute, Amsterdam, The Netherlands


Abstract
The centuries-long loss of documents is one of the major impediments to the study of historic literature.
Here we focus on Middle Dutch chivalric epics (ca. 1200-1450), a genre for which few archival
records exist that shed light on the survival rates of works and documents. We cast the quantitative
estimation of these survival rates as a variant of the unseen species problem from ecodiversity. We
apply an established non-parametric method (Chao1) and compare it to a number of common
alternatives on simulated data. Finally, we discuss the implications of our results for conventional
philology: our numbers suggest that the losses sustained on the level of works may be more dramatic
than previously imagined, whereas those at the document-level align surprisingly well with existing
estimates in book history, although these were based on completely different data sources.

Keywords
medieval literature, book history, unknown species problem, Middle Dutch, ecodiversity


1. Introduction: the survival of premodern literature
The centuries-long loss of material artifacts is one of the major impediments to the study of the
history of human culture. Across various domains in the humanities, scholars must base their
study on incomplete archival collections that offer but a tiny fraction of the wealth of historical
specimens that originally existed. In this contribution, we focus on the domain of literature
from the High Medieval period in Western Europe, which has sustained significant losses in
the past centuries. Previous work has argued that unseen species models from ecodiversity
can be used to estimate the number of works (multi-copy documents) that have been lost to
us [e.g., 16, 26]. Although these models have already yielded interesting insights for early
modern printed works, they have hardly been applied to premodern handwritten literature so
far [exceptions include 13, 26]. Here, we apply Chao1, a non-parametric estimator of asymp-
totic species richness, to a representative corpus of Middle Dutch romances and quantitatively
evaluate its performance on simulated datasets. A novelty of this contribution is that we do
not only estimate the proportion of lost works in this dataset, but also the number of lost
documents, through an extension of Chao1, which aims to gauge the number of additional
samples that would be minimally required to reach the asymptote of the species accumulation
curve, estimated by the original model.
   In traditional philology, a theoretical distinction is typically drawn between the abstract
notion of a “work” and the physical “documents” (witnesses, carriers) in which the work is
CHR 2020: Workshop on Computational Humanities Research, November 18–20, 2020, Amsterdam, The
Netherlands
Email: mike.kestemont@uantwerpen.be (M. Kestemont); folgert.karsdorp@meertens.knaw.nl (F. Karsdorp)
ORCID: 0000-0003-3590-693X (M. Kestemont); 0000-0002-5958-0551 (F. Karsdorp)
© 2020 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
attested in some version [36]. Throughout the Middle Ages (ca. 500-1500 AD), handwritten
media, such as manuscripts or scrolls, were the primary physical medium for the sustainable
exchange of literary texts [28, 2]. Before the advent of printing, all witnesses of a text were
hand-copied from pre-existing exemplars, a practice that yielded textual traditions in which
intricate interdependencies exist between copies. The document tree resulting from this pro-
cess is known as the stemma codicum and such trees are nowadays studied in the field of
phylogenetics [1].
   Medieval text traditions, however, rarely survive in full, as many documents have now been
lost, due to a variety of historical reasons, including natural or infrastructural disasters (e.g.
library fires, such as the famous example of Alexandria) but also the wilful destruction by
humans, such as the controlled disposition of duplicates by heritage institutions or collectors
[2]. Moreover, many sources have only survived fragmentarily and often the severely damaged
remnants of the same book are nowadays even scattered across various locations.1 This is
related to the fact that, in the premodern period, book binders regularly recycled parchment
codices into “maculature” that was used, for instance, to strengthen the spines of newer books,
which eventually ended up in different locales [19].
   We can assume that a large fraction of premodern documents, if not an absolute majority, is
nowadays unknown to us [32], either because the documents no longer exist, or because they
have not been recovered yet (e.g., due to cataloguing initiatives that are lagging behind) [22].
Consequently, a great deal of works are also unknown to us, in the obvious case where all the
documents representing a work are currently unknown [40, 24]. These assumptions are not only
justified by the many references in historic sources to works that we no longer know, but also
by the constant stream of new material findings nowadays – which has clearly been intensified
in recent decades by the emergence of the internet and social media [21]. Understanding the
literary preferences of the past, and explaining historical shifts therein, is one of the core tasks
of cultural studies and a prerequisite for producing valid literary histories. Nevertheless, it is
clear that the situation of partial observability, outlined above, severely compromises this task:
the available data only constitute a very limited sample of an original population of literature
that was much larger and more diverse. In statistical terms, our present-day perspective is
by necessity biased towards the materials that actually survived. Understandably, scholars
invariably agree that it is vital to correct these biased preconceptions and account for the
materials which are no longer known to us [40, 24, 2].
   Methodologically, it is important to separate the loss of documents from the loss of works
which it entails. As to the second matter, the loss of works, there has been very little empirical
work in the field of medieval studies, beyond the descriptive analysis of historic references
to lost works [40, 24]. There has been some empirical research into the first matter. Book
historical studies (such as [2]) have mainly studied the survival rate of documents on the basis
of the limited set of medieval collections, of which the composition is exactly known at specific
points in time. This allows one to quantify the gradual, diachronic loss of documents from these
collections. While these estimates are currently among the best we have, it is clear that it can
be hard to extrapolate these numbers to other regions, languages or collection environments
(e.g. monastic vs. lay book possession), so that alternative approaches to complement this
methodology would be a valuable addition to the field. Finally, it is worth mentioning the

   1 A dramatic example is the Beauvais Missal, of which the dismembered folios are currently being pieced
together again in a virtual reconstruction. Updates on the Broken Books project by Lisa Fagin Davis can be
followed here: https://web.archive.org/save/https://brokenbooks2.omeka.net/.


polemic that followed the high-profile 2005 publication by John L. Cisne in Science [11]. This
paper used methods from geology and population biology to estimate the rate of manuscript
loss for a set of early medieval text traditions, but it was almost instantly met with severe,
yet well-founded criticism from a number of well-placed medievalists [15, 35].


2. Related work: bibliometry and the unknown species problem
In a pioneering contribution, Egghe & Proot [16] have proposed a probabilistic model that
attempts to estimate the level of loss for historic works (multi-copy documents), based on the
frequency with which retrieved copies of such works survive. Their case study was based on
bibliometric data, drawn from a short-title catalogue of printed works from the Low Countries.
Follow-up work has confirmed the practical usefulness of their approach for printed works [34,
23, 22, 33]. Their model can be formalized as follows:
    \hat{f}_0 = \left( \frac{1}{1 + \dfrac{2 f_2}{(a-1) f_1}} \right)^{a} \qquad (1)


In this formula, f1 is the number of works in a given corpus that survive in exactly one copy and
f2 the number of works that survive in exactly two copies; a is the number of copies that were
produced of each work, which is the so-called “run” for printed works (which they set to 500
copies). These coefficients are then used to estimate f̂0, or the proportion of lost works from
the total, original population of works. In a sagacious response to Egghe & Proot [16], Burrell
[4] has noted that their task could be considered as a variant of a much older problem, namely
the “unseen species problem”. This problem is studied in various fields, ranging from ecology
to genetics, where scholars have to estimate aspects of species diversity (e.g. biota richness)
in a specific assemblage on the basis of highly incomplete samples of the full population [14].
This task has a rich tradition in biostatistics, reaching back to the 1940s, with the work of
Alexander Steven Corbet, who had been trapping and inventorying new butterfly species in
British Malaya for two years [31]. In collaboration with the statistician R.A. Fisher [18], he
formulated a model to estimate the number of new species he would discover, if he were to
continue his trapping efforts for another two years.
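
To make Eq. 1 concrete, the following minimal Python sketch evaluates it for a given pair of singleton and doubleton counts. The function names are hypothetical, and the conversion from the lost proportion to a total number of works (observed works divided by the estimated surviving proportion) is an assumption of this sketch, not a step spelled out in the text. The example counts are the Middle Dutch figures reported in Section 3.

```python
def egghe_proot_lost_fraction(f1: int, f2: int, a: int = 500) -> float:
    """Eq. 1: estimated proportion of works of which no copy survives."""
    if f1 <= 0:
        raise ValueError("f1 must be positive")
    return (1.0 / (1.0 + (2.0 * f2) / ((a - 1) * f1))) ** a


def egghe_proot_richness(observed: int, f1: int, f2: int, a: int = 500) -> float:
    """Assumed conversion: observed works / estimated surviving proportion."""
    lost = egghe_proot_lost_fraction(f1, f2, a)
    return observed / (1.0 - lost)


# Middle Dutch counts from Section 3: 75 works, f1 = 45, f2 = 13; a = 500 as in [16]
lost = egghe_proot_lost_fraction(f1=45, f2=13)
total = egghe_proot_richness(observed=75, f1=45, f2=13)
print(f"lost proportion: {lost:.3f}, estimated number of works: {total:.1f}")
```

Under these assumptions the result lands close to the EP figure reported in Table 2, which suggests (but does not prove) that the conversion used here matches the one behind that table.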
   Nowadays there exists a variety of statistical approaches to the unseen species problem
that can be borrowed from ecodiversity, an interdisciplinary domain where researchers study,
amongst other things, the biota richness in ecosystems. Monitoring the number of unique
species, for example, is a key task for various environmental reasons, for instance, to assess the
impact of natural disasters [14]. These approaches are well established [29, 7, 20] but not all
of these are applicable to our kind of data. Applying the pioneering model by Egghe & Proot
[16], for instance, is not without theoretical issues, because the concept of a print run is almost
meaningless in this context (cf. the a coefficient in Eq. 1), even though the authors show
that its effect is limited. The serial production of handwritten text carriers was extremely
uncommon throughout the Middle Ages, as books were still highly customized luxury objects
that were never mass-produced. It seems impossible to provide an estimate for this parameter,
also because the available evidence suggests that the number of original copies per work was
heavily skewed (dependent on factors such as genre, language or general prestige). It is likely
that the large majority of medieval works already originally only existed in very few copies
(i.e. singletons or doubletons).


   To a reasonable extent, these methodological caveats are mitigated by the Chao1 estimator
[6, 5], a non-parametric method that is robust (even universally valid) in the face of unknown
species abundance distributions and enables the comparison of species richness across multiple
assemblages [7]. Previous, exploratory work in literary studies [26] has shown that this estima-
tor produces interesting results for handwritten sources. This method is especially attractive
for highly diverse, log-normally distributed assemblages, typical of human cultural production,
where many species are infrequent and thus hard to detect. In such cases, it is futile to try
and offer a precise point estimate; Chao1 therefore rather offers an accurate lower bound of
the number of undetected species in a sample. The estimator is given by [8]:

    \hat{f}_0 =
    \begin{cases}
      \dfrac{n-1}{n} \, \dfrac{f_1^2}{2 f_2}      & \text{if } f_2 > 0; \\[1ex]
      \dfrac{n-1}{n} \, \dfrac{f_1 (f_1 - 1)}{2}  & \text{if } f_2 = 0
    \end{cases}
    \qquad (2)

Here, f1 is the number of species sighted exactly once in the sample (singletons), f2 the
number of species that were sighted twice (doubletons), and n the observed, total sample size
(cf. Eq. 1). Finally, f̂0 is the estimated lower bound for the number of species that do exist
in the assemblage, but which were sighted zero times, i.e. the number of undetected species.
To obtain a confidence interval, a simple bootstrap procedure can be applied, in which the
available data is iteratively resampled [8].
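
As a hedged illustration, Eq. 2 and a simple resampling scheme can be sketched in Python as follows. The function names are hypothetical. The bootstrap shown here is a naive percentile bootstrap that resamples the observed individuals with replacement; this is an assumption of the sketch and may differ from the procedure in [8], which can also account for undetected species. The example counts are the Middle Dutch abundances from Table 1.

```python
import random


def chao1_f0(n, f1, f2):
    """Eq. 2: lower bound on the number of undetected species."""
    if f2 > 0:
        return (n - 1) / n * f1 ** 2 / (2 * f2)
    return (n - 1) / n * f1 * (f1 - 1) / 2


def chao1_richness(counts):
    """Observed richness plus the estimated number of undetected species."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if n == 0:
        return 0.0
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return len(counts) + chao1_f0(n, f1, f2)


def bootstrap_ci(counts, n_iter=1000, level=0.95, seed=0):
    """Naive percentile bootstrap (assumption): resample individuals with replacement."""
    rng = random.Random(seed)
    labels = [i for i, c in enumerate(counts) for _ in range(c)]
    estimates = []
    for _ in range(n_iter):
        resampled = {}
        for label in rng.choices(labels, k=len(labels)):
            resampled[label] = resampled.get(label, 0) + 1
        estimates.append(chao1_richness(list(resampled.values())))
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * n_iter)]
    upper = estimates[int((1 + level) / 2 * n_iter) - 1]
    return lower, upper


# One entry per work: the number of documents in which it survives (Table 1)
counts = [1] * 45 + [2] * 13 + [3] * 6 + [4] * 2 + [5] * 4 + [6, 7, 10, 10, 17]
print(f"{chao1_richness(counts):.2f}")  # 152.42
print(bootstrap_ci(counts))
```

Because the resampling scheme is only one of several possible choices, the interval it prints should not be expected to reproduce the confidence intervals reported in Table 2.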
   An attractive feature of this estimator is that it can be naturally extended to estimate the
number of lost documents (instead of the number of lost works) [10]. Field workers tasked
with biodiversity sampling often do not observe a substantial fraction of the biota that live
in a certain assemblage. While Chao1 can estimate how many of these low-abundance species
have (minimally) gone undetected, it does not tell us how much additional effort would be
required to observe these, i.e. how many additional m individuals would have to be sampled to
observe all of the biota at least once. Put informally, with respect to the species accumulation
curve (cf. Fig. 2), we would like to find out in which area the asymptote starts to kick in.
   Using the same abundance data as above, this extension of Chao1 tries to estimate at
which point every species would have been observed at least as a singleton. The singletons
in the enlarged sample of size m + n (where n is still the number of previously observed
individuals in the sample) would fall apart in two distinct categories [10]: (1) singletons from
the original sample, for which no additional individuals are detected by the enlarged sample,
and (2) previously undetected species for which exactly one individual is observed during the
additional sampling. The estimator aims to calculate the proportion between (1) and (2) to
determine m on the basis of two functions. The first function, h(x) = 2f1(1 + x), is a linear
transformation of x, whereas the second function, v(x) = exp[x(2f2/f1)], is an exponentially
increasing function; v is bound to intersect h at a certain x* > 0. The number of additional
m individuals that are theoretically required to observe the full richness of a population is
given by: m = nx*. Here too, a bootstrapping procedure can be used to estimate a confidence
interval.
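
The intersection of h and v has no closed form, so x* has to be found numerically. The sketch below does so with plain bisection; the function name, the bracketing strategy and the tolerance are all assumptions of this illustration, not details taken from [10].

```python
import math


def min_additional_sampling(n, f1, f2, tol=1e-10):
    """Solve 2*f1*(1 + x) = exp(x * 2*f2 / f1) for x* > 0 and return m = n * x*."""
    if f1 <= 0 or f2 <= 0:
        raise ValueError("this sketch needs f1 > 0 and f2 > 0")

    def gap(x):
        # v(x) - h(x): negative before the crossing, positive after it
        return math.exp(x * 2 * f2 / f1) - 2 * f1 * (1 + x)

    lo, hi = 0.0, 1.0
    while gap(hi) < 0:      # widen the bracket until the exponential overtakes the line
        hi *= 2
    while hi - lo > tol:    # bisection on the bracket [lo, hi]
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return n * (lo + hi) / 2


# Middle Dutch counts from Section 3: n = 167, f1 = 45, f2 = 13
print(round(min_additional_sampling(167, 45, 13)))
```

For the Middle Dutch counts this gives a value close to the Minsample row of Table 2; small differences in the decimals are to be expected from the choice of solver.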
   Regarding historic literature, the analogy in applying this method is straightforward: how
many additional documents would have to be rediscovered in the future to observe all works at
least once? While this estimate has very useful, practical implications for philologists scanning
archives for new fragments, the resulting number, m + n, also has theoretical relevance, since
it would be reasonably close to the actual size of the original population of documents. Thus,
m + n would allow us to estimate the historic loss of documents, based on a type of data


that is complementary to (and even completely independent from) the archival library records
mentioned above. Because of the log-normal distribution of literary works over documents, we
expect that most works were of an extremely low historic abundance, i.e. they were already
originally produced in very low numbers of copies. We can therefore assume that the majority
of works that are currently unknown will in the future only be detected in a low number of
documents. Once we would have observed all works, we can therefore expect that we would also
have observed most documents. Thus, while the outcome of this method should not be treated
as a precise point estimate – like Chao1, it too estimates the minimal sampling effort required –
we argue that it offers a useful approximation of the historical loss of documents. Nevertheless,
we should emphasize that this method likely yields an underestimation of the original document
richness and it would not account for specific aspects of the historic document mass, for
instance, in cases where presently unknown works actually survive in more than one (so far
undetected) document.


3. Estimating the loss of Middle Dutch chivalric epics
We have collected the surviving works and documents from the genre of Middle Dutch chivalric
epics (ridderepiek) as abundance data, where we record in how many documents a particular work
has been “sighted”. This data is mainly drawn from Kienhorst’s acclaimed repertory [27] but we
have updated this information with newer, and even very recent findings (situation as of 10 July
2020).2 The main bibliographic information can be gleaned from Table 1, showing, in the last
row, how the 75 presently known works are distributed over the 167 documents (=n) that have
been retrieved. 45 works are attested as unica in only a single source (=f1); 13 works are
doubletons (=f2).

     works   documents per work
        45                    1
        13                    2
         6                    3
         2                    4
         4                    5
         1                    6
         1                    7
         2                   10
         1                   17
        75                  167   (totals)

Table 1: Abundance data for the Middle Dutch chivalric epics. Last row are the total counts.


3.1. Loss of works
We can plug these numbers into the equations presented above and arrive at the estimates
presented in Table 2. Here, we additionally give the estimate of the so-called Jackknife
procedure (following the reference implementation in [38]), a historic alternative to the more
recent estimators that generally aim to reduce the bias in estimators [3]. Importantly, this
approach lacks theoretical justification [12]
but offers a surprisingly solid baseline in many practical applications [7, 29]. We mention in
passing that such techniques have already been applied in domains that border on the Human-
ities, such as archaeology [25, 17]. We also present the confidence intervals (CI) obtained from
the bootstrap procedure: these are fairly wide but show considerable overlap, thus stressing
the relative agreement between the three estimators. The distribution of the bootstrap values
is shown in the rainplots (Fig. 1), except for the Jackknife (for which the CI is calculated
analytically). We observe that Chao1 gives the most conservative estimate for the loss of
works, which was to be expected, given the fact that it estimates a lower bound for the loss.
The Jackknife and EP procedure both estimate a higher loss rate (yet both in the same range).
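
The quantities that enter the estimators can be read directly off the abundance data in Table 1. A minimal sketch, with the table transcribed by hand into a hypothetical `table1` dictionary:

```python
# Table 1 as a mapping: number of documents -> number of works attested that often
table1 = {1: 45, 2: 13, 3: 6, 4: 2, 5: 4, 6: 1, 7: 1, 10: 2, 17: 1}

works = sum(table1.values())                       # observed number of works
documents = sum(k * v for k, v in table1.items())  # sample size n
f1 = table1.get(1, 0)                              # unica (singletons)
f2 = table1.get(2, 0)                              # doubletons

print(works, documents, f1, f2)  # 75 167 45 13
```

Storing the data as frequency counts rather than as a list of individual documents is sufficient here, because the estimators discussed above only depend on n, f1, f2 and the number of observed works.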

   2 Data and code supporting this paper have been publicly archived: https://doi.org/10.5281/zenodo.4030681.


[Figure 1 (plot): “Diversity estimation for the number of works”; x-axis: Number of works (50 to 400); y-axis: Estimator (Chao1, EP).]

Figure 1: Distribution of bootstrap estimations for Chao1 and EP on the Middle Dutch data. The Jackknife
          estimate (with its CI) is added with vertical lines.


Crucially for the discussion below, all three estimators for the loss of works suggest that
only half (and potentially even less) of the original works that once existed are currently known
to us.

3.2. Loss of documents
The final row in Table 2 gives the estimate (with CI) for the loss of documents. While we
should account for an extremely wide CI in this case, the numbers suggest a survival rate of
≈8.15%, i.e. of an original population of 2047 documents, only 167 have survived. We offer a
final and joint visualization of the results in Fig. 2. This plot shows what is known as a
“species accumulation curve” [9]. The blue line plots the number of retrieved works as an
(asymptotic) function of the number of documents recovered in this assemblage. The full line
indicates the situation for the observed sample, whereas the dashed part concerns the
hypothetical increase, in the case where more “sightings” would occur in the future. The grey
distribution shows the bootstrap values resulting from the minimum sampling effort estimator,
broadly indicating the region where we expect the curve to hit the asymptote.

     Method        Estimate                   CI
     Chao1           152.42      110.11 - 222.98
     Jackknife       177.00      127.81 - 226.19
     EP              170.71      116.77 - 268.49
     Minsample      2047.77    1064.19 - 4006.42

Table 2: Diversity estimates for the Middle Dutch chivalric epics (with CI). Last row is the
         result for minimum additional sampling.

  The green distribution in Fig. 2 requires additional explanation. The available models for
estimating the population size (as opposed to species diversity) of an under-sampled assem-
blage typically assume that we have capture-recapture information available, instead of mere


[Figure 2 (plot): “Species Accumulation Curve”; x-axis: documents (0 to 5000); y-axis: works (0 to 200); right-hand scale from 0 to 0.0035; legend: species accumulation, minimum sampling, maculature.]

Figure 2: Species accumulation curve (blue), including bootstrap distribution for minimum additional document
          sampling (in grey; the dashed vertical grey line indicates the non-bootstrapped estimate) and for the
          maculature diversity (in green).


abundance data [5]. We cannot extract such information from our data – because a work-
document pair can in principle only be “sighted” once and after that it is not released again
“into the wild”. Nevertheless, in the case of manuscripts that have been recycled into macu-
lature, the remnants of the same document have often reappeared in different locations – an
extreme example is the Roman der Lorreinen-codex of which 9 fragments resurfaced, scattered
across 7 different libraries [27]. We can apply Chao1 to the documents in our corpus that
survive fragmentarily and represent them as abundance data, on the basis of the number of
fragments that resurfaced of them. This yields an assemblage of 141 documents surviving in
181 fragments, with f1 = 118 and f2 = 14. The application of Chao1 yields the following
estimate: 635.54 (CI: 449.85 - 947.25) (cf. the green area in Fig. 2). Note that this number does
not estimate the total number of documents that once existed, but rather the size of the subset
of manuscripts that were recycled into maculature. In combination with the other estimate,
our analyses suggest that ≈31% of the original population of documents with chivalric Middle
Dutch epics was recycled into maculature.
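
For readers who want to retrace the arithmetic, the maculature estimate follows from Eq. 2 with n = 181, f1 = 118, f2 = 14 and 141 observed documents, and the share of maculature follows from dividing by the Minsample estimate of Table 2. The symbol \hat{S}_{\text{mac}} is introduced here only for illustration:

```latex
\hat{S}_{\text{mac}}
  = 141 + \frac{180}{181} \cdot \frac{118^{2}}{2 \cdot 14}
  \approx 141 + 494.54
  = 635.54,
\qquad
\frac{635.54}{2047.77} \approx 0.31
```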


4. Simulations
In this section, we compare the performance of the three estimators for species diversity using
simulated data. In the aforementioned seminal paper [18], Fisher proposed to model the
abundance of species in an assemblage as S_n = αx^n/n, where S_n is the number of species with
an abundance of n, x is a positive constant (0 < x < 1) which generally approaches 1, and α is
the number of singleton species in the assemblage. This logseries is still in wide use and
can be used to define a discrete probability distribution, parameterized by two values: (i) the
number of singleton species in the population and (ii) the maximum abundance for a single
species (to put a practical cap on the distribution).
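
One way to turn the logseries into a capped discrete distribution is sketched below. The function names and the choice to normalise the weights x^n/n over the range 1..max_abundance are assumptions of this illustration; the parameterization used for the experiments may differ in detail.

```python
def logseries_expected_counts(alpha, x, max_abundance):
    """Fisher's logseries: S_n = alpha * x**n / n for n = 1..max_abundance."""
    if not 0 < x < 1:
        raise ValueError("x must lie strictly between 0 and 1")
    return {n: alpha * x ** n / n for n in range(1, max_abundance + 1)}


def logseries_pmf(x, max_abundance):
    """Truncated logseries, normalised into probabilities over abundances."""
    weights = logseries_expected_counts(1.0, x, max_abundance)
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


pmf = logseries_pmf(x=0.99, max_abundance=100)
print(f"P(abundance = 1) = {pmf[1]:.3f}, P(abundance = 2) = {pmf[2]:.3f}")
```

The normalised distribution no longer depends on α, which only scales the expected number of species per abundance class; the cap keeps the support finite so that abundances can be sampled directly.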


[Figure 3 (violin plots): “Reconstructed richness (original = 250)”; x-axis: estimator (Egghe & Proot, Jackknife, Chao1); y-axis: Estimated diversity (100 to 900).]

Figure 3: Results for the three estimators (with α = 50 for the Egghe & Proot method) for artificial assemblages
          of 250 works (see dashed vertical grey line) that were stochastically downsampled.


   In an iterative process, we have generated assemblages from a logseries distribution for 250
works, for a fixed f1 = 75 and x = 0.99. Next, we mimicked a distribution of these works over
a variable number of documents (in a linear range [500, 2500]). We then modelled historic
document loss as a fully stochastic process, in which documents are randomly dropped at a
certain loss rate (in the linear range [0.05-0.95]). We repeated each experiment 50 times with
different random seeds. We can then assess the performance of each estimator with respect
to the ground truth of 250 works. The violin plots in Fig. 3 show that Chao1 is the most
conservative estimator and generally realizes the smallest deviation from the ground truth (cf.
dashed grey line). Fig. 4 plots the absolute error per estimator as a function of the varying
loss rate. Here, we see that Chao1 is the most robust estimator throughout, except for extremely
small document keep rates (< 0.1).
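
A single run of such an experiment can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the archived simulation code: the way documents are distributed over works (one guaranteed document per work, the rest drawn in proportion to a logseries-distributed popularity), the abundance cap of 100 and all function and parameter names are hypothetical.

```python
import random


def chao1(counts):
    """Chao1 richness estimate (Eq. 2) from per-work document counts."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if n == 0:
        return 0.0
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    if f2 > 0:
        f0 = (n - 1) / n * f1 ** 2 / (2 * f2)
    else:
        f0 = (n - 1) / n * f1 * (f1 - 1) / 2
    return len(counts) + f0


def simulate_once(n_works=250, n_documents=1000, loss_rate=0.9,
                  x=0.99, max_abundance=100, seed=0):
    """Return (observed works, Chao1 estimate) after random document loss."""
    rng = random.Random(seed)
    # Assumption: each work's popularity follows a truncated logseries
    abundances = list(range(1, max_abundance + 1))
    weights = [x ** k / k for k in abundances]
    popularity = rng.choices(abundances, weights=weights, k=n_works)
    # Assumption: every work starts with one document, the rest follow popularity
    extra = rng.choices(range(n_works), weights=popularity,
                        k=n_documents - n_works)
    documents = list(range(n_works)) + extra
    # Fully stochastic loss: each document is dropped independently
    survivors = [work for work in documents if rng.random() >= loss_rate]
    counts = {}
    for work in survivors:
        counts[work] = counts.get(work, 0) + 1
    return len(counts), chao1(list(counts.values()))


observed, estimate = simulate_once()
print(f"observed works: {observed}, Chao1 estimate: {estimate:.1f} (ground truth: 250)")
```

Repeating `simulate_once` over a grid of `n_documents` and `loss_rate` values with different seeds, and recording the absolute difference between the estimate and 250, gives the kind of error profile that the comparison above relies on.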


5. Discussion
In his acclaimed 2006 history of Middle Dutch literature, Van Oostrom estimated that the
corpus of Middle Dutch chivalric epics must originally have comprised “at least 100 texts” [37].
All the estimators considered here agree that this is in all likelihood too low an estimate: it seems
likely that at the very least 152 texts once existed, and potentially even more, of which only
75 (≈49%) survive now, providing an even firmer basis to the claims from a previous study
[26]. Chao1 proved a more reliable estimator in our simulations than the other two methods
studied here, which tended to overshoot and thus overestimate the loss of works. Middle Dutch
studies might have overestimated the representativeness of the surviving corpus, and future
studies should attempt to account for this bias.
   Although there are few previous estimates regarding the loss of works, we are on more solid
grounds regarding the loss of documents. In book history, scholars have studied the loss rates
for medieval documents, based on data for the sparse set of manuscript collections of which the
historic composition is known, so that they can be compared to the books from these collections
that are still extant today [2]. Such studies have estimated a cumulative survival rate of 7%
for the sort of non-illustrated manuscripts in which Middle Dutch romances typically were
copied [39, 37, 30]. While we should present these results with extreme caution for now, it


[Figure 4 (plot): “Absolute reconstruction error (as a function of the per-simulation loss rate)”; x-axis: keep rate (0.2 to 0.8); y-axis: Absolute error (0 to 600); legend (estimator): Egghe & Proot, Jackknife, Chao1.]

Figure 4: The absolute error for each estimator in each simulation, as a function of the loss rates considered
          (given a ground truth of 250), with a cubic fit per method.


is remarkable that our analysis suggest an estimate that is in a surprisingly similar range,
i.e. ≈8.15% (167/2047 documents), although with a very wide CI (1064-4006). This approach
might nevertheless present an exciting new research avenue that could complement the existing
insights on the basis of a fully independent kind of evidence than the data used so far. Finally,
our analyses suggest that of the original population of documents with chivalric Middle Dutch
epics, ≈31% was recycled into maculature (i.e. 635/2047). While more research is needed to
support this claim, it is the very first time to the best of our knowledge that this proportion has
been estimated in a quantitative manner. This proportion is surprisingly high, which is maybe
good news for the philologist, who is after all more likely to discover fragmentary sources than
intact sources.
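
As a hedged illustration of the arithmetic behind these proportions, the following minimal Python sketch implements the classic form of the Chao1 lower bound and recomputes the two ratios quoted above. The function names and the toy abundance list are hypothetical and are not taken from the analysis; the sketch does not reproduce how the document-level estimate of 2047 was derived.

```python
def chao1(abundances):
    """Classic Chao1 lower bound on the number of classes (e.g. works).

    `abundances` holds, for each observed class, the number of sightings
    (e.g. surviving documents per work). Zero counts are ignored.
    """
    observed = [a for a in abundances if a > 0]
    s_obs = len(observed)
    f1 = sum(1 for a in observed if a == 1)  # classes seen exactly once
    f2 = sum(1 for a in observed if a == 2)  # classes seen exactly twice
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected variant, used when no doubletons are observed
    return s_obs + f1 * (f1 - 1) / 2


def proportion(part, estimated_total):
    """Share of an estimated original population, as a fraction."""
    return part / estimated_total


# Toy abundance data (hypothetical, for illustration only)
print(chao1([1, 1, 1, 2, 2, 5]))

# Ratios quoted in the text
print(f"survival rate: {proportion(167, 2047):.1%}")
print(f"maculature share: {proportion(635, 2047):.1%}")
```

Because Chao1 depends only on the counts of singletons and doubletons, it is a lower bound: heavily depleted samples with few doubletons yield wide uncertainty, which is consistent with the wide interval reported above.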
   A number of issues with the application of these methods require further attention.
Problematic, for instance, is our assumption that document loss has been a fully
stochastic process (which is how we naively simulate this phenomenon here).
Although there certainly are random aspects to this process, we know from traditional book
history that some codices were less likely to be lost: texts in convolutes had higher survival
chances, for instance, and the same has been hypothesized for higher-end (e.g. illustrated)
manuscripts [39]. Future research should develop more principled, perhaps agent-based, models
of document loss than the fully stochastic approach adopted here. Finally, it would
be worthwhile to extend this approach to a wider geographic and linguistic range, since these
methods allow for a cross-cultural comparison of the survival of medieval
literature. This geographic variation will be a central component of our future work.
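
To make the "fully stochastic" assumption concrete, the sketch below simulates loss by keeping every document independently with the same probability and then measures the absolute error of a Chao1 reconstruction against a known ground truth of 250 works, in the spirit of Figure 4. It is a minimal sketch under stated assumptions: the original abundance distribution (an exponential draw), the function names, and the seed are all hypothetical and are not the settings used in the simulations reported here.

```python
import random


def chao1(abundances):
    """Classic Chao1 lower bound; zero counts (fully lost works) are ignored."""
    observed = [a for a in abundances if a > 0]
    f1 = sum(1 for a in observed if a == 1)
    f2 = sum(1 for a in observed if a == 2)
    if f2 > 0:
        return len(observed) + f1 * f1 / (2 * f2)
    return len(observed) + f1 * (f1 - 1) / 2


def simulate_loss(true_abundances, keep_rate, rng):
    """Keep each document independently with probability `keep_rate`."""
    return [sum(rng.random() < keep_rate for _ in range(n))
            for n in true_abundances]


def absolute_error(n_works=250, keep_rate=0.5, seed=0):
    """Absolute error of Chao1 after one simulated round of random loss."""
    rng = random.Random(seed)
    # Assumed original population: at least one copy per work, skewed counts
    true_abundances = [1 + int(rng.expovariate(0.2)) for _ in range(n_works)]
    survivors = simulate_loss(true_abundances, keep_rate, rng)
    return abs(chao1(survivors) - n_works)


for keep_rate in (0.2, 0.4, 0.6, 0.8):
    print(keep_rate, round(absolute_error(keep_rate=keep_rate), 1))
```

A non-stochastic alternative of the kind called for above could replace the single `keep_rate` with a per-document probability, for example a higher one for illustrated codices or texts in convolutes.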


5.1. Acknowledgments
The authors would like to thank Elisabeth de Bruijn and Remco Sleiderink (University of
Antwerp, BE) for the stimulating discussions, in particular about maculature. Additionally,
we would like to acknowledge the helpful bibliographic input of the participants at the Dark
Archives 20/20 conference (https://web.archive.org/save/https://aevum.space/darkarchives)
where this work was previously presented.


References
 [1] A. C. Barbrook et al. “The phylogeny of The Canterbury Tales”. In: Nature 394.6696
     (1998), p. 839.
 [2] E. Buringh. Medieval Manuscript Production in the Latin West, Explorations with a
     Global Database. Brill, 2011.
 [3] K. Burnham and W. Overton. “Robust Estimation of Population Size When Capture
     Probabilities Vary Among Animals”. In: Ecology 60.5 (1979), pp. 927–936.
 [4] Q. Burrell. “Some comments on “The estimation of lost multi-copy documents: A new
     type of informetrics theory” by Egghe and Proot”. In: Journal of Informetrics 2.1 (2008),
     pp. 101–105. issn: 1751-1577.
 [5] A. Chao. “Estimating the Population Size for Capture-Recapture Data with Unequal
     Catchability”. In: Biometrics 43.4 (1987), pp. 783–791.
 [6] A. Chao. “Nonparametric Estimation of the Number of Classes in a Population”. In:
     Scandinavian Journal of Statistics 11.4 (1984), pp. 265–270.
 [7] A. Chao and C.-H. Chiu. “Species Richness: Estimation and Comparison”. In: Wiley
     StatsRef: Statistics Reference Online. Aug. 2016, pp. 1–26. isbn: 9781118445112.
     doi: 10.1002/9781118445112.stat03432.pub2.
 [8] A. Chao and L. Jost. “Estimating diversity and entropy profiles via discovery rates of
     new species”. In: Methods in Ecology and Evolution 6.8 (2015), pp. 873–882.
 [9] A. Chao, Y. T. Wang, and L. Jost. “Entropy and the species accumulation curve: a
     novel entropy estimator via discovery rates of new species”. In: Methods in Ecology and
     Evolution 4.11 (2013), pp. 1091–1100.
[10]   A. Chao et al. “Sufficient sampling for asymptotic minimum species richness estimators”.
       In: Ecology 90.4 (2009), pp. 1125–1133.
[11]   J. L. Cisne. “How Science Survived: Medieval Manuscripts’ “Demography” and Classic
       Texts’ Extinction”. In: Science 307.5713 (2005), pp. 1305–1307.
[12]   R. M. Cormack. “Log-Linear Models for Capture-Recapture”. In: Biometrics 45.2 (1989),
       pp. 395–413.
[13]   M. S. Cuthbert. “Tipping the Iceberg: Missing Italian Polyphony from the Age of Schism”.
       In: Musica Disciplina 54 (2009), pp. 39–74.
[14]   A. Daly, J. Baetens, and B. De Baets. “Ecological Diversity: Measuring the
       Unmeasurable”. In: Mathematics 6.7 (July 2018), p. 119.
[15]   G. Declercq. “Comment on “How Science Survived: Medieval Manuscripts’
       “Demography” and Classic Texts’ Extinction””. In: Science 310.5754 (2005), p. 1618.


[16]   L. Egghe and G. Proot. “The estimation of the number of lost multi-copy documents:
       A new type of informetrics theory”. In: Journal of Informetrics 1.4 (2007), pp. 257–268.
       issn: 1751-1577.
[17]   M. Eren et al. “Estimating the Richness of a Population When the Maximum Number
       of Classes Is Fixed: A Nonparametric Solution to an Archaeological Problem”. In: PLOS
       ONE 7.5 (May 2012), pp. 1–11.
[18]   R. Fisher, A. S. Corbet, and C. Williams. “The Relation Between the Number of Species
       and the Number of Individuals in a Random Sample of an Animal Population”. In: The
       Journal of Animal Ecology 12.1 (1943), pp. 42–58.
[19]   D. Geirnaert. ““Membra disiecta”: banden met het versneden verleden”. In:
       Medioneerlandistiek. Een inleiding tot de Middelnederlandse letterkunde. Ed. by
       R. Jansen-Sieben, J. Janssens, and F. Willaert. Verloren, 2000, pp. 85–101.
[20]   N. J. Gotelli and A. Chao. “Measuring and Estimating Species Richness, Species
       Diversity, and Biotic Similarity from Sampling Data”. In: Encyclopedia of Biodiversity.
       Ed. by S. A. Levin. Second Edition. Waltham: Academic Press, 2013, pp. 195–211.
       isbn: 978-0-12-384720-1.
[21]   J. M. Green. “Digital manuscripts as sites of touch: using social media for ‘hands-on’
       engagement with medieval manuscript materiality”. In: Archive Journal 6 (Sept. 2018).
[22]   J. Green and F. McIntyre. “Lost Incunable Editions: Closing in on an Estimate”. In: Lost
       Books. Reconstructing the Print World of Pre-Industrial Europe. Ed. by F. Bruni and
       A. Pettegree. Brill, 2016, pp. 55–72.
[23]   J. Green, F. McIntyre, and P. Needham. “The Shape of Incunable Survival and Statistical
       Estimation of Lost Editions”. In: The Papers of the Bibliographical Society of America
       105.2 (2011), pp. 141–175.
[24]   T. Haye. Verlorenes Mittelalter: Ursachen und Muster der Nichtüberlieferung
       mittellateinischer Literatur. Brill, 2016.
[25]   D. Kaufman. “Measuring Archaeological Diversity: An Application of the Jackknife
       Technique”. In: American Antiquity 63.1 (1998), pp. 73–85.
[26]   M. Kestemont and F. Karsdorp. “Het Atlantis van de Middelnederlandse ridderepiek.
       Een schatting van het tekstverlies met methodes uit de ecodiversiteit”. In: Spiegel der
       Letteren 61.3 (2019), pp. 271–290.
[27]   H. Kienhorst. De handschriften van de Middelnederlandse ridderepiek. Een codicologische
       beschrijving. Deel 1. Deventer studieën 9. Sub Rosa, 1988.
[28]   E. Kwakkel. Books Before Print. Amsterdam University Press, 2018.
[29]   E. Marcon. “Practical Estimation of Diversity from Abundance Data”. working paper or
       preprint. Oct. 2015. url: https://hal-agroparistech.archives-ouvertes.fr/hal-01212435.
[30]   U. Neddermeyer. Von der Handschrift zum gedruckten Buch. Schriftlichkeit und
       Leseinteresse im Mittelalter und in der frühen Neuzeit. Quantitative und qualitative
       Aspekte. Harrassowitz, 1998.
[31]   A. Orlitsky, A. T. Suresh, and Y. Wu. “Optimal prediction of the number of unseen
       species”. In: Proceedings of the National Academy of Sciences of the United States of
       America 113.47 (2016), pp. 13283–13288.


[32]   A. Pettegree. “The Legion of the Lost. Recovering the Lost Books of Early Modern
       Europe”. In: Lost Books. Reconstructing the Print World of Pre-Industrial Europe. Ed.
       by F. Bruni and A. Pettegree. Brill, 2016, pp. 1–27.
[33]   G. Proot. “Survival Factors of Seventeenth-Century Hand-Press Books Published in the
       Southern Netherlands: The Importance of Sheet Counts, Sammelbände and the Role
       of Institutional Collections”. In: Lost Books. Reconstructing the Print World of
       Pre-Industrial Europe. Ed. by F. Bruni and A. Pettegree. Brill, 2016, pp. 160–201.
[34]   G. Proot and L. Egghe. “Estimating Editions on the Basis of Survivals: Printed
       Programmes of Jesuit Plays in the Provincia Flandro-Belgica before 1773, with a Note on
       the “Book Historical Law””. In: The Papers of the Bibliographical Society of America
       102.2 (2008), pp. 149–174.
[35]   N. Pyenson and L. Pyenson. “Treating Medieval Manuscripts as Fossils”. In: Science
       309.5735 (2005), pp. 698–701.
[36]   P. L. Shillingsburg. From Gutenberg to Google: Electronic Representations of Literary
       Texts. Cambridge University Press, 2006.
[37]   F. Van Oostrom. Stemmen op schrift. Geschiedenis van de Nederlandse literatuur van
       het begin tot 1300. Prometheus, 2006.
[38]   J.-P. Wang. “SPECIES: An R Package for Species Richness Estimation”. In: Journal of
       Statistical Software 40.9 (2011), pp. 1–15.
[39]   H. Wijsman. Luxury Bound. Illustrated Manuscript Production and Noble and Princely
       Book Ownership in the Burgundian Netherlands (1400-1550). Brepols, 2010.
[40]   R. Wilson. The lost literature of medieval England. Methuen & Co, 1952.

