<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Beyond the Register: Demographic Modeling of Arrest Patterns in 1879-1880 Brussels</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Folgert</forename><surname>Karsdorp</surname></persName>
							<email>folgert.karsdorp@meertens.knaw.nl</email>
							<affiliation key="aff0">
								<orgName type="institution">KNAW Meertens Institute</orgName>
								<address>
									<settlement>Amsterdam</settlement>
									<country key="NL">the Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mike</forename><surname>Kestemont</surname></persName>
							<email>mike.kestemont@uantwerpen.be</email>
							<affiliation key="aff1">
								<orgName type="institution">University of Antwerp</orgName>
								<address>
									<settlement>Antwerp</settlement>
									<country key="BE">Belgium</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Margo</forename><surname>De Koster</surname></persName>
							<email>margo.dekoster@ugent.be</email>
							<affiliation key="aff2">
								<orgName type="institution">Ghent University</orgName>
								<address>
									<settlement>Ghent</settlement>
									<country key="BE">Belgium</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Beyond the Register: Demographic Modeling of Arrest Patterns in 1879-1880 Brussels</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">A62C194465216F6AF59EABE8BBA7E9EC</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:50+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>unseen species model</term>
					<term>Generalized Chao</term>
					<term>survivorship bias</term>
					<term>police history</term>
					<term>historical criminology</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Unseen species models from ecology have recently been applied to censored historical cultural datasets to estimate unobserved populations. We extend this approach to historical criminology, analyzing the police registers of Brussels' Amigo prison (1879-1880) using the Generalized Chao estimator. Our study aims to quantify the 'dark number' of unarrested perpetrators and model demographic biases in policing efforts. We investigate how factors such as age, gender, and origin influence arrest vulnerability. While all examined covariates contribute positively to our model, their small effect sizes limit the model's predictive performance. Our findings largely align with prior historical scholarship but suggest that demographic factors alone may insufficiently explain arrest patterns. The Generalized Chao estimator modestly improves population size estimates compared to simpler models. However, our results indicate that more refined models or additional data may be necessary for robust estimates in historical criminological studies. This work contributes to the growing field of computational methods in humanities research and offers insights into the challenges of quantifying hidden populations in historical datasets.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>statistical modeling approaches. While empirical validation of these estimates remains challenging, they provide valuable insights that can inform policy decisions and guide law enforcement strategies. In recent years, criminology has increasingly turned to advanced quantitative methods to address the negative bias in available data. Although these models have produced valuable insights into contemporary crime patterns, their application to historical data remains limited.</p><p>Our study aims to bridge this gap by applying one such modern technique, the Generalized Chao method <ref type="bibr" target="#b4">[5]</ref>, to a historical case study: the register of the Amigo prison in Brussels for the period 1879-1880. This unique source documents individuals imprisoned overnight in the municipal police's central detention institution. Given the complex nature of urban crime in a major 19th-century city like Brussels, it is reasonable to assume that police arrest efforts were imperfect, leaving a considerable amount of undetected crime. Our analysis aims to quantify this 'dark number' by estimating a lower bound on undetected crime in late nineteenth-century Brussels.</p><p>Furthermore, we leverage the demographic information available in the Amigo register to construct a model that predicts crime detection rates across different social groups. This approach allows us to explore historical vulnerability to arrest, potentially revealing differential treatment by the police and broader societal biases of the era. By applying contemporary statistical methods to historical data, we aim to provide new insights into the dynamics of crime and law enforcement in 19th-century urban environments. <ref type="foot" target="#foot_0">1</ref></p><p>Our research aligns with a growing scholarly interest in measuring the amount of under-detection in historical data, known as Überlieferungschance <ref type="bibr" target="#b9">[10]</ref>. 
This issue of under-sampling is pervasive across disciplines studying the past, where available datasets often represent only a fraction of the original historical population, be it artifacts or individuals. Imperfect registration and survival rates result in incomplete and potentially biased historical data, often skewed towards categories with higher survival ratios due to their material nature (e.g., stone inscriptions versus papyrus writings). This situation exemplifies 'survivorship bias', a concept that has gained popular attention, partly due to Abraham Wald's seminal work on R.A.F. bombers during World War II <ref type="bibr" target="#b16">[17,</ref><ref type="bibr" target="#b19">20]</ref>. <ref type="foot" target="#foot_1">2</ref> Humanities scholars have long recognized this precarious situation but often lacked a rigorous framework to address it systematically.</p><p>Recent scholarship has turned to unseen species models from ecology as an innovative approach to estimating the extent of lost or forgotten data, particularly in cultural heritage studies <ref type="bibr" target="#b14">[15]</ref>. In the Humanities, these models have found surprising applications across diverse domains, including medieval literature, early modern collections of printed books, and studies of Dutch sailors <ref type="bibr" target="#b23">[24,</ref><ref type="bibr" target="#b17">18,</ref><ref type="bibr" target="#b15">16]</ref>. Our work contributes to this emerging trend by applying these ecological models to historical criminology, offering new insights into the 'dark numbers' of 19th-century urban crime.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Materials: the Amigo register of 1880</head><p>The primary source for our study is the Amigo register, a serial, handwritten source housed in the City Archives Brussels, within the records of the Brussels municipal police. <ref type="foot" target="#foot_2">3</ref> This unique historical resource, illustrated in Fig. <ref type="figure" target="#fig_0">1</ref>, takes a tabular form, with each row recording information about an individual who spent the night in the Amigo prison. <ref type="foot" target="#foot_3">4</ref> Unfortunately, like many archival records of urban police forces, the Amigo register has suffered from poor conservation. For Brussels, only a few registers from the 1880s have survived. Our study focuses on the register covering the period from October 31, 1879, to November 1, 1880, encompassing all individuals detained in the Brussels police prison during this time.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Involuntary Arrests</head><p>A critical distinction in the Amigo register data is between two types of entries:</p><p>1. Involuntary detentions: Individuals forcefully arrested and detained by the police; 2. Voluntary stays (nuit sur demande or 'night on request'): Individuals who voluntarily sought shelter at the Amigo.</p><p>The latter category primarily comprised socially vulnerable people lacking financial means or family support. These individuals requested to spend the night at the Amigo for shelter, rather than being brought in through active law enforcement efforts. The voluntary nature of the nuit sur demande entries contrasts sharply with the involuntary detentions, potentially impacting our interpretation of arrest patterns and policing efforts in late 19th-century Brussels.</p><p>Our analysis primarily focuses on involuntary detentions, specifically individuals arrested and locked up in the Amigo police prison on charges of 'vagrancy'. At the end of the 19th century, the Belgian economy suffered a depression, which heightened concerns about the influx of foreign 'vagrants' and other perceived undesirable newcomers. In response, urban municipal authorities implemented stricter local ordinances targeting poor migrants. These included entry regulations and passport requirements at the city gates, as well as mandatory reporting by landlords and innkeepers about their lodgers' characteristics and previous settlements. Concurrently, new anti-vagrancy legislation criminalized several conditions: (a) the inability to prove stable attachment to a local community, (b) failure to register as required by law, and (c) lack of a steady income. From 1866 onwards, a police arrest and judicial conviction for vagrancy or begging resulted in forced internment in State Vagrancy Colonies (Rijksweldadigheidskolonies). 
Before being transported to the vagrancy colonies, arrested vagrants were held in the police prison of the city where the arrest was made: these were the individuals who were registered as 'vagrants' or 'beggars' in the Amigo register, and who constitute our research population <ref type="bibr" target="#b8">[9]</ref>. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Recidivism</head><p>Recidivism is a prominent feature in our dataset, with many individuals experiencing multiple arrests. Table <ref type="table">1</ref> illustrates the distribution of arrest counts. In total, the data holds evidence for 8,367 forceful arrests of 6,016 unique individuals. The mean number of nights an individual was locked up in the Amigo is therefore approximately 1.39. However, this average masks a highly uneven distribution of arrests across perpetrators. While most individuals were registered only once, repeat offenders were common, with one extreme case involving 35 arrests for public intoxication within a single year.</p><p>Table <ref type="table">1</ref>: The distribution of the frequency with which individuals were arrested in the Amigo prison (𝑓₁ = number of people arrested exactly once, 𝑓₂ = number of people arrested exactly twice, etc.), including the total number of arrests (𝑁) in the data and the number of unique individuals (𝑆_obs).</p><p>Fig. <ref type="figure" target="#fig_2">2</ref> visualizes the temporal patterns of arrests for several individuals, revealing an interesting phenomenon of 'burstiness' in the data. Arrest events involving the same individual often form local temporal clusters. To investigate this pattern, we applied a simple Poisson model to all individuals arrested at least twice. This model predicts the time lag (in days) between an individual's arrests based on their number of previous arrests. As illustrated in Fig. <ref type="figure" target="#fig_3">3</ref>, the time lag generally decreases as the number of previous arrests increases. One way to interpret this is that the police gradually lost their patience with recidivists. Additionally, this suggests that the policing effort in arrests was not entirely neutral or random, but was guided by certain biases. 
The Amigo register holds information on the date of entry, but not on the release date: historians assume that imprisonment was typically restricted to a short period (i.e., a single night). The re-arrest data shows that individuals could indeed be re-registered after a short interval, which lends credibility to the assumption that incarceration was generally short-lived. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Research Hypotheses</head><p>There exist valid reasons to assume that these data do not cover all of the individuals who could theoretically have been arrested by the police. One of these is the elevated number of singletons and doubletons in this distribution: if so many individuals were only recorded once or twice in the data, this renders it statistically quite likely that many perpetrators were never captured at all and do not appear in the register. The arrest data must therefore be considered incomplete and, in all likelihood, severely underestimates the true number of perpetrators in Brussels for the time period considered.</p><p>The methodology described below enables us to estimate a lower bound on these "dark numbers" in historical Brussels, i.e. the number of perpetrators who were criminally active in this time period, but who were never formally registered by the police. While this estimand presents a valuable scholarly objective in itself, this paper is also interested in characterizing this unobserved share of the criminal population in terms of demographics. Were men more likely to be arrested than women? Was the police effort more heavily geared towards younger individuals? Is there evidence of an active bias towards foreigners in comparison to local citizens? And are there any intersections in these biases? Answering these questions would shed more light on the subjective drivers behind arrests and, thus, the historical vulnerability to arrest across demographic groups. 
On a more abstract level, explaining the drivers of under-detection in such historical datasets represents a major advance with respect to previous work in this area, which could only estimate the mere size of the non-observed share of the population, but not explain or characterize its composition.</p><p>Based on previous historical studies of the Amigo register <ref type="bibr" target="#b8">[9]</ref>, we formulated several hypotheses to guide our experimental design:</p><p>H.1 Impact of voluntary stays: We hypothesize that a previous confinement on request (nuit sur demande) might have increased the visibility of socially fragile individuals to the local police, thus increasing their vulnerability to future arrest. For example, Joseph Janssens, a 25-year-old construction worker, was granted night shelter on March 20, 1880, but was arrested for public drunkenness just days later on March 28.</p><p>H.2 Age-related arrest patterns: We expect that vulnerability to arrest generally increased with an individual's age. Prior work suggests that the police were more lenient towards younger individuals, particularly children, although minors were surprisingly often incarcerated. Among recidivists, there is a high presence of elderly citizens, who were more socially vulnerable due to a lack of suitable care institutions.</p><p>H.3 Gender disparities: While there appear to be fewer criminally active women in the city, many women are among the recidivists. This suggests that the police effort may have been negatively biased towards socially vulnerable women.</p><p>H.4 Migratory status: Contrary to older assumptions about discriminatory police biases against migrants, recent work found no clear evidence for this in the register's data. In fact, the data suggest that local, sedentary citizens were much more vulnerable to arrest than perpetrators born outside of Brussels.</p><p>H.5 Family-based biases: We hypothesize that there existed negative biases against members of perceived "criminal families." The data shows frequent reappearance of the same family names across different individuals, sometimes with family members co-arrested. For instance, in autumn 1880, we find frequent arrests of 10 children (aged 8-10) from the same migrant family from Naples, Italy, who were active as beggars in the city.</p><p>After preprocessing and manual disambiguation of individuals appearing in the dataset, the following data was available for each individual:</p><p>𝛽.1 prior: Binary indicator ('prior' (reference level) or 'no prior') of whether the individual was granted at least one "night on request" (nuit sur demande) prior to the date of their first arrest.</p><p>𝛽.2 age: An individual's mean age at the time of arrest, measured in years (scalar variable, centered and standardised).</p><p>𝛽.3 sex: An individual's biological sex (binary: 'female' or 'male' as reference level). We deliberately use the term 'sex' as the data contains no information on an individual's gender role.</p><p>𝛽.4 origin: Manually coded factor based on the individual's place of birth ('ABROAD' for individuals born outside of Belgium, 'BE' for those born in Belgium, and 'BXL' (reference level) for locally born individuals from Brussels).</p><p>𝛽.5 family: Binary indicator ('no family' (reference level) or 'family') of whether an individual with the same family name occurs in the dataset. This variable aims to capture bias against known "criminal families" but should be interpreted cautiously, because we cannot rule out the possibility that unrelated individuals shared the same last name.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Estimating dark numbers with unseen species models</head><p>In this paper, we apply a so-called unseen species model to the Amigo data, borrowed from the biostatistical literature in ecology, to help solve the problem of estimating the number of unobserved perpetrators. Capture-recapture surveys are important bioregistration instruments in the field of ecology, used to monitor aspects of biodiversity, such as species richness (the number of unique species living in a certain area) <ref type="bibr" target="#b11">[12]</ref>. During such campaigns, field workers use a variety of trapping devices (e.g. cameras) to register animals, mark them and release them again, so that they can be re-sighted at a later time. This process results in what is known as "abundance data": counts that record how often animal types have been observed, such as singletons (𝑓₁, the number of species sighted exactly once), doubletons (𝑓₂, the number of species sighted exactly twice), etc. Because of the imperfect observation process, however, many animal types will not be observed during such campaigns, leading to an underestimation of the true ecological diversity ("unseen species"). The resulting count data must therefore be treated as censored, because it is zero-truncated: the number of relevant species which exist in the area but which were never observed (𝑓₀) is missing. Statistical methods are therefore used to estimate 𝑓₀ as f̂₀ and correct for the observation bias, by adding f̂₀ to 𝑆 (the number of observed species) to obtain an estimate of the true population size Ŝ. Chao1 <ref type="bibr" target="#b6">[7]</ref>, for instance, is a widely used estimator that estimates a lower bound on 𝑓₀ as follows: f̂₀ ≥ 𝑓₁² / (2𝑓₂).</p></div>
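<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a minimal sketch of the Chao1 lower bound described above, the following Python function computes f̂₀ = 𝑓₁² / (2𝑓₂) and Ŝ = 𝑆 + f̂₀ from abundance counts. The function name and the example counts are hypothetical and chosen purely for illustration; they are not the Amigo figures.</p>
```python
def chao1(f1, f2, s_obs):
    """Return (f0_hat, s_hat) for the Chao1 lower bound.

    f1: number of items observed exactly once (singletons)
    f2: number of items observed exactly twice (doubletons)
    s_obs: number of distinct items observed
    """
    if f2 == 0:
        raise ValueError("this form of Chao1 needs at least one doubleton")
    f0_hat = f1 ** 2 / (2 * f2)   # estimated lower bound on unseen items
    s_hat = s_obs + f0_hat        # estimated lower bound on population size
    return f0_hat, s_hat


# Hypothetical counts, for illustration only (not taken from the register).
f0_hat, s_hat = chao1(f1=100, f2=25, s_obs=150)
print(f0_hat, s_hat)    # 200.0 350.0
print(150 / s_hat)      # detection ratio S / (f0_hat + S), an upper bound
```
<p>Because f̂₀ is a lower bound, the detection ratio printed on the last line should be read as an upper bound on the share of the population that was observed.</p></div>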
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Unseen heterogeneity</head><p>For theoretical reasons, it is important to stress that Chao1 only estimates a lower bound on the true 𝑓₀, i.e. it estimates the minimum number of unobserved criminals; 𝑓₀, in reality, could in fact have been larger than f̂₀. (Readers should take time to convince themselves of the fact that, conversely, the detection ratio 𝑆/(f̂₀ + 𝑆) is an upper bound for that reason.) The fact that Chao1 only offers a lower bound is related to the fact that it does not take into account any heterogeneity in the data <ref type="bibr" target="#b3">[4]</ref>: it is derived from the standard Poisson distribution, which assumes homogeneity across the data points. And yet, it is clear that some individuals in our data might have had higher detection rates, just like some species in ecology might be easier to observe in nature (because of their large size, bright color, loud vocalization, etc.) <ref type="bibr" target="#b7">[8]</ref>. It has been shown that ignoring the heterogeneity in a dataset, if any were present, will inevitably lead to (lower bound) estimates for the true population size that are excessively conservative <ref type="bibr" target="#b4">[5]</ref>. Conversely, if we can account for potential differences in detection probability, this conservative bias in Chao1 can be reduced, and the population estimate is adjusted upwards, consequently becoming less of a lower bound. To test whether there is indeed any such heterogeneity in a dataset, Böhning and colleagues <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b12">13]</ref> recommend the use of a so-called ratio plot for the function:</p><formula xml:id="formula_0">\hat{r}_x = \frac{(x + 1)\, f_{x+1}}{f_x},<label>(1)</label></formula><p>where 𝑓ₓ refers to the number of items that occur exactly 𝑥 times <ref type="bibr" target="#b1">[2]</ref>. 
If the data points are homogeneous and do not violate the Poisson assumptions, the resulting points should present as a constant, i.e. a horizontal line. If that is not the case, there is reason to assume that there is unobserved heterogeneity. In Fig. <ref type="figure" target="#fig_4">4</ref>, we present a ratio plot for the arrest data up to 𝑥 = 6 from the Amigo prison. It is clear that we are not dealing with a straight horizontal line. Thus, we have reason to believe that there is heterogeneity in the data that is not accounted for by the standard Chao1.</p></div>
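<div xmlns="http://www.tei-c.org/ns/1.0"><p>The ratio in Eq. (1) is simple to compute from frequency-of-frequencies counts. The sketch below is an illustrative Python helper (the function name and the example counts are assumptions, not the Amigo data). The example counts are proportional to a Poisson distribution with rate 1, so every ratio comes out as 1.0, i.e. the horizontal line expected under homogeneity.</p>
```python
def ratio_plot_values(freqs):
    """Compute r_x = (x + 1) * f_{x+1} / f_x for consecutive counts.

    freqs: dict mapping x (number of occurrences) to f_x (number of
    items occurring exactly x times). Returns a dict x -> r_x for every
    x where both f_x and f_{x+1} are available and f_x is non-zero.
    """
    ratios = {}
    for x in sorted(freqs):
        fx = freqs[x]
        fx_next = freqs.get(x + 1)
        if fx_next is None or fx == 0:
            continue  # ratio undefined at this x
        ratios[x] = (x + 1) * fx_next / fx
    return ratios


# Hypothetical counts proportional to Poisson(1): 1, 1/2, 1/6, 1/24.
homogeneous = {1: 600, 2: 300, 3: 100, 4: 25}
print(ratio_plot_values(homogeneous))   # {1: 1.0, 2: 1.0, 3: 1.0}
```
<p>Plotting these values against 𝑥 gives the ratio plot; values that drift away from a constant, as in Fig. 4, point to unobserved heterogeneity.</p></div>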
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Generalized Chao</head><p>In this section, we discuss the Generalized Chao: a generalization of the Chao1 method that can take into account statistical covariates and thus model heterogeneity across individual perpetrators in a dataset. This estimator, originally proposed by Böhning et al., aims to model heterogeneity in the data by modeling the detection probabilities as a regression problem. The method is characterized by a similar focus on low-frequency species (𝑓₁ and 𝑓₂) as Chao1, capturing the intuition that such uncommon species carry the most information about species which were not observed at all. These data consisting of 𝑓₁ and 𝑓₂ counts can be said to arise from a Poisson distribution, 𝑦ᵢ ∼ Poisson(𝜆), but a truncated one with 𝑦ᵢ ∈ {1, 2}.</p><p>Böhning and colleagues show that by maximizing the binomial likelihood, we can obtain an estimate for 𝜆ᵢ using λ̂ᵢ = 2p̂ᵢ / (1 − p̂ᵢ) <ref type="bibr" target="#b4">[5]</ref>. Here, p̂ᵢ refers to the estimated probability that species 𝑖 occurs twice rather than once. To estimate p̂, then, we fit a Generalized Linear Model assuming a binomial distribution:</p><formula xml:id="formula_1">y_i \sim \mathrm{Binomial}(1, p_i), \qquad \mathrm{logit}(p_i) = \alpha + \beta_x x_i + \dots</formula><p>Here, 𝛼 represents the intercept and 𝛽ₓ represents the coefficient for a predictor 𝑥ᵢ which is available for the 𝑖-th species in the data; the outcome variable is binary with the negative class representing species that occur once and the positive class those that occur twice. Given p̂ᵢ, the lower bound on the true population size can be estimated as follows:</p><formula xml:id="formula_2">\hat{S} = S + \sum_{i=1}^{f_1 + f_2} \frac{1}{\hat{\lambda}_i + \hat{\lambda}_i^2 / 2}<label>(2)</label></formula><p>We employ Bayesian logistic regression models as implemented in the Python package Bambi <ref type="bibr" target="#b5">[6]</ref> as an interface to PyMC <ref type="bibr" target="#b0">[1]</ref>. 
The 'No U-Turn Sampler' (NUTS), an adaptive variant of the Hamiltonian Monte Carlo (HMC) algorithm, was used for sampling <ref type="bibr" target="#b13">[14]</ref>. We use weakly informative priors for all model terms. The convergence of all models was verified through examination of their trace plots and the R̂ statistic <ref type="bibr" target="#b10">[11]</ref>. As all R̂ values were well below the 1.1 threshold, indicating convergence, we opted not to report individual values. To assess and compare the performance of the regression models on unseen data, we employ approximate Leave-One-Out Cross Validation (LOO-CV) <ref type="bibr" target="#b21">[22]</ref>.</p></div>
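<div xmlns="http://www.tei-c.org/ns/1.0"><p>The step from fitted probabilities to a population estimate in Eq. (2) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and the counts are hypothetical, and the probabilities are supplied by hand rather than taken from a fitted Bambi model. The example uses the intercept-only case, where every individual receives the same probability 𝑓₂ / (𝑓₁ + 𝑓₂), and the result then coincides with the Chao1 estimate up to floating-point rounding.</p>
```python
def generalized_chao(p_hat, s_obs):
    """Population-size lower bound from per-individual doubleton probabilities.

    p_hat: iterable with one estimated probability per singleton or
           doubleton, each the probability of being seen twice rather
           than once (strictly between 0 and 1)
    s_obs: number of distinct individuals observed
    """
    f0_hat = 0.0
    for p in p_hat:
        lam = 2 * p / (1 - p)                # lambda_i = 2 p_i / (1 - p_i)
        f0_hat += 1 / (lam + lam ** 2 / 2)   # summand of Eq. (2)
    return s_obs + f0_hat


# Hypothetical counts, for illustration only.
f1, f2, s_obs = 100, 25, 150
# Intercept-only case: the same p for all f1 + f2 individuals.
p_const = [f2 / (f1 + f2)] * (f1 + f2)
print(generalized_chao(p_const, s_obs))   # approximately 350, the Chao1 value
```
<p>With covariates, the probabilities differ across individuals, and individuals with a low probability of re-arrest contribute more to the estimated number of unseen perpetrators.</p></div>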
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Estimating the probability of repeated occurrence</head><p>We fit a series of models for diverse (additive) combinations of the available covariates. No interactions or random effects are considered. We include linear and quadratic terms for the age predictor. We compare all models against an intercept-only model which, as it ignores any heterogeneity in the data, should produce estimates equal to the Chao1 model, and can thus be considered a baseline in these experiments. We compare the predictive performance of the models using LOO-CV <ref type="bibr" target="#b21">[22]</ref>, and visualize the expected log pointwise predictive density (ELPD) and error in the left panel of Fig. <ref type="figure" target="#fig_5">5</ref>. The model ranking suggests that the inclusion of predictors generally improves the predictive performance. Akaike model weights are reported in the middle panel: these in turn suggest that the more complex, additive models offer added value over simpler models, but caution is warranted as these weights are determined as point estimates and do not account for uncertainty <ref type="bibr" target="#b19">[20]</ref>. In fact, given the considerable overlap in the errors of the ELPD estimates, no single model clearly outperforms the others or should be unequivocally preferred.</p><p>In Fig. <ref type="figure" target="#fig_6">6</ref>, we show a forest plot for the coefficients included in the highest ranked model, which is also the most complex in terms of parameters. The predictor coefficients' HDI values do not intersect with zero, suggesting that they each contribute meaningfully to the model. Given the strong class imbalance (cf. the intercept estimate), however, their effect sizes are small, which is reflected both in the non-distinctness of the model comparison above and in the marginal effect plots in Fig. <ref type="figure">7</ref>. 
The coefficients show how this particular model assesses the likelihood of an individual being a doubleton (as opposed to a singleton), i.e. 𝑝(𝑓₂). First of all, the arrest rates vary with sex: we see that women were more likely to be doubletons than men. There appears to be a non-linear relationship with "age", which peaks around the age of 50. Doubletons are also slightly more common in the local population from Brussels, who suffered higher arrest rates than people born elsewhere in Belgium; the vulnerability to arrest of native Belgians appears comparable to that of individuals born outside of the country. People who previously stayed the night in the Amigo at their own request also saw a higher doubleton incidence than others. Finally, this model finds weak evidence for a bias against individuals with a known last name, potentially indicating more police effort against members from families perceived as "troublemakers".</p><p>Given the lack of decisive arguments to conclusively prefer one model over the rest, we opt for the common practice of (weighted) model averaging. This approach allows us to account for model uncertainty by combining the predictions of multiple models, rather than relying solely on a single 'best' model. Specifically, we employ the stacking of predictive distributions method <ref type="bibr" target="#b24">[25]</ref>, as implemented in PyMC. This technique generates a meta or ensemble model by computing a weighted average of the posterior predictions from all considered models. The weights assigned to each model are derived from the model weights in our previous model comparison analysis (cf. the middle panel of Fig. <ref type="figure" target="#fig_5">5</ref>).</p><p>Figure <ref type="figure">7</ref>: Marginal effect plots contrasting the predictors 'sex', 'age', and 'origin'. Shown are the 50% and 89% credible intervals. The top row displays the marginal effect of 'sex', with females showing a higher effect than males. 
The middle row contrasts 'age' effects, revealing a non-linear relationship peaking around age 50, with separate curves for males and females. The bottom row illustrates the interaction between 'sex', 'age', and 'origin', where 'BXL' (Brussels), 'BE' (Belgium), and 'ABROAD' categories are compared.</p></div>
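<div xmlns="http://www.tei-c.org/ns/1.0"><p>The core idea of weighted model averaging in this section can be illustrated with a deliberately simplified Python sketch. It averages point predictions per individual, whereas the stacking method used in the paper combines full posterior predictive distributions; it is not the PyMC implementation. The function name, model names, predictions, and weights are all hypothetical.</p>
```python
def weighted_average_predictions(predictions, weights):
    """Combine per-model predicted probabilities with model weights.

    predictions: dict model_name -> list of predicted probabilities
                 (one per individual, same order in every model)
    weights: dict model_name -> non-negative weight; the weights are
             renormalised here so that they sum to one
    """
    total = sum(weights[name] for name in predictions)
    n = len(next(iter(predictions.values())))
    averaged = [0.0] * n
    for name, preds in predictions.items():
        w = weights[name] / total
        for i, p in enumerate(preds):
            averaged[i] += w * p
    return averaged


# Hypothetical models, predictions, and weights (illustration only).
preds = {"intercept_only": [0.20, 0.20], "sex_plus_age": [0.30, 0.10]}
wts = {"intercept_only": 0.25, "sex_plus_age": 0.75}
print(weighted_average_predictions(preds, wts))
```
<p>The averaged probabilities could then be passed on to the population estimate of Eq. (2) in the same way as the probabilities of a single model.</p></div>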
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Adjusted population estimates</head><p>The main results are presented in Table <ref type="table" target="#tab_1">2</ref>, which is primarily insightful for the categorical predictors. Here, the adjusted population size estimates are shown with 89% upper and lower credible intervals, both for the highest ranked model and the ensemble model, together with the observed counts and the original Chao1 estimate for each predictor level. Detection rates are presented for each predictor. Asterisks in the table indicate that the credible interval for a level does not include the corresponding Chao1 estimate: this is of particular interest, because these are instances where the novel estimators deviate meaningfully from the baseline estimator, identifying demographic categories where the population size is potentially underestimated by Chao1 (and the detection rate correspondingly overestimated). Crucially, we see that the Generalized Chao estimates consistently yield an upwards adjustment for the total population in comparison to Chao1: as the estimated population total went up from 16,937 to 18,414, our estimated detection rate declined from 33% to 31%. While the adjustments can be limited in some instances, this suggests that the Generalized Chao has successfully managed to capture at least some of the heterogeneity across the demographic groups in the data. As expected, the model averaging yields slightly lower estimates than the single-model approach and presents a more cautious approach that might be preferable in the case of ranking ties between models, like here. When we compare the detection rates across the levels of specific factors, interesting trends can be discerned. These insights are further supported by the marginal effects plots in Fig. <ref type="figure">7</ref>. The estimated number of female perpetrators is much lower than for male perpetrators, but we see that women's vulnerability to arrest was considerably higher. 
In the case of migratory status, we see that local citizens form the largest group, but this time we see that the arrest efforts clearly targeted this group more intensely than people born outside of the city or country. Again, no bias against migrants is manifest in the data. A factor that considerably increased detection rates was a prior night's stay in the Amigo. Somewhat weaker is the effect of belonging to a family of perceived troublemakers: again, a smaller demographic group, but with much more policing effort geared against them. Often, we see that the effects are strongest for the smallest, most underrepresented subgroups in the criminalized population, such as women, who are heavily outnumbered by men. This might help explain why even the best performing model in the evaluation in the end does not clearly outperform the baseline model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Discussion and model criticism</head><p>Our application of the Generalized Chao method to the Amigo prison register data has yielded valuable insights into the 'dark number' of criminality in late 19th-century Brussels. By incorporating demographic covariates into our model, we've been able not only to estimate the total number of unobserved perpetrators but also to characterize patterns of arrest vulnerability across different social groups. Our findings suggest that the true criminal population was likely much larger than the register data show, and that the Generalized Chao adjusts the population estimate upward, albeit slightly, relative to the Chao1 model.</p><p>However, it's crucial to acknowledge the limitations of our approach. Our statistical models primarily focus on relatively stable individual characteristics and do not incorporate event-specific information such as the date, location, or reason for arrest. The inclusion of age as a predictor, while informative, is only justified by the narrow one-year time frame of our dataset. In ecological terms, our model considers species-level covariates but lacks observation-level predictors. This constraint represents a significant limitation of the method and highlights an area for future methodological development.</p><p>Despite these limitations, our rigorous model comparison revealed valuable insights. Distinguishing between singletons and doubletons proved to be a complex task, with considerable model uncertainty. While our models incorporating demographic covariates didn't markedly outperform the baseline model in terms of out-of-sample predictive ability, this finding itself is valuable. 
It underscores the complexity of historical criminality and arrest patterns, suggesting that simple demographic factors alone may not fully explain arrest vulnerability.</p><p>While our model comparison revealed limitations in predictive power, it's crucial to consider additional factors that may influence our results. One such factor is the potential violation of core assumptions underlying the Chao1 estimator and its generalizations, particularly the assumption of a closed population during the observation period. In many applications, including ecology, this assumption is often violated due to births and deaths within the population. However, in our historical urban context, migration presents a more significant concern. Late 19th-century Brussels experienced extreme population turnover, with historians estimating annual rates as high as 10% <ref type="bibr" target="#b8">[9]</ref>. This dynamic population flux challenges the closed population assumption inherent in our models. Furthermore, the specific nature of our data introduces additional complexities. Some arrested vagrants were temporarily sent to vagrancy colonies outside Brussels. While many of these individuals eventually returned to the city, they were temporarily unobservable <ref type="bibr" target="#b8">[9]</ref>. This periodic absence of individuals from the observable population further complicates our modeling efforts.</p><p>These factors likely contributed to one-inflation in our data, an excess of singletons (individuals observed only once) compared to doubletons (individuals observed twice). This phenomenon requires careful consideration, as standard models, including non-parametric lower bound estimators, tend to overestimate the true population size in such cases <ref type="bibr" target="#b2">[3]</ref>. Several mechanisms can contribute to one-inflation in a population:</p><p>1. Behavioral change: Identified individuals may alter their behavior after initial detection. 
In our context, the experience of incarceration might have had a strong dissuasive effect, reducing the likelihood of repeat offenses and contributing to one-inflation.</p><p>2. Mismatching: Individuals actually detected twice might be incorrectly recorded as single detections, artificially inflating the singleton category. Given the nature of our historical data, such mismatches are plausible. For instance, an intoxicated arrestee with impaired speech might have been difficult for police to accurately identify and record.</p><p>3. Population turnover: As discussed earlier, the high rate of population flux in late 19th-century Brussels could contribute to one-inflation by introducing many individuals who were only briefly present in the city.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Temporary absences:</head><p>The practice of sending some arrested vagrants to colonies outside Brussels, as previously mentioned, could also contribute to one-inflation by interrupting the observability of these individuals.</p><p>The presence of one-inflation in our data may partially explain the limited improvement in predictive power we observed when incorporating demographic covariates into our models. It also underscores the importance of developing and applying methods that can account for such data characteristics in historical criminology research.</p><p>Looking beyond the specific challenges of our study, this work has significant implications for the broader field of computational humanities, particularly from the perspective of critical data studies. A persistent challenge in historical scholarship is the under-representation of marginalized groups in archival records. This systemic bias not only skews our understanding of historical societies but also perpetuates the silencing of under-represented voices. Our work with the Generalized Chao estimator offers a concrete, quantitative approach to addressing this issue. By estimating the 'dark numbers' and characterizing demographic patterns in arrest vulnerability, we provide a method for: (a) quantifying the extent of under-representation in historical records, (b) identifying specific demographic groups that may be disproportionately under-represented, and (c) adjusting historical narratives to account for these 'invisible' populations.</p><p>While our study focused on historical criminology, the methodology has potential applications across various domains of historical research. It could be adapted to estimate under-representation in census data, literary corpora, or other historical datasets where certain groups may be systematically excluded or under-reported. Clearly, statistical methods alone cannot fully rectify historical biases. 
They must be used in conjunction with critical historical analysis, interdisciplinary collaboration, and a commitment to amplifying marginalized voices. Future research should explore how unseen species models can be integrated with other quantitative and qualitative methods in the humanities to provide a more comprehensive and equitable view of historical societies.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Photographic reproduction of two representative pages in the Amigo register of 1880, City Archives Brussels (SAB), fol. 95v-96r.</figDesc><graphic coords="4,89.28,84.12,416.72,312.54" type="bitmap" /></figure>
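<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the one-inflation argument of the discussion concrete, the minimal Python sketch below uses the classical Chao1 lower bound (observed count plus the squared number of singletons divided by twice the number of doubletons) and shows how shifting a few individuals from the doubleton to the singleton category raises the estimate. All frequency counts are invented for illustration and are not taken from the Amigo register; the function name <code>chao1</code> is likewise an assumption of this sketch, not the released code.</p><p><code lang="python">
```python
def chao1(f1: int, f2: int, s_obs: int) -> float:
    """Classical Chao1 lower bound: S_obs + f1^2 / (2 * f2)."""
    if f2 == 0:
        raise ValueError("f2 must be positive for this form of the estimator")
    return s_obs + f1 ** 2 / (2 * f2)


# Hypothetical frequency counts, NOT taken from the Amigo register:
# singletons, doubletons, and individuals seen three or more times.
f1, f2, f3_plus = 400, 100, 50
s_obs = f1 + f2 + f3_plus

baseline = chao1(f1, f2, s_obs)

# One-inflation: 20 individuals who would have been seen twice are recorded
# only once (e.g. deterrence after a first arrest, or a mismatched record).
# The number of observed individuals stays the same; only f1 and f2 shift.
shifted = 20
inflated = chao1(f1 + shifted, f2 - shifted, s_obs)

print(f"baseline estimate: {baseline:.1f}")
print(f"one-inflated estimate: {inflated:.1f}")
```
</code></p><p>With these invented counts the estimate rises from 1350.0 to 1652.5 although the number of observed individuals is unchanged, which illustrates why an excess of singletons pushes this type of estimator upward.</p></div>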
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Visualization of the moments in time (red vertical stripes) when three representative individuals were arrested throughout the chronological range covered by the data. (Arrests of other individuals are shown as non-distinct gray vertical lines.) Note that these events are bursty and often cluster locally in time, indicating short time lapses between two consecutive arrests of the same perpetrator.</figDesc><graphic coords="5,89.28,178.37,416.72,327.42" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Predictive plot from a Poisson model (94% HDI) for recidivists (individuals with ≥ 2 arrests), showing how the time lapse (i.e. the number of days since the previous arrest of an individual) varies as a function of the arrest occasion index (i.e. the number of times an individual has been previously arrested).</figDesc><graphic coords="6,151.79,84.18,291.70,218.77" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: A ratio plot for the count statistics from the arrest data.</figDesc><graphic coords="9,141.37,84.18,312.54,156.27" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Left panel: Visualization of the model comparison for the LOO information criterion (ELPD = expected log pointwise predictive density), including the difference with the baseline intercept model (𝑦 ∼ 1). The intervals for the out-of-sample predictive fit overlap across nearly all models. Middle panel: Akaike weights for each model. Right panel: Adjusted population size estimate using the Generalized Chao. Only for the best performing additive models do we see that the lower end of the credible interval lies above the point estimate for Chao1.</figDesc><graphic coords="11,89.28,84.17,416.72,185.48" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Forest plot visualization for the model coefficients of the best performing individual model, excluding the intercept (with 89% HDIs). None of the shown ranges intersect with zero, although many of the effects are relatively small. (Origin[B] refers to 'BE' in relation to the reference level 'BXL' and Origin[C] to 'ABROAD'.)</figDesc><graphic coords="11,153.63,368.56,288.02,144.01" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="12,130.95,84.18,333.37,331.35" type="bitmap" /></figure>
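<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Tabular overview of the Generalized Chao estimates (best performing model, Ŝ, and model averaging, Ŝ MA) across the different demographic levels in the data, including per-level observed counts (𝑆) and the original Chao1 estimate. Ŝ 𝑙 and Ŝ 𝑢 are the lower and upper ends of the 89% credible interval; the ratio column gives the detection rate 𝑆/Ŝ. Asterisks in the table indicate that the credible interval for a level excludes the corresponding Chao1 estimate (its lower end lies above it).</figDesc><table><row><cell></cell><cell></cell><cell>Ŝ 𝑙</cell><cell>Ŝ 𝑢</cell><cell>𝑆</cell><cell>Ŝ</cell><cell>Ŝ MA</cell><cell>Chao1</cell><cell>ratio</cell></row><row><cell>Pop. total</cell><cell></cell><cell>17409</cell><cell>19505</cell><cell>5610</cell><cell>18414 *</cell><cell>18238</cell><cell>16937</cell><cell>0.305</cell></row><row><cell>origin</cell><cell>BXL</cell><cell>7160</cell><cell>8202</cell><cell>2582</cell><cell>7660 *</cell><cell>7690 *</cell><cell>7148</cell><cell>0.337</cell></row><row><cell></cell><cell>BE</cell><cell>6940</cell><cell>8154</cell><cell>2275</cell><cell>7530</cell><cell>7457</cell><cell>7029</cell><cell>0.302</cell></row><row><cell></cell><cell>ABROAD</cell><cell>2759</cell><cell>3783</cell><cell>753</cell><cell>3223</cell><cell>3093</cell><cell>2982</cell><cell>0.234</cell></row><row><cell>sex</cell><cell>male</cell><cell>15042</cell><cell>16993</cell><cell>4649</cell><cell>15969 *</cell><cell>15779</cell><cell>14818</cell><cell>0.291</cell></row><row><cell></cell><cell>female</cell><cell>2193</cell><cell>2730</cell><cell>961</cell><cell>2444</cell><cell>2460</cell><cell>2272</cell><cell>0.393</cell></row><row><cell>prior</cell><cell>no</cell><cell>16364</cell><cell>18458</cell><cell>5100</cell><cell>17363 *</cell><cell>17155</cell><cell>16299</cell><cell>0.294</cell></row><row><cell></cell><cell>yes</cell><cell>942</cell><cell>1174</cell><cell>510</cell><cell>1050</cell><cell>1084</cell><cell>1007</cell><cell>0.486</cell></row><row><cell>family</cell><cell>no</cell><cell>10708</cell><cell>12407</cell><cell>3237</cell><cell>11536 *</cell><cell>11312</cell><cell>10688</cell><cell>0.281</cell></row><row><cell></cell><cell>yes</cell><cell>6403</cell><cell>7401</cell><cell>2373</cell><cell>6877</cell><cell>6927</cell><cell>6409</cell><cell>0.345</cell></row></table></figure>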
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">Reference to anonymized earlier conference presentation on which this paper is based.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">On social media, the same visual representation of an airplane (with bullet holes in specific areas) tends to be reproduced, but this is not a historic image: the origin of the image (the first version was created around 2005) is traced in a blog post: https://web.archive.org/web/20240430093343/https://cameronmoll.com/journal/abraham-wald-red-bullet-holes-origin-story. To the best of our knowledge, no published sketches by Wald himself are known. See e.g. [23, p. 58-60], but note that the illustration on p. 60 is also not historical.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">City Archives Brussels (SAB), Records of the Brussels Municipal Police, Amigo, register "Vagabonds et Mendiants" and register "Ivrognes, logés, prostituées", 1880 (no series or piece number).</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">The historical building of the Amigo prison in Brussels still exists today and currently houses a luxury hotel, centrally located, adjacent to the Grand-Place of Brussels. The name is a corruption of the older Dutch word Vrunte, which meant 'place of detention', misinterpreted during the Spanish occupation as 'friend' (Dutch vriend), hence the Spanish 'amigo'.</note>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Online Resources</head><p>The full code and data for replicating our analysis will be made available from the following GitHub repository upon the publication of the paper: https://doi.org/10.5281/zenodo.13969373.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">PyMC: a modern, and comprehensive probabilistic programming framework in Python</title>
		<author>
			<persName><forename type="first">O</forename><surname>Abril-Pla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Andreani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Carroll</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Fonnesbeck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kochurov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Luhmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Martin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Osthege</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Vieira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wiecki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zinkov</surname></persName>
		</author>
		<idno type="DOI">10.7717/peerj-cs.1516</idno>
		<ptr target="https://doi.org/10.7717/peerj-cs.1516" />
	</analytic>
	<monogr>
		<title level="j">PeerJ Computer Science</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page">e1516</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Use of the ratio plot in capture-recapture estimation</title>
		<author>
			<persName><forename type="first">D</forename><surname>Böhning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">F</forename><surname>Baksh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Lerdsuwansri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gallagher</surname></persName>
		</author>
		<idno type="DOI">10.1080/10618600.2011.647174</idno>
		<ptr target="https://centaur.reading.ac.uk/24755/" />
	</analytic>
	<monogr>
		<title level="j">Journal of Computational and Graphical Statistics</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="135" to="155" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">One-Inflation and Zero-Truncation Count Data Modelling Revisited With a View on Horvitz-Thompson Estimation of Population Size</title>
		<author>
			<persName><forename type="first">D</forename><surname>Böhning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Friedl</surname></persName>
		</author>
		<idno type="DOI">10.1111/insr.12570</idno>
		<ptr target="https://doi.org/10.1111/insr.12570" />
	</analytic>
	<monogr>
		<title level="j">International Statistical Review</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A Covariate Adjustment for Zero-Truncated Approaches to Estimating the Size of Hidden and Elusive Populations</title>
		<author>
			<persName><forename type="first">D</forename><surname>Böhning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">G M</forename><surname>Van Der Heijden</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Annals of Applied Statistics</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="595" to="610" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A Generalization of Chao&apos;s Estimator for Covariate Information</title>
		<author>
			<persName><forename type="first">D</forename><surname>Böhning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vidal-Diez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Lerdsuwansri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Viwatwongkasem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Arnold</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Biometrics</title>
		<imprint>
			<biblScope unit="volume">69</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="1033" to="1042" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Bambi: A Simple Interface for Fitting Bayesian Linear Models in Python</title>
		<author>
			<persName><forename type="first">T</forename><surname>Capretto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Piho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Westfall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Yarkoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">A</forename><surname>Martin</surname></persName>
		</author>
		<idno type="DOI">10.18637/jss.v103.i15</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Statistical Software</title>
		<imprint>
			<biblScope unit="volume">103</biblScope>
			<biblScope unit="issue">15</biblScope>
			<biblScope unit="page" from="1" to="29" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Nonparametric estimation of the number of classes in a population</title>
		<author>
			<persName><forename type="first">A</forename><surname>Chao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Scandinavian Journal of Statistics</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="265" to="270" />
			<date type="published" when="1984">1984</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Estimation of species richness and shared species richness</title>
		<author>
			<persName><forename type="first">A</forename><surname>Chao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">H</forename><surname>Chiu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Methods and Applications of Statistics in the Atmospheric and Earth Sciences</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Balakrishnan</surname></persName>
		</editor>
		<imprint>
			<publisher>Wiley</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="76" to="111" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Removing Local Nuisances, Arresting Masterless Strangers, and Granting &apos;Nights on Request&apos; : The Policing of Vagrancy in Late-Nineteenth-Century Antwerp and Brussels</title>
		<author>
			<persName><forename type="first">M</forename><surname>De Koster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Erkul</surname></persName>
		</author>
		<idno type="DOI">10.52024/tseg.13642</idno>
	</analytic>
	<monogr>
		<title level="j">TSEG - The Low Countries Journal of Social and Economic History</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">1</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Überlieferungs-Chance und Überlieferungs-Zufall als methodisches Problem des Historikers</title>
		<author>
			<persName><forename type="first">A</forename><surname>Esch</surname></persName>
		</author>
		<idno type="DOI">10.1524/hzhz.1985.240.3.529</idno>
	</analytic>
	<monogr>
		<title level="j">Historische Zeitschrift</title>
		<imprint>
			<biblScope unit="volume">240</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="529" to="570" />
			<date type="published" when="1985">1985</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Inference from Iterative Simulation Using Multiple Sequences</title>
		<author>
			<persName><forename type="first">A</forename><surname>Gelman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">B</forename><surname>Rubin</surname></persName>
		</author>
		<idno type="DOI">10.1214/ss/1177011136</idno>
	</analytic>
	<monogr>
		<title level="j">Statistical Science</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="457" to="472" />
			<date type="published" when="1992">1992</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Estimating species richness</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">J</forename><surname>Gotelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">K</forename><surname>Colwell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Biological Diversity: Frontiers in Measurement and Assessment</title>
				<editor>
			<persName><forename type="first">A</forename><forename type="middle">E</forename><surname>Magurran</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><forename type="middle">J</forename><surname>McGill</surname></persName>
		</editor>
		<imprint>
			<publisher>Oxford University Press</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="39" to="54" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Capture Recapture to Estimate Criminal Populations</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">G M</forename><surname>Van Der Heijden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cruyff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Böhning</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-1-4614-5690-2_662</idno>
	</analytic>
	<monogr>
		<title level="m">Encyclopedia of Criminology and Criminal Justice</title>
				<meeting><address><addrLine>New York, NY</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="267" to="276" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Hoffman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gelman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="page" from="1593" to="1623" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Forgotten books: The application of unseen species models to the survival of culture</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kestemont</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Karsdorp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>De Bruijn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Driscoll</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">A</forename><surname>Kapitan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Ó Macháin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Sawyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Sleiderink</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chao</surname></persName>
		</author>
		<idno type="DOI">10.1126/science.abl7655</idno>
	</analytic>
	<monogr>
		<title level="j">Science</title>
		<imprint>
			<biblScope unit="volume">375</biblScope>
			<biblScope unit="page" from="765" to="769" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Off the Record: On Studying Lost Arabic Books and their Networks</title>
		<author>
			<persName><forename type="first">N</forename><surname>Löhr</surname></persName>
		</author>
		<idno type="DOI">10.1553/medievalworlds_no18_2023s219</idno>
	</analytic>
	<monogr>
		<title level="j">Medieval Worlds</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Abraham Wald&apos;s Work on Aircraft Survivability</title>
		<author>
			<persName><forename type="first">M</forename><surname>Mangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">J</forename><surname>Samaniego</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the American Statistical Association</title>
		<imprint>
			<biblScope unit="volume">79</biblScope>
			<biblScope unit="page" from="259" to="267" />
			<date type="published" when="1984">1984</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Unread, yet preserved: A case study on survival of the 19th-century printed poetry</title>
		<author>
			<persName><forename type="first">A</forename><surname>Martynenko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Literatura: teoría, historia, crítica</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="issue">2</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Comparing Self-Reports and Official Records of Arrests</title>
		<author>
			<persName><forename type="first">M</forename><surname>Maxfield</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Weiler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Widom</surname></persName>
		</author>
		<idno type="DOI">10.1023/a:1007577512038</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Quantitative Criminology</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="page" from="87" to="110" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Statistical Rethinking: A Bayesian Course with Examples in R and Stan</title>
		<author>
			<persName><forename type="first">R</forename><surname>McElreath</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
			<publisher>CRC Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Dark Figure of Crime (Problems of Estimation)</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">L</forename><surname>Penney</surname></persName>
		</author>
		<idno type="DOI">10.1002/9781118517383.wbeccj248</idno>
	</analytic>
	<monogr>
		<title level="m">The Encyclopedia of Criminology and Criminal Justice</title>
				<imprint>
			<publisher>John Wiley &amp; Sons, Ltd</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Practical Bayesian Model Evaluation Using Leave-One-out Cross-Validation and WAIC</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vehtari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gelman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gabry</surname></persName>
		</author>
		<idno type="DOI">10.1007/s11222-016-9696-4</idno>
	</analytic>
	<monogr>
		<title level="j">Statistics and Computing</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="1413" to="1432" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m" type="main">Visual Revelations: Graphical Tales of Fate and Deception from Napoleon Bonaparte to Ross Perot</title>
		<author>
			<persName><forename type="first">H</forename><surname>Wainer</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2000">2000</date>
			<publisher>Lawrence Erlbaum Associates</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">What Shall We Do With the Unseen Sailor? Estimating the Size of the Dutch East India Company Using an Unseen Species Model</title>
		<author>
			<persName><forename type="first">M</forename><surname>Wevers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Karsdorp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Van Lottum</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-3290/short_paper1793.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Computational Humanities Research Conference 2022, CHR 2022</title>
		<editor>
			<persName><forename type="first">F</forename><surname>Karsdorp</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><forename type="middle">L</forename><surname>Nielbo</surname></persName>
		</editor>
		<meeting>the Computational Humanities Research Conference 2022, CHR 2022<address><addrLine>Antwerp, Belgium</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">December 12-14, 2022</date>
			<biblScope unit="volume">3290</biblScope>
			<biblScope unit="page" from="189" to="197" />
		</imprint>
	</monogr>
	<note>CEUR Workshop Proceedings. CEUR-WS</note>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Using Stacking to Average Bayesian Predictive Distributions (with Discussion)</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Yao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vehtari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Simpson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gelman</surname></persName>
		</author>
		<idno type="DOI">10.1214/17-ba1091</idno>
	</analytic>
	<monogr>
		<title level="j">Bayesian Analysis</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="issue">3</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
