=Paper=
{{Paper
|id=Vol-2695/paper6
|storemode=property
|title=Linking Dutch Civil Certificates
|pdfUrl=https://ceur-ws.org/Vol-2695/paper6.pdf
|volume=Vol-2695
|authors=Joe Raad,Rick Mourits,Auke Rijpma,Ruben Schalk,Richard Zijdeman,Albert Meroño-Peñuela
|dblpUrl=https://dblp.org/rec/conf/esws/RaadMRSZM20
}}
==Linking Dutch Civil Certificates==
Joe Raad1, Rick Mourits2, Auke Rijpma2, Ruben Schalk2, Richard Zijdeman3,4,
Kees Mandemakers3, and Albert Meroño-Peñuela1,3
1 Vrije Universiteit Amsterdam, NL
2 Utrecht University, Utrecht, NL
3 International Institute of Social History, Amsterdam, NL
4 University of Stirling, Stirling, Scotland, UK
Abstract. Finding and linking different appearances of the same entity in an open
Web setting is one of the primary challenges of the Semantic Web. In social and
economic history, record linkage has dealt with this problem for a long time,
linking historical individual records at a local database level. With the advent
of semantic technologies, Knowledge Graphs containing these records have been
published, raising the need for large-scale linking techniques that consider the
particularities of historical individual linking. In this paper we focus on our
current investigation of such techniques to link the Dutch civil certificates in
the LINKS/CLARIAH project. We describe the production of the LINKS Knowledge
Graph, and we show its potential at answering domain research questions through
its large number of owl:sameAs links5.
Keywords: linked data, digital humanities, civil certificates linking
1 Introduction
Finding and linking equivalent entities (persons, places, events, concepts) on the
Web is one of the most important challenges of a Semantic Web of Linked Data.
The distributed data publishing paradigm and the scale of the Web exacerbate
this problem; various approaches have been proposed to address it, including
heuristic-based linking (e.g. string similarity) [12], cluster-similarity linking [11],
and deep learning-based knowledge graph completion [14]. The goal is to produce
identity links that use the owl:sameAs or skos:exactMatch predicates so data
consumers are aware of identity clusters and classes [3].
Interestingly, the problem has been dealt with in other fields of research; in
particular in economic and social history. There, record linkage is a challenging
and active area of research, as shown in a recent Historical Methods special issue
on the subject [19]; and is becoming ever more important in economic and social
history. Mass digitisation of archival material means that further insight can be
obtained by linking individuals and households across different records, especially
now that sources with complete population coverage are becoming available.
Historical civil certificates are the authoritative sources of birth, marriage and
death events in municipality registers, and allow for the reconstruction of lives
of the past [4][23]. In the Netherlands, the LINKS project [15] has shown that this
reconstruction is, however, often very challenging, as there is generally no ground
truth. Individuals are not actively followed over time, but observed during the
registration of a vital event. As a result, it is unclear whether, where, and when
an individual can be observed. It is not even certain whether follow-up is
available at all, because individuals could migrate out of the region of
observation [4]. To complicate matters further, large quantities of historical
certificates have been indexed, which gives rise to data entry errors. These
spelling mistakes can be hard to deal with, as twins and other multiple births
often receive similar names. Furthermore, first names were often reused in families
to “replace” earlier-born, deceased siblings. Finally, civil servants were known to
indicate non-standard mutations, such as name changes, acknowledgement of children,
and divorces as side notes. As a result, very important relational information is
often not standardised [23].
5 Copyright ©2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
In this paper, we summarise our efforts in the LINKS and CLARIAH projects to
overcome these challenges, and link the appearances of the same person in 1.5
million birth (1812–1919), marriage (1812–1944) and death (1812–1969) certificates
in the Dutch province of Zeeland. Specifically, our contributions are:
– A description of the LINKS knowledge graph production process, which uses
standard semantic technologies (Section 4).
– A highly scalable certificate linking method based on efficient string similarity
through a Levenshtein automaton (Section 5).
– A preliminary evaluation based on SPARQL queries that use such links (Section 6).
In the next sections, we survey related work (Section 2), describe the original
dataset (Section 3), explain our contributions (Sections 4, 5 and 6), and conclude
(Section 7).
2 Related Work
Historical record linkage generally requires a prior effort to digitise large
amounts of individual-level historical records, a goal shared by projects like
NAPP/IPUMS [9], the Balsac Population Database6, the Utah Population Database7,
Familysearch8, the Scottish Longitudinal Study9, Digitising Scotland10, the Norway
Historical Population Register11, Link-Lives12, POPLINK/DDB13, the Scanian Economic
Demographic Database (SEDD)14, the North Orkney Population History Project [13] and
Death and Burial Data in Ireland 1864–192215. Linking individuals in the US 1850
and 1860 censuses is generally considered one of the earliest efforts [8], and
similar approaches for Canada [2] and Sweden [25] have followed. [17] provides a
critical review of these and other historical record linkage efforts, with a focus
on US data. We share with these efforts a focus on string-based comparison linkage.
In other projects (e.g. Digitising Scotland) the goal is to perform group-level
linkage as well [1]. Machine learning approaches have recently gained considerable
traction in the field [6]. For example, recent work on historical US census data
uses manually labelled data from familysearch.org as training data [18]. In the
Netherlands, earlier work on Dutch civil certificates focuses on methodological
aspects of record linkage [21]. The work done in the LINKS project [15] constitutes
a basis for our contribution.
6 http://balsac.uqac.ca/english/
7 https://uofuhealth.utah.edu/huntsman/utah-population-database/
8 https://www.familysearch.org
9 https://sls.lscs.ac.uk/
10 https://digitisingscotland.ac.uk/
11 https://www.rhd.uit.no/nhdc/hpr.html
12 https://link-lives.dk/
13 https://www.umu.se/en/centre-for-demographic-and-ageing-research/databases/parish-registers-databases/
14 https://www.ed.lu.se/databases/sedd
15 https://www.dbdirl.com/
3 Dataset
The digitised civil registry currently consists of 27.5 million certificates. In
total, there are 10.3 million birth certificates, 4.4 million marriage
certificates, and 12.7 million death certificates in the digitised registry at the
International Institute of Social History16. The number of available birth and
death certificates differs strongly because privacy laws impose different delays
before certificates are available for research: birth certificates become
available with a 100-year delay, marriage certificates with a 75-year delay, and
death certificates with a 50-year delay. For the moment, the experiments in this
paper are restricted to the civil registries produced in the Zeeland region. This
dataset of Zeeland civil registries, known as LINKS Zeeland cleaned 2016 01 [16],
consists of 1.5 million certificates, which represents ∼5.5% of the total
certificates. Specifically, there are 698,285 birth certificates (6.7% of the total
birth certificates), 193,921 marriage certificates (4.4%), and 665,999 death
certificates (5.2%). This dataset is cleaned, standardised and distributed in a
restricted manner [15] in the form of three CSV files:
1. Locations: containing the locations that show up in the civil certificates,
describing the municipality, province, region and the country of a location. This
file consists of 6 columns and 2,456 rows.
2. Registrations: containing general data from a certificate registration that go
beyond the individual level, such as the date and place of birth, marriage or
death. This file consists of 10 columns and 1,558,205 rows, with each row
representing a single registration in the Zeeland province.
3. Persons: containing all appearances of persons. In general, every birth
certificate generates records for three persons (newborn child, mother and father),
a marriage certificate generates at least six person records (the bride with her
parents and the groom with his parents) and a death certificate generates three or
four person records (deceased, father, mother and possibly a spouse). This file
consists of 33 columns and 5,526,393 rows.
16 https://iisg.amsterdam/en
4 LINKS Knowledge Graph
The process of converting the CSV files of the LINKS dataset into a Knowledge Graph
consists of three steps. Firstly, we manually design a model for describing and
enriching the civil registries data, following Linked Data best practices.
Secondly, we transpose the CSV data into an RDF Knowledge Graph, according to our
designed model. Finally, we make the graph available for browsing and querying in
an efficient manner.
4.1 Designing the civil registries schema
For modelling the civil registries data, we designed a new simple model that
reuses, whenever possible, existing vocabularies. This model is presented in
Figure 1, and has four main components:
- Civil Registrations. The first component (concepts coloured in brown) describes
each civil registration (birth, marriage, or death certificate), listing its
identifier, its sequential number, and the location and date of the registration.
- Life Events. The second component (in green) describes the actual life events
(birth, marriage, or death event), listing the main individuals involved in the
event, and the location and date of the event. In this model, a distinction is made
between a civil registration and its associated life event, as a civil registration
can be produced on a different date and in a different location from those of the
life event itself.
- Individuals. The third component (in blue) describes each individual involved in
these life events, listing their names, sex, civil status, and birth dates.
- Locations. The final component (in orange) describes the location where each life
event happened and the location where it was registered. In this component,
information regarding the municipality, the province, the region, and the country
can be available.
Fig. 1: Schema of the LINKS Knowledge Graph. For a higher resolution figure, we
refer the reader to https://github.com/CLARIAH/wp4-civreg/blob/master/schema/LINKS-schema.png
4.2 Transposing the data to RDF
For converting the Zeeland dataset to a Knowledge Graph, we use the tool CoW (CSV
on the Web converter)17. This batch tool, developed within the CLARIAH project
[10], allows the conversion of datasets expressed in CSV. It uses a JSON schema
expressed using an extended version of the CSVW standard to convert CSV files to
RDF in a scalable fashion. In the case of the Zeeland dataset, we run the
conversion process separately for the three CSV files, by manually designing the
JSON schema for Locations18, Registrations19, and Persons20. The manual design of
these JSON files allows us to transpose the data according to the model presented
in Figure 1. After designing a JSON file for each of the three CSV files, we use
the command line to convert each of these files separately. For instance, having
both the Locations.csv file and its associated JSON schema
Locations.csv-metadata.json in the same directory, the following command is
sufficient to convert the data to RDF, creating the RDF file Locations.nq encoded
in the N-Quads format.
$ cow_tool convert Locations.csv
The conversion process takes 30 seconds for the file Locations, 100 minutes for
Registrations, and around 5 hours for Persons on a machine with an SSD and 64GB of
memory.
17 https://csvw-converter.readthedocs.io/
18 https://raw.githubusercontent.com/CLARIAH/wp4-civreg/master/json/locations.csv-metadata.json
19 https://github.com/CLARIAH/wp4-civreg/blob/master/json/registrations.csv-metadata.json
20 https://github.com/CLARIAH/wp4-civreg/blob/master/json/persons.csv-metadata.json
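To make the conversion step more concrete, the following minimal sketch shows the kind of mapping that a CSVW-style schema describes: each CSV row becomes a subject and each non-empty column becomes a predicate with a literal value. It does not use CoW itself, and the base URI, graph name, predicate names, and column names are hypothetical, chosen only for illustration; the real ones are defined in the project's JSON schema files.

```python
import csv
import io

# Hypothetical namespace and graph name (assumptions, not the project's real URIs).
BASE = "https://example.org/links/"
GRAPH = BASE + "graph/locations"


def row_to_nquads(row, id_column):
    """Turn one CSV row (a dict) into N-Quads lines, one per non-empty column."""
    subject = f"<{BASE}location/{row[id_column]}>"
    quads = []
    for column, value in row.items():
        if column == id_column or value == "":
            continue
        predicate = f"<{BASE}vocab/{column}>"
        # Escape backslashes and double quotes so the literal stays valid N-Quads.
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        quads.append(f'{subject} {predicate} "{literal}" <{GRAPH}> .')
    return quads


def convert(csv_text, id_column):
    """Convert a whole CSV document (given as text) into a list of N-Quads lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        lines.extend(row_to_nquads(row, id_column))
    return lines


# Illustrative input with made-up column names.
sample = "id,municipality,province\n1,Middelburg,Zeeland\n2,Goes,\n"
for line in convert(sample, "id"):
    print(line)
```

A real CSVW schema additionally controls datatypes, URI templates, and links between resources, which this sketch leaves out.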
4.3 Accessing the RDF knowledge graph
Combining the three resulting N-Quads files yields the LINKS knowledge graph,
composed of 58,513,388 triples. This knowledge graph can be accessed online through
Druid21, the CLARIAH instance of the TriplyDB triple store22. Druid allows the
storage of knowledge graphs, and provides tools to browse, query and visualise our
data. For privacy reasons, the LINKS knowledge graph is uploaded as a private
dataset on Druid, restricting its access23 to members of the LINKS organisation24
on Druid. We provide publicly accessible links to resources of the knowledge graph
when possible.
In addition to accessing the LINKS knowledge graph through the Druid Web
hub, authorised users of the LINKS knowledge graph can also access this dataset
locally. For enabling easy and efficient access on a normal local machine, we
convert the LINKS knowledge graph from N-Quads to HDT (Header, Dictionary,
Triples) [7]. This compact data structure and binary serialisation format for RDF
keeps big datasets compressed to save space while maintaining search and browse
operations without prior decompression. Converting the LINKS knowledge graph
into HDT consists of two simple steps: (i) merge the three RDF N-Quads files
into one larger N-Quads file, (ii) convert the resulting merged file to HDT using
the rdfhdt library25 .
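Step (i) above can be sketched as follows. Because N-Quads is a line-based format, merging amounts to concatenating the statements of the input files. The function name and file names are hypothetical, and the sketch assumes that the files contain no blank nodes whose labels would clash after merging. Step (ii), the HDT conversion itself, is done with the rdfhdt tooling and is not shown here.

```python
def merge_nquads(inputs, output):
    """Concatenate N-Quads files into one file, skipping blank and comment lines.

    Returns the number of statements written.
    """
    count = 0
    with open(output, "w", encoding="utf-8") as out:
        for path in inputs:
            with open(path, encoding="utf-8") as src:
                for line in src:
                    stripped = line.strip()
                    if not stripped or stripped.startswith("#"):
                        continue
                    # Write one statement per line, always newline-terminated.
                    out.write(stripped + "\n")
                    count += 1
    return count


# Hypothetical usage (file names are assumptions):
# merge_nquads(["Locations.nq", "Registrations.nq", "Persons.nq"], "links-zeeland.nq")
```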
21 https://druid.datalegend.net/
22 https://triply.cc/
23 https://druid.datalegend.net/LINKS/links-zeeland/
24 https://druid.datalegend.net/LINKS
25 http://www.rdfhdt.org/manual-of-the-java-hdt-library/
5 Certificate Linkage
For linking Dutch civil registries, we rely heavily on the string similarity
between individuals’ names. This is motivated by the high quality of the registered
names in most civil certificates, and the limited spelling variation between
different civil certificates for the same individual. An example of such quality
maintenance can be observed in marriage registrations, where both the bride and the
groom are required to bring their own birth certificates when registering their
marriage. Moreover, married women in the Netherlands keep their own family name in
the civil certificates, which greatly simplifies the problem at hand. Death
registrations are generally made by next of kin (parents, spouses, children, or
siblings), which also limits variations in name spelling [23].
Similarity between two names can be measured in several ways, such as calculating
the Levenshtein, Jaccard, or Jaro-Winkler distances. In this work, we take the
Levenshtein distance as a basis for matching individuals in civil certificates.
This distance measures the number of single character edits (insertions, deletions
or substitutions) required to change one name into the other. The standard
algorithm for calculating the Levenshtein distance between two names was proposed
by Wagner and Fischer [24]; its running time is proportional to the product of the
lengths of the two names, i.e. quadratic for names of similar length.
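As an illustration of the distance just described, here is a minimal sketch of the Wagner-Fischer dynamic programme that keeps only two rows of the table in memory. The function name and the example names are our own choices for illustration; this is not the code used in the project.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance between a and b in O(len(a) * len(b)) time."""
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("aartsen", "aertsen"))
print(levenshtein("kitten", "sitting"))
```

Comparing every name in one list with every name in another list this way requires one such computation per pair, which is what motivates the indexed approach described next.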
In this work, as we aim to match individuals from a list of millions of
certificates to individuals in another large list of certificates, the standard
approach (or its variants) of calculating the Levenshtein distance by comparing
each pair of certificates is not feasible, as the number of comparisons grows with
the product of the sizes of the two lists. Therefore, we adopt the approach and the
library proposed by Dylon Devo26, based largely on the work of Schulz and Mihov
[22], for the fast selection of candidate individuals within a certain Levenshtein
distance. In this approach, the list of target individuals is indexed as a Minimal
Acyclic Finite-State Automaton (MA-FSA), and a Levenshtein transducer is
initialised according to a maximum distance specified by the user. When a name is
given as a source query with a maximum accepted Levenshtein distance, the states of
the Levenshtein automaton corresponding to that name are constructed on demand as
the automaton is evaluated. According to its author, this approach makes it
possible to find, for a given name n, all candidate names in a list M in time
linear in the length of n and independent of the size of M. In the following, we
describe how we deploy this approach for matching newborns registered in birth
certificates to their marriage certificates. The general process remains unchanged
for other types of linkage, where only the roles of the considered individuals and
the link’s timeline consistency are adapted accordingly.
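The idea of searching an index of names without comparing the query to every entry can be illustrated with a simpler, related technique: a plain trie that is walked while one row of the Levenshtein table is computed per visited node, pruning every branch whose row already exceeds the maximum distance. This is explicitly not the Levenshtein automaton of Schulz and Mihov, nor the API of the library used in the paper; the class and function names and the example names are hypothetical.

```python
class TrieNode:
    """One node of a character trie; is_word marks the end of an indexed name."""

    def __init__(self):
        self.children = {}
        self.is_word = False


def build_trie(names):
    """Index all names in a trie so that shared prefixes are stored only once."""
    root = TrieNode()
    for name in names:
        node = root
        for char in name:
            node = node.children.setdefault(char, TrieNode())
        node.is_word = True
    return root


def search(root, query, max_distance):
    """Return {name: distance} for all indexed names within max_distance of query."""
    results = {}

    def walk(node, prefix, previous_row):
        # previous_row[j] is the distance between prefix and query[:j].
        if node.is_word and previous_row[-1] <= max_distance:
            results[prefix] = previous_row[-1]
        for char, child in node.children.items():
            row = [previous_row[0] + 1]
            for j in range(1, len(query) + 1):
                cost = 0 if query[j - 1] == char else 1
                row.append(min(row[j - 1] + 1,
                               previous_row[j] + 1,
                               previous_row[j - 1] + cost))
            # Row minima never decrease along a branch, so pruning here is safe.
            if min(row) <= max_distance:
                walk(child, prefix + char, row)

    walk(root, "", list(range(len(query) + 1)))
    return results


index = build_trie(["anna aartsen", "anna aertsen", "johanna aartsen", "jan smit"])
print(search(index, "anna aartsen", 1))
```

The pruning keeps the search close to the part of the index that shares a near-matching prefix with the query, which conveys the intuition behind the automaton-based index even though the data structure differs.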
5.1 Approach
Finding the marriage certificate of a certain newborn, when applicable, requires
matching three individuals: (i) the newborn in the birth certificate with the bride
or groom of a certain marriage certificate, (ii) the newborn’s mother with the
bride’s or groom’s mother, (iii) the newborn’s father with the bride’s or groom’s
father. Once a match, according to a maximum Levenshtein distance, between
the three individuals of a birth certificate and a marriage certificate is found,
we check whether the logical timeline is respected. Only when two certificates
match on all three individuals and respect the logical timeline is a match between
the three individuals registered in the Knowledge Graph. Specifically, our approach
for matching newborns to a marriage certificate can be divided into 5 main steps:
1. Create six indices, with each index representing an MA-FSA containing the list
of all full names of a certain role in marriage certificates. For instance, the
index of the role “bride” contains the full names (first name + last name) of all
women who got married (i.e. who have the role of bride). For each of these indices,
a Levenshtein transducer is initialised according to a maximum Levenshtein distance
given by the user.
2. Create six Key-Value databases, with each database covering a single role r in
the marriage certificate. A key in a database represents a full name fn, and the
value represents a list of marriage certificate identifiers that have for the role
r an individual with the name fn. For instance, the entry “Anna Aartsen” →
{123323,232344} indicates that both these certificates have a bride registered with
the full name “Anna Aartsen”. While such information can be directly queried from
the Knowledge Graph, Key-Value databases are a better means for frequent read
requests. In particular, we rely on the RocksDB27 disk-based Key-Value database.
3. Find marriage certificate candidate(s) for each birth certificate. For this, we
first search for the full name of the newborn in the index of the bride or the
groom. Considering that the newborn is a girl, this step retrieves a list of
candidate names C_newborn from the bride index, each representing a spelling
variation within the maximum Levenshtein distance specified by the user. If
C_newborn is not empty, we retrieve from the bride’s Key-Value database the list of
candidate certificates E_newborn that contain a candidate’s name. In the case where
C_newborn contains several candidates, the result is the union of the E_newborn
returned for each candidate. The same process is applied when searching for the
full names of the newborn’s mother and father in the bride’s mother and father
indices, returning the lists of candidate certificates E_mother and E_father,
respectively.
4. Filter resulting candidates. Since in the majority of cases a newborn is
expected to have the same registered parents at marriage, we require the match
between the birth and the marriage certificates to be based on the three
individuals. Therefore, the preliminary marriage candidates consist of the
intersection of E_newborn, E_mother and E_father. Finally, out of these preliminary
candidates only those that respect the logical timeline are considered. In this
case, which consists of matching a newborn to a bride or a groom, we expect that
the marriage certificate is registered at least 14 years, and at most 70 years,
after its matched birth registration.
5. Save links, in two formats to respect the preferences of most researchers: (a) a
CSV file consisting of the birth certificate identifier and the matched marriage
certificate identifier, with link metadata consisting mainly of the Levenshtein
distance between each matched individual in these certificates and the time
difference between both registrations; (b) an N-Quads file consisting of owl:sameAs
links between each matched individual, with each link being asserted in a different
named graph that describes its context. For instance, the statement
⟨iisg:newbornURI, owl:sameAs, iisg:brideURI, iisg:graph/birthToMarriage/0-2-1⟩
indicates that the identity link between these two individuals was detected based
on a Levenshtein distance of 0 between the newborn’s and the bride’s names, a
distance of 2 between their mothers’ names, and a distance of 1 between their
fathers’ names.
26 https://github.com/universal-automata/liblevenshtein-java/
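The five steps above can be sketched end to end on toy data. This is a simplified illustration under several assumptions: in-memory dictionaries stand in for the RocksDB databases, a brute-force Levenshtein comparison stands in for the MA-FSA index, only years (not full dates) are compared, and all record fields, role names, and URIs are hypothetical.

```python
from collections import defaultdict

# Hypothetical role names: a female newborn is matched against the bride-side roles.
ROLES = {"newborn": "bride", "mother": "bride_mother", "father": "bride_father"}


def levenshtein(a, b):
    """Plain dynamic-programming distance (a stand-in for the automaton index)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def build_indices(marriages):
    """Step 2: for each role, map a full name to the set of marriage certificate ids."""
    indices = {role: defaultdict(set) for role in ROLES.values()}
    for cert_id, cert in marriages.items():
        for role in ROLES.values():
            indices[role][cert[role]].add(cert_id)
    return indices


def candidates(index, name, max_distance):
    """Step 3: certificate ids whose indexed name is within max_distance of name."""
    found = {}
    for indexed_name, cert_ids in index.items():  # brute force, unlike the paper
        distance = levenshtein(name, indexed_name)
        if distance <= max_distance:
            for cert_id in cert_ids:
                found[cert_id] = min(distance, found.get(cert_id, distance))
    return found


def link_birth(birth, marriages, indices, max_distance, min_gap=14, max_gap=70):
    """Steps 3-4: intersect the three candidate sets, then apply the timeline filter."""
    per_person = {p: candidates(indices[r], birth[p], max_distance)
                  for p, r in ROLES.items()}
    shared = set.intersection(*(set(c) for c in per_person.values()))
    links = []
    for cert_id in sorted(shared):
        gap = marriages[cert_id]["year"] - birth["year"]
        if min_gap <= gap <= max_gap:
            links.append((cert_id, [per_person[p][cert_id] for p in ROLES]))
    return links


def to_nquad(newborn_uri, bride_uri, distances):
    """Step 5(b): an owl:sameAs statement in a named graph encoding the distances."""
    graph = "https://example.org/graph/birthToMarriage/" + "-".join(map(str, distances))
    same_as = "http://www.w3.org/2002/07/owl#sameAs"
    return f"<{newborn_uri}> <{same_as}> <{bride_uri}> <{graph}> ."


marriages = {
    "m1": {"bride": "anna aartsen", "bride_mother": "maria de vos",
           "bride_father": "jan aartsen", "year": 1850},
    "m2": {"bride": "anna aartsen", "bride_mother": "maria de vos",
           "bride_father": "jan aartsen", "year": 1835},
    "m3": {"bride": "anna aartsen", "bride_mother": "pieternella smit",
           "bride_father": "jan aartsen", "year": 1851},
}
birth = {"newborn": "anna aartsen", "mother": "marja de vos",
         "father": "jan aertsen", "year": 1825}
for cert_id, distances in link_birth(birth, marriages, build_indices(marriages), 1):
    print(cert_id, to_nquad("https://example.org/p/newborn", "https://example.org/p/bride", distances))
```

In this toy example, certificate m2 is rejected by the timeline filter (the marriage is only 10 years after the birth) and m3 is rejected because the mother's name does not match within the maximum distance.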
27 http://rocksdb.org
5.2 Experiments
For testing the scalability of our approach, we evaluated it on the Zeeland dataset
described in Section 3. We first evaluated the process of matching newborns in
birth certificates to brides/grooms in marriage certificates, and then evaluated
the process of matching parents of brides/grooms in marriage certificates to their
own marriage certificate. Table 1 shows that matching the civil registries of a
Dutch province takes between 2 and 74 minutes28, with the runtime increasing as the
maximum Levenshtein distance per individual increases. It also shows that even with
a maximum Levenshtein distance of 1, there is significant overlinking, since the
number of detected links (271,230) is larger than the number of marriage
certificates in this dataset (193,921). This indicates that a number of marriage
certificates were matched to multiple birth certificates. The source code of this
approach is publicly available29.
Maximum Levenshtein   NewbornToPartner                     PartnerParentsToCouple
per Individual        Number of Links   Runtime (in mins)  Number of Links   Runtime (in mins)
1                     271,230           5                  205,477           2
2                     289,937           18                 224,785           8
3                     310,232           74                 244,343           25
Table 1: Results of matching newborns in birth certificates to brides/grooms in
marriage certificates (newbornToPartner), and matching parents of brides/grooms in
marriage certificates to their own marriage certificate (partnerParentsToCouple).
SELECT ?year (avg(?samePlace) as ?shareSamePlace) WHERE {
  GRAPH ?g {
    ?fatherBride owl:sameAs ?fatherBride_asGroom .
    ?fatherGroom owl:sameAs ?fatherGroom_asGroom .
  }
  ?mar1 iisgv:fatherBride ?fatherBride ;
        iisgv:fatherGroom ?fatherGroom ;
        schema:location ?loc1 .
  ?mar2 iisgv:groom ?fatherBride_asGroom ;
        bio:date ?date ;
        schema:location ?loc2 .
  ?mar3 iisgv:groom ?fatherGroom_asGroom ;
        schema:location ?loc3 .
  FILTER (?date > "1840-01-01"^^xsd:date && ?date < "1910-01-01"^^xsd:date)
  BIND (if(?loc1 = ?loc2 || ?loc1 = ?loc3, 1, 0) AS ?samePlace) .
  BIND (year(?date) as ?year) .
}
GROUP BY ?year
ORDER BY ?year
Listing 1.1: SPARQL query for identifying migrants and non-migrants in LINKS.
28 Experiments conducted on MacBook Pro, with SSD disk and 16GB of memory
29 https://github.com/CLARIAH/wp4-links
6 Preliminary Evaluation (Use Cases)
While we expect that a dataset containing information on every Dutch person born in
the period 1812–1919 and their family relations will be useful to many researchers,
it will be especially valuable to demographic, social, and economic historians
working with individual-level data. One issue in particular that it can address is
bias in results due to migration.
Fig. 2: Share of marriages of father and bride in same location, 1840–1910.
(Scatter plot; x-axis: year, 1840–1910; y-axis: shareSamePlace, 0.72–0.80.)
The key issue there is that, currently, many analyses are based on records
from one locality (a village, town, or province). In other words, out-migrants are
left out of the data. This is a problem because migrants are different from the rest
of the population. For example, according to Ruggles [20] they had different ages
at marriage and life expectancies. A comparison of the civil registry of Zeeland
and a smaller population register data set (HSN) that follows individuals as
they move, has also shown that the differences between such datasets can be
explained by the exclusion of migrants out of Zeeland [5].
The new data created here can take substantial steps to resolve this issue. Only
international out-migrants can now go missing, which is a far smaller share of the
data. The query30 in Listing 1.1 shows how migrants and non-migrants are easily
identified in the data. To do this, we compare the location of a marriage with that
of both the bride’s and the groom’s parents’ marriage. The results of this query
(Figure 2) show that the share of non-migrants between 1840 and 1910 falls from 80
to 72 percent, which means that by the start of the twentieth century, more than a
quarter of the couples moved between their marriage and that of their child.
Linking a civil registry for the entire Netherlands, as done here for Zeeland,
allows us to include this large group in future analyses.
30 https://github.com/CLARIAH/wp4-queries-links/blob/master/marriage-locations.rq
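The aggregation performed by the query in Listing 1.1 can be mirrored in a few lines of Python. This sketch assumes that the joins have already been done, so that each row holds the three locations (loc1 for the couple's marriage, loc2 and loc3 for the marriages of the bride's and the groom's parents) and the year taken from the date bound in the query. The field names and sample rows are hypothetical, and the date filter is approximated at the level of whole years.

```python
from collections import defaultdict


def share_same_place(rows, first_year=1840, last_year=1909):
    """Per year, the average of a flag that is 1 when loc1 equals loc2 or loc3."""
    flags_per_year = defaultdict(list)
    for row in rows:
        if first_year <= row["year"] <= last_year:
            same_place = 1 if row["loc1"] in (row["loc2"], row["loc3"]) else 0
            flags_per_year[row["year"]].append(same_place)
    # Equivalent to GROUP BY ?year with avg(?samePlace), ordered by year.
    return {year: sum(flags) / len(flags)
            for year, flags in sorted(flags_per_year.items())}


# Made-up rows for illustration only.
rows = [
    {"year": 1850, "loc1": "Goes", "loc2": "Goes", "loc3": "Veere"},
    {"year": 1850, "loc1": "Goes", "loc2": "Veere", "loc3": "Tholen"},
    {"year": 1900, "loc1": "Veere", "loc2": "Goes", "loc3": "Veere"},
    {"year": 1920, "loc1": "Goes", "loc2": "Goes", "loc3": "Goes"},
]
print(share_same_place(rows))
```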
7 Conclusion
In this work, we described our production process of the LINKS Knowledge Graph,
containing civil certificates of the Dutch province of Zeeland. We presented our
approach for linking these certificates, which takes 5 minutes or less on a regular
laptop with a maximum Levenshtein distance of 1, and showed how such links can be
exploited for conducting demographic analyses using SPARQL. This work is in the
process of being extended to cover all certificates of the Netherlands, enabling
larger and more valuable demographic, social and economic analyses.
References
1. Akgün, Ö., Dearle, A., Kirby, G., Garrett, E., Dalton, T., Christen, P., Dibben, C., Williamson, L.: Linking Scottish vital event records using family groups. Historical Methods: A Journal of Quantitative and Interdisciplinary History 0(0), 1–17 (Mar 2019), https://doi.org/10.1080/01615440.2019.1571466
2. Antonie, L., Inwood, K., Lizotte, D.J., Andrew Ross, J.: Tracking people over time in 19th century Canada for longitudinal analysis. Machine Learning 95(1), 129–146 (Apr 2014). https://doi.org/10.1007/s10994-013-5421-0, https://link.springer.com/article/10.1007/s10994-013-5421-0
3. Beek, W., Raad, J., Wielemaker, J., van Harmelen, F.: sameAs.cc: The closure of 500M owl:sameAs statements. In: European Semantic Web Conference. pp. 65–80. Springer (2018)
4. Van den Berg, N., Van Dijk, I.K., Mourits, R.J., Slagboom, P.E., Janssens, A.A.P.O., Mandemakers, K.: Families in comparison: An individual-level comparison of life-course and family reconstructions between population and vital event registers. Population Studies: A Journal of Demography pp. 1–20 (2020)
5. Berg, N.v.d., Dijk, I.K.v., Mourits, R.J., Slagboom, P.E., Janssens, A.A.P.O., Mandemakers, K.: Families in comparison: An individual-level comparison of life-course and family reconstructions between population and vital event registers. Population Studies 0(0), 1–20 (Feb 2020), https://doi.org/10.1080/00324728.2020.1718186
6. Feigenbaum, J.J.: Multiple Measures of Historical Intergenerational Mobility: Iowa 1915 to 1940. The Economic Journal 128(612), F446–F481 (Jul 2018), https://onlinelibrary.wiley.com/doi/abs/10.1111/ecoj.12525
7. Fernández, J.D., Martínez-Prieto, M.A., Gutiérrez, C., Polleres, A., Arias, M.: Binary RDF representation for publication and exchange (HDT). Journal of Web Semantics 19, 22–41 (2013)
8. Ferrie, J.P.: A New Sample of Males Linked from the Public Use Microdata Sample of the 1850 U.S. Federal Census of Population to the 1860 U.S. Federal Census Manuscript Schedules. Historical Methods: A Journal of Quantitative and Interdisciplinary History 29(4), 141–156 (Oct 1996), https://doi.org/10.1080/01615440.1996.10112735
9. Goeken, R., Huynh, L., Lynch, T.A., Vick, R.: New Methods of Census Record Linking. Historical Methods: A Journal of Quantitative and Interdisciplinary History 44(1), 7–14 (Jan 2011), https://doi.org/10.1080/01615440.2010.517152
10. Hoekstra, R., Meroño-Peñuela, A., Rijpma, A., Zijdeman, R., Ashkpour, A., Dentler, K., Zandhuis, I., Rietveld, L.: The datalegend ecosystem for historical statistics. Journal of Web Semantics 50, 49–61 (2018)
11. Idrissou, A.K., Hoekstra, R., Van Harmelen, F., Khalili, A., Van den Besselaar, P.: Is my:sameAs the same as your:sameAs? Lenticular lenses for context-specific identity. In: Proceedings of the Knowledge Capture Conference. pp. 1–8 (2017)
12. Isele, R., Jentzsch, A., Bizer, C.: Silk server: adding missing links while consuming linked data. In: Proceedings of the First International Conference on Consuming Linked Data, Volume 665. pp. 85–96. CEUR-WS.org (2010)
13. Jenning, J.A., Sparks, C.A., Murtha, T.: Interdisciplinary approach to spatiotemporal population dynamics: the North Orkney Population History Project. Historical Life Course Studies pp. 27–51 (2019), http://hdl.handle.net/10622/23526343-2019-0002?locatt=view:master
14. Lisena, P., Meroño-Peñuela, A., Troncy, R.: MIDI2vec: Learning MIDI Embeddings for Reliable Prediction of Symbolic Music Metadata. Transactions of the International Society for Music Information Retrieval (2020), under review
15. Mandemakers, K., Laan, F.: LINKS dataset Genes Germs and Resources, WieWasWie Zeeland, Civil Certificates, version 2017.01. International Institute of Social History, Amsterdam
16. Mandemakers, K., Laan, F.: LINKS-Zeeland challenge, WieWasWie Zeeland, Civil Certificates, version 2016. International Institute of Social History, Amsterdam
17. Massey, C.G.: Playing with matches: An assessment of accuracy in linked historical data. Historical Methods: A Journal of Quantitative and Interdisciplinary History 0(0), 1–15 (Mar 2017), http://dx.doi.org/10.1080/01615440.2017.1288598
18. Price, J., Buckles, K., Van Leeuwen, J., Riley, I.: Combining Family History and Machine Learning to Link Historical Records. Tech. Rep. w26227, National Bureau of Economic Research, Cambridge, MA (Sep 2019). https://doi.org/10.3386/w26227, http://www.nber.org/papers/w26227.pdf
19. Rijpma, A., Cilliers, J., Fourie, J.: Record linkage in the Cape of Good Hope Panel. Historical Methods: A Journal of Quantitative and Interdisciplinary History 0(0), 1–16 (Feb 2019). https://doi.org/10.1080/01615440.2018.1517030
20. Ruggles, S.: Migration, Marriage, and Mortality: Correcting Sources of Bias in English Family Reconstitutions. Population Studies 46(3), 507–522 (Nov 1992), https://doi.org/10.1080/0032472031000146486
21. Schraagen, M.P., others: Aspects of record linkage. Ph.D. thesis, Leiden Institute of Advanced Computer Science (LIACS), Faculty of Science, Leiden University (2014), https://openaccess.leidenuniv.nl/handle/1887/29716
22. Schulz, K.U., Mihov, S.: Fast string correction with Levenshtein automata. International Journal on Document Analysis and Recognition 5(1), 67–85 (2002)
23. Vulsma, R.F.: Burgerlijke stand en bevolkingsregister. Centraal Bureau voor Genealogie, ’s-Gravenhage
24. Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. Journal of the ACM (JACM) 21(1), 168–173 (1974)
25. Wisselgren, M.J., Edvinsson, S., Berggren, M., Larsson, M.: Testing Methods of Record Linkage on Swedish Censuses. Historical Methods: A Journal of Quantitative and Interdisciplinary History 47(3), 138–151 (Jul 2014), https://doi.org/10.1080/01615440.2014.913967