<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An ecosystem for Linked Humanities Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rinke Hoekstra</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Albert Meroño-Peñuela</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kathrin Dentler</string-name>
          <email>k.dentler@vu.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Auke Rijpma</string-name>
          <email>auke.rijpma@iisg.nl</email>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Richard Zijdeman</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivo Zandhuis</string-name>
          <email>ivo@zandhuis.nl</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data Archiving and Networked Services, KNAW</institution>
          ,
          <addr-line>The Hague, NL</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, Vrije Universiteit Amsterdam</institution>
          ,
          <addr-line>NL</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Faculty of Law, University of Amsterdam</institution>
          ,
          <addr-line>NL</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>International Institute of Social History</institution>
          ,
          <addr-line>KNAW, Amsterdam, NL</addr-line>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Ivo Zandhuis Research &amp; Consultancy</institution>
          ,
          <addr-line>Haarlem, NL</addr-line>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>University of Stirling</institution>
          ,
          <addr-line>Stirling</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff6">
          <label>6</label>
          <institution>Utrecht University</institution>
          ,
          <addr-line>Utrecht, NL</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <fpage>85</fpage>
      <lpage>96</lpage>
      <abstract>
        <p>The main promise of the digital humanities is the ability to perform scholarly studies at a much broader scale, and in a much more reusable fashion. The key enabler for such studies is the availability of sufficiently well described data. For the field of socio-economic history, data usually comes in a tabular form. Existing efforts to curate and publish datasets take a top-down approach and are focused on large collections. This paper presents QBer and the underlying structured data hub, which address the long tail of research data by catering for the needs of individual scholars. QBer allows researchers to publish their (small) datasets, link them to existing vocabularies and other datasets, and thereby contribute to a growing collection of interlinked datasets. We present QBer, and evaluate our first results by showing how our system facilitates two use cases in socio-economic history.</p>
      </abstract>
      <kwd-group>
        <kwd>Digital Humanities</kwd>
        <kwd>Structured Data</kwd>
        <kwd>Linked Data</kwd>
        <kwd>QBer</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In a 2014 article in CACM, [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] describes digital humanities as a "movement
and a push to apply the tools and methods of computing to the subject matter
of the humanities." As the fuel of the computational method, the key enabler
for digital humanities research is the availability of data in digital form. At
the inauguration of the Center for Humanities and Technology (CHAT), José
van Dijck, the president of the Dutch Royal Academy of Sciences, characterized
progress in this field as the growing ability to tremendously increase the scale at
which humanities research takes place, thereby allowing for much broader views
on the subject matter [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. Tackling this important challenge for the digital
humanities requires straightforward transposition of research queries from one
humanities dataset to another, or even direct cross-dataset querying.
It is widely recognized that Linked Data technology is the most likely candidate
to fill this gap. We argue that current efforts to increase the availability and
accessibility of this data do not suffice. They do not cater for the "long tail of
research data" [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], the large volumes of small datasets produced by individual
researchers; and existing Linked Data tooling is too technology-oriented to be
suitable for humanities researchers at large.
      </p>
      <p>This paper presents QBer and the underlying CLARIAH Structured Data
Hub (CSDH),<sup>8</sup> whose aim is to address the limitations of current data-publishing
practice in the digital humanities, and socio-economic history in particular. The
CSDH integrates a selection of large datasets from this domain, while QBer
is a user-facing web application that allows individual researchers to upload,
convert and link 'clean' data to existing datasets and vocabularies in the hub
without compromising the detail and heterogeneity of the original data (see
Section 2). Under the hood, we convert all data to RDF, but QBer does not
bother scholars with technical aspects. An inspector view displays the result of
the mappings, a growing network of interconnected datasets, in a visually
appealing manner (see Figure 1). The visualization is just one of the incentives
we are developing. The most important incentive will be the ability to
transpose research queries across datasets and to perform
cross-dataset querying. Section 4 describes two use cases that evaluate the ability of
QBer and the CSDH to fulfill that promise. We first discuss related work in
Section 2 and describe the QBer and CSDH systems in Section 3.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Historical data comprises text, audiovisual content or, in our case, data in
the more traditional sense: structured data in tabular form. Preparing
historical data for computational analysis takes considerable expertise and effort. As
a result, digital data curation efforts are organized (and funded) in a top-down
fashion, and focus on the enrichment of individual datasets and collections of
sufficient importance and size. Examples are the North Atlantic Population Project
(NAPP) [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], the Clio-Infra repository [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and the Mosaic project.<sup>9</sup> Such projects
face three important issues. First, they often culminate in a website where
subsets of the data can be downloaded, but cannot be programmatically accessed,
isolating the data from efforts to cross-query over multiple datasets. Second,
these projects enforce commitment to a shared standard: standardization leads
to loss of detail, and thus information. The bigger a project is, the higher the
cost of reconciling heterogeneity (in time, region, coding, etc.) between the large
number of sources involved. Finally, the scale of these projects is unsuited for
the large volumes of important, but sometimes idiosyncratic, smaller datasets
created by individual researchers: the long tail of research data [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        For this last reason, it is difficult for individual researchers to make their
data available in a sustainable way [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. Despite evidence that sharing research
data results in higher citation rates [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], researchers perceive little incentive to
publish their data with sufficiently rich, machine-interpretable metadata. Data
publishing and archiving platforms such as EASY (in the Netherlands),<sup>10</sup>
Dataverse<sup>11</sup> or commercial platforms such as Figshare<sup>12</sup> and Dryad<sup>13</sup> aim to lower
the threshold for data publishing and cater for increasing institutional pressure
to archive research data. However, as argued in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], the functionality of these
platforms (data upload, data landing page, citable references, default licensing,
long-term preservation) is limited with respect to the types of provenance and
content metadata that can be associated with publications, and they do not
offer the flexibility of the Linked Data paradigm. This has a detrimental effect
on both findability and reusability of research data.
      </p>
      <p>8 A screencast of the system is available at https://vimeo.com/158153564.
9 See https://www.clio-infra.eu and http://www.censusmosaic.org/</p>
      <p>
        In socio-economic history, a central challenge is to query data combined from
multiple tabular sources: spreadsheets, databases and CSV files. The multiple
benefits of Linked Data as a data integration method [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] encourage the
representation of tabular sources as Linked Data.<sup>14</sup> CSV and HTML tables can
be represented in RDF using CSV2RDF and DRETa [
        <xref ref-type="bibr" rid="ref16 ref22">16, 22</xref>
        ]. For other tabular
formats, like Microsoft Excel, Google Sheets, and tables encoded in JSON or
XML, larger frameworks are needed, like Opencube [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], Grafter [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], and the
combination of OpenRefine and DERI's RDF plugin [
        <xref ref-type="bibr" rid="ref21 ref7">7, 21</xref>
        ]. TabLinker [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] uses
a semi-automatic approach to represent multidimensional tables with missing
observation/record data [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] as RDF Data Cube [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. As in TopBraid Composer,<sup>15</sup>
TabLinker can use external mapping files instead of an interactive interface.
These tools are targeted at relatively tech-savvy users for whom the
conversion to RDF is a goal in itself. In our case, prospective users will benefit from
interlinked data, but have no interest in the underlying technology.
      </p>
      <p>
        An important question then is: how are these mapping files created? Work in
ontology and vocabulary alignment, as in the OAEI,<sup>16</sup> or identity reconciliation,
aims to perform automatic alignments. Given the often very specific (historic)
meaning of terms in our datasets, these techniques are likely to be error-prone,
hard to optimize (given the heterogeneity of our data) and unacceptable to
scholars. Interactive alignment tools, such as Amalgame [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], developed for the
cultural heritage and digital humanities domains, are more promising, but treat
the alignment task in isolation rather than as part of the data publishing process.
Anzo for Excel<sup>17</sup> is an extension for Microsoft Excel for mapping spreadsheet
data to ontologies. Similarly, [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and RightField<sup>18</sup> allow for selecting terms from
an ontology from within Excel spreadsheets, but these require the data to
conform to a predefined template.
      </p>
      <p>10 See http://easy.dans.knaw.nl
11 See http://dataverse.harvard.edu and http://dataverse.nl
12 See http://figshare.com
13 See http://datadryad.org
14 For a comprehensive list, see e.g.
https://github.com/timrdf/csv2rdf4lod-automation/wiki and http://www.w3.org/wiki/ConverterToRdf
15 https://www.w3.org/2001/sw/wiki/TopBraid
16 The Ontology Alignment Evaluation Initiative, see oaei.ontologymatching.org/
17 https://www.w3.org/2001/sw/wiki/Anzo
18 https://www.sysmo-db.org/rightfield</p>
    </sec>
    <sec id="sec-3">
      <title>QBer and the Structured Data Hub</title>
      <p>To create a viable research ecosystem for Linked Humanities Data of all sizes,
we need to combine expert knowledge with automated Linked Data generation.
It should be easy and profitable for individual researchers to enrich and publish
their data via our platform. To achieve the first goal, we developed QBer,<sup>19</sup> an
interactive tool that allows non-technical scholars to convert their data to RDF,
to map the 'variables' (column names) and values in tabular files to Linked Data
concept schemes, and to publish their data on the structured data hub. What
sets QBer apart is that all Linked Data remains under the hood. To achieve the
second goal, we build in direct feedback (reuse of existing content, visualizations,
etc.) on top of the CSDH and demonstrate the research benefits of contributing
data to it (see Section 4). We illustrate QBer by means of a walkthrough of the
typical usage of the tool, and then summarize its connection with the CSDH.</p>
      <p>Using QBer consists of interacting with three main views: the welcome screen,
the mapping screen, and the inspector. In the welcome screen, users first
authenticate with OAuth-compatible services (e.g. Google accounts), and then select a
raw dataset to work with. Datasets can be selected directly from the CSDH, or
imported from a Dataverse collection by providing a DOI.</p>
      <p>Once a dataset is loaded, QBer displays the mapping screen (Figure 1). This
screen is divided into the variables sidebar (left) and the variable panel (right).
The sidebar allows the user to search and select a variable (i.e. column) from the
dataset. Once the user clicks on a variable, the variable panel shows that
variable's details: the variable category, the variable metadata, and the value
frequency table.</p>
      <p>19 See https://github.com/CLARIAH/qber</p>
      <p>
        We distinguish between three variable categories: coded, identifier and other.
Values for coded variables are mapped to corresponding concepts (skos:Concept)
within a skos:ConceptScheme, which establishes all possible values the variable
can take. If the variable is of type identifier, its values are mapped to
dataset-specific minted URIs. Finally, the values of variables of type other are mapped
to literals instead of URIs. The 'Community' button gives access to all known
predefined data cube dimensions. These come from LSD Dimensions, an index
of dimensions used in Data Structure Definitions of RDF Data Cubes on the
Web [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and from datasets previously processed by QBer that now reside on
the CSDH.
      </p>
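      <p>The three variable categories can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not QBer's actual implementation: the base URI, the function names and the dictionary standing in for a concept scheme are all hypothetical.</p>
      <preformat>
```python
from urllib.parse import quote

# Hypothetical base URI for dataset-specific minted URIs (not from the paper).
BASE = "http://example.org/resource/"


def mint_uri(dataset, variable, value):
    """Mint a dataset-specific URI for a raw value."""
    parts = [quote(str(part), safe="") for part in (dataset, variable, value)]
    return BASE + "/".join(parts)


def map_value(category, dataset, variable, value, scheme=None):
    """Return (kind, term) for one cell value, following the three categories.

    `scheme` is a hypothetical dict from raw labels to concept URIs; it
    stands in for a skos:ConceptScheme selected by the user.
    """
    if category == "coded":
        # Coded values become concepts: use the scheme's concept when one is
        # known, otherwise fall back to the default minted URI.
        if scheme and value in scheme:
            return ("uri", scheme[value])
        return ("uri", mint_uri(dataset, variable, value))
    if category == "identifier":
        return ("uri", mint_uri(dataset, variable, value))
    if category == "other":
        return ("literal", str(value))
    raise ValueError("unknown variable category: " + category)
```
      </preformat>
      <p>Keeping the minted URI as the fallback for coded values mirrors the idea that the original value is never discarded, even when no matching concept has been chosen yet.</p>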
      <p>The variable metadata panel can be used to change the label and the
description of the variable. If the variable has been specified to be "coded" in the
previous pane, it can be linked to existing code lists curated by the Linked Open
Data community. QBer provides access to all concept schemes from the Linked
Data cache<sup>20</sup> and the CSDH itself. If the variable is of type "other", this panel
lets users define their own transformation function for literals.</p>
      <p>The frequency table panel has three purposes. First, it allows for quick
inspection of the distribution of all values of the selected variable, by displaying
their frequency. Second, if the variable type is "coded", it lets the user map the
default minted URI for the chosen value to any skos:Concept within the selected
skos:ConceptScheme in the variable metadata panel. QBer also has a batch
mapping mode that prompts the user to map all values of the variable interactively.
Third, the panel shows the current mappings for values of the selected variable.</p>
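      <p>A value frequency table of this kind is straightforward to compute. The sketch below uses only the Python standard library; the function names are hypothetical, and the second helper merely illustrates how a batch mapping mode could decide which values still need attention.</p>
      <preformat>
```python
from collections import Counter


def frequency_table(values):
    """Count how often each distinct value occurs, most frequent first."""
    counts = Counter(values)
    # Sort by descending frequency, then by value, so the order is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], str(item[0])))


def unmapped_values(values, mappings):
    """List distinct values without a concept mapping, most frequent first."""
    return [value for value, _ in frequency_table(values) if value not in mappings]
```
      </preformat>
      <p>For example, frequency_table(["a", "b", "a"]) yields [("a", 2), ("b", 1)], and with a mapping for "a" already present, unmapped_values returns ["b"].</p>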
      <p>
        Mappings can be materialized in two ways. Users can click on Save in the
navigation bar, which stores the current mapping status of all variables in their
local cache. Clicking on Submit sends the mappings to the CSDH API, which
integrates them with other datasets in the hub. Under the hood, the data is
converted to a Nanopublication [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] with provenance metadata in PROV, where
the assertion-graph is an RDF Data Cube representation of the data [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The
RDF representation is a verbatim conversion of the data; mappings between
the original values and pre-existing vocabularies are explicitly represented
using SKOS mapping relations. This scheme allows for the co-existence of
alternative interpretations (by different scholars) of the data, thus overcoming the
standardization limitation discussed in Section 2.
      </p>
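      <p>The coexistence of alternative interpretations can be illustrated with plain (subject, predicate, object) tuples. This is a minimal sketch, not the CSDH data model: the function names and example URIs are hypothetical, and the choice of skos:exactMatch as the default relation is an assumption, since the text only states that SKOS mapping relations are used.</p>
      <preformat>
```python
# Namespace of the SKOS vocabulary.
SKOS = "http://www.w3.org/2004/02/skos/core#"

# The five SKOS mapping relations.
MAPPING_RELATIONS = {
    "exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch",
}


def mapping_triples(minted_uri, concept_uri, relation="exactMatch"):
    """Link an original (minted) term to an existing concept, keeping both."""
    if relation not in MAPPING_RELATIONS:
        raise ValueError("not a SKOS mapping relation: " + relation)
    return [(minted_uri, SKOS + relation, concept_uri)]


def alternatives(triples):
    """Group mapped concepts per original term, so rival readings coexist."""
    grouped = {}
    for subject, _, obj in triples:
        grouped.setdefault(subject, []).append(obj)
    return grouped
```
      </preformat>
      <p>Because the original term is never overwritten, two scholars can map the same value to different concepts and both mappings remain queryable.</p>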
      <p>The inspector, shown at the bottom right of Figure 1, allows users to
explore the contents of the CSDH. The visualization shows a graph of nodes and
edges, with different icons representing different node types. User nodes
represent users that have submitted data to the hub, according to the
OAuth identities they provided in the welcome screen. Dataset nodes represent Data
Structure Definitions<sup>21</sup> (DSD) submitted by these users. Dimension nodes represent
dimensions (i.e. variables, columns in the raw data) within those DSDs.
Dimensions that are externally defined (e.g. by SDMX or some other external party)
and are thus not directly used in datasets are represented as cloud icons. Users
can interact with the inspector in two ways: hovering over nodes displays their
properties, and dragging them moves the graph elements for a better layout.</p>
      <p>20 See http://lod.openlinksw.com; we aim to extend this with schemes from the
LODLaundromat, http://www.lodlaundromat.org.</p>
      <p>QBer and the Inspector work on top of the CSDH API,<sup>22</sup> which carries
out the backend functionality. This includes converting, storing, and managing
POST and GET requests over datasets. The CSDH API functionality will be
extended to cover standard data search, browsing and querying tasks. All data
is currently stored in an OpenLink Virtuoso CE triplestore,<sup>23</sup> but since the CSDH
communicates through the standard SPARQL protocols it is not tied to one
triple store vendor.</p>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>In this section, we evaluate the results of our approach by means of two use cases
in socio-economic history research. The first use case investigates
whether the CSDH indeed allows research to be carried out on a broader
scale. In this case, we transpose a query that was built to answer a research
question aimed at a dataset of one country to a dataset that describes another
country. The second use case investigates whether our system
facilitates the workflow of a typical individual researcher.</p>
      <p>
        Use Case 1: Born Under a Bad Sign.
Economic and social history applies questions and methods from the social
sciences to the historical record. An important line of research focuses on the
determinants of historical inequality. One hypothesis here is that prenatal [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and
early-life conditions [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] have a strong impact on socioeconomic and health
outcomes later in life. For example, a recent study on the United States found that
people born in the years and states hit hardest during the Great Depression of
the 1930s had lower incomes and higher work disability rates in 1970 and 1980
[
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]. This study inspired this use case.
      </p>
      <p>
        Most studies on the impact of early-life conditions are case studies of single
countries. Therefore, the extent to which results can be generalized (their
external validity) is difficult to establish (e.g., differing impact of early-life conditions
in rich and poor countries). Moreover, historical data is often idiosyncratic. This
means that dataset-specific characteristics such as sampling and variable coding
schemes might influence the results (see Section 2).
      </p>
      <p>
        21 According to [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], a Data Structure Definition "defines the structure of one or more
datasets. In particular, it defines the dimensions, attributes and measures used in the
dataset along with qualifying information such as ordering of dimensions and whether
attributes are required or optional".
22 https://github.com/CLARIAH/wp4-csdh-api
23 See http://www.openlinksw.com</p>
      <p>
        In this use case, we explore the relation between economic conditions in
individuals' birth year and occupational status in the historical census records of
Canada and Sweden in 1891. In many cases it would be necessary to link the
two census datasets so that they can be queried in the same way. Here, however,
we use two harmonized datasets from the North Atlantic Population Project
(NAPP) [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. Economic conditions are measured using historical GDP per capita
figures from the Clio-Infra repository [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Because our outcome is occupational
status, we have to enrich the occupations in the census with occupational codes
and a status scheme. Because the NAPP project uses an occupational
classification that cannot provide internationally comparable occupational status scores,
we have to map their occupational codes to the HISCO<sup>24</sup> system, so that we can
use the HISCAM cross-nationally comparable occupational status scheme [
        <xref ref-type="bibr" rid="ref15 ref17">15, 17</xref>
        ].<sup>25</sup>
      </p>
      <p>In general terms, the data requirements are typical of recent trends in large
database usage in economic and social history: 1) the primary unit of analysis
is the individual (microdata); 2) a large number of observations is analyzed; 3)
multiple micro-datasets are analyzed; 4) microlevel observations are linked to
macro-level data through the dimensions time and geographical area; 5)
qualitative data is encoded to extract more information from it.</p>
      <p>Current Workflow. The traditional workflow to do this could include the
following steps. First, the researcher has to find and download the datasets from
multiple repositories. The datasets, which come in various formats, then have
to be opened, and, if necessary, the variables have to be renamed, cleaned, and
re-encoded to be able to join them with other datasets. We can rely on previous
cleaning and harmonization efforts of the NAPP project, but in many other
situations the researcher would have to do this manually. Finally, the joined
data has to be saved in a format that can be used by a statistical program.</p>
      <p>New Workflow. Using QBer and the CSDH, the workflow is as follows.
Linked data tools are used to discover data on the hub. In our case, a linked data
browser<sup>26</sup> and exploratory SPARQL queries were used. Note that to discover
datasets, and especially linked datasets, on the CSDH, it is necessary that someone
uploaded the datasets and created the links in the first place, for example by
linking datasets to a common vocabulary. While it is unavoidable that someone
has to do this at some point, the idea behind the hub is that if it is done once,
the results can be re-used by other researchers.</p>
      <p>24 HISCO: Historical International Standard Classification of Occupations.
25 https://github.com/rlzijdeman/o-clack and http://www.camsis.stir.ac.uk/hiscam/
26 https://github.com/Data2Semantics/brwsr</p>
      <p>[Figure (plot not recoverable from the source): panel title "Canada"; the vertical axis appears to be a HISCAM score.]</p>
      <p>Next, queries were built and stored on GitHub. The result sets that these
queries produce against the data hub are used to create the dataset that is
to be analyzed. grlc, a tool we developed for creating Linked Data APIs using
SPARQL queries in GitHub repositories, was helpful in exploring the data on
the hub and executing the eventual query [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].<sup>27</sup> This tool can also be used
to download the data directly into a statistical environment like R via HTTP
requests, for example using curl. Alternatively, the CSDH can be queried directly
from a statistical environment using SPARQL libraries.
      </p>
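      <p>Retrieving query results over HTTP could look roughly as follows. This is a sketch under loud assumptions: the host, the /api/user/repository/query path layout, the query name and the parameter are all hypothetical placeholders rather than a documented interface, and the actual network request is left out so that the example stays self-contained.</p>
      <preformat>
```python
import csv
import io
from urllib.parse import urlencode


def api_url(host, user, repo, query_name, **params):
    """Build a request URL; the /api/user/repo/query layout is an assumption."""
    url = "{}/api/{}/{}/{}".format(host.rstrip("/"), user, repo, query_name)
    if params:
        # Sort parameters so the resulting URL is deterministic.
        url += "?" + urlencode(sorted(params.items()))
    return url


def rows_from_csv(text):
    """Parse a CSV response body into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))
```
      </preformat>
      <p>The response body fetched from such a URL (for instance with curl) can be passed to rows_from_csv, after which each row is an ordinary dictionary keyed by column name.</p>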
      <p>
        Observations. While more sophisticated models are required to disentangle
cohort, period and age effects [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], the results suggest that in Canada in 1891 the
expected effects of early-life conditions are found: higher GDP per capita in a
person's birth year was associated with higher occupational status at the time of
the census. However, in Sweden, the opposite was the case (Figure 2). This shows
the relative ease with which the CSDH facilitates reusable research questions by
means of query transposition.
      </p>
      <p>
        The second use case takes the form of a user study. It concerns the
"Dwarsliggers"<sup>28</sup> dataset by Ivo Zandhuis, which collects data pertaining to a solidarity
strike at the maintenance workshop of the Holland Railway Company
(Hollandsche IJzeren Spoorweg-Maatschappij), in the Dutch city of Haarlem in 1903.
From a sociological perspective, strikes are of interest for research on social
cohesion, as they touch on both the question of when and why people live peacefully
together (even when in disagreement) and the question of how collective action
is successfully organized, a prerequisite for a successful strike. The
Dwarsliggers dataset is one of the few historical cases where data on strike behavior is
available at the individual level.
      </p>
      <p>The creation and use of this dataset is exemplary of the workflow of small
to medium quantitative historical research projects in the sense that it relies on
multiple data sources that need to be connected in order to answer the research
questions. We briefly discuss this workflow, and then show the impact that QBer
and the CSDH have.</p>
      <p>27 https://github.com/CLARIAH/grlc
28 In Dutch, a "dwarsligger" can mean both a railroad tie and an obstructive person.</p>
      <p>Current Workflow. Zandhuis' current workflow is very similar to the one
reported in the first use case. He first digitized the main dataset on the strike
behaviour of employees at the maintenance workshop of the railway company
(N = 1163). Next, he gathered data from multiple sources in which these
employees also appear, adding individual characteristics that explain strike behaviour.
For example, he derived family situations from the Dutch civil registers, and the
economic position from tax registers, resulting in a separate dataset per source.
Next, he inserted these datasets into a SQL database. In order to derive a
concise subset to analyze his research questions, using e.g. QGIS, Gephi or R, he
wrote SQL queries to extract the relevant information. These queries are usually
added as an appendix to his research papers.</p>
      <p>New Workflow. In collaboration with Zandhuis, we revisited this workflow
using QBer. Zandhuis, like most historians, uses spreadsheets to enter data, and
uses a specific layout to enhance the speed and quality of data entry. The first
step was to convert the data to a collection of .csv files. This is just a temporary
limitation, as the CSDH is not necessarily restricted to CSV files. It uses the
Python Pandas library<sup>29</sup> for loading tabular files into a data frame.</p>
      <p>
        The second step involves visiting each data file in turn, and linking the data to
vocabularies and through them to other data sources. Data about the past often
comes with a wide variety of potential values for a single variable. Religion, for
example, can have dozens of different labels as new religions came about and
old religions disappeared. As described in Section 3, QBer provides access to
a large range of such classifications, basically all those available in the Linked
Data cloud and the CSDH. For example, QBer provides all occupation concepts
from the HISCO classification used in the first use case [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Researchers can
use occupational labels to get the correct codes from the latest version of this
classification and, eventually, concepts linked to it. QBer also shows the
results of earlier coding efforts, so that historians can benefit from these (e.g.
another dataset may have the same literal value already mapped to a HISCO
code). This step is new compared to Zandhuis' original workflow. The linking
of occupational labels now enables him to combine an employee with his social
status (HISCAM). This allows him to directly include a new, relevant aspect
in his study. Moreover, since QBer makes coding decisions explicit, they can be
made subject to the same peer review procedure used to assess the quality of
a research paper. In the CSDH, original values of the dataset and the mapped
codings (potentially by different researchers) live side by side. Thus QBer adds
to the ease of use in coding variables, increases flexibility by allowing for multiple
interpretations, and allows for more rigorous evaluation of coding efforts. The
inspector graph of Figure 1 depicts the result of the new workflow.
      </p>
      <p>
        The third step was then to query the datasets in order to retrieve the subset
of data needed for analysis. As in the first use case, we design SPARQL queries
that, when stored on GitHub, can be directly executed through the grlc API.
This makes replication of research much easier: rather than including the query
as an appendix of a research paper, the query is now a first-class citizen and can
even be applied to other datasets that use the same mappings. Again, through
the API, these queries can easily be accessed from within R, in order to perform
statistical analysis. The grlc API is convenient, but it is a lot to ask
non-computer-science researchers to design SPARQL queries. However, as we
progress, we expect to be able to identify a collection of standard SPARQL
query templates that we can expose in this manner (see also [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]).
      </p>
      <p>29 See http://pandas.pydata.org</p>
      <p>To illustrate this, consider that since the Dwarsliggers collection contains
multiple datasets on the same individuals at the same point in time, there are
multiple observations of the same characteristics (e.g. age, gender, occupation,
religion). However, the sources differ in accuracy. For example, measuring marital
status is one of the key aims of the civil registry, while personnel files may contain
information on marital status, but getting this measurement right is not a key
concern for a company. By having all datasets mapped to vocabularies
through QBer and having the queries stored in GitHub and executed by grlc,
each query can readily be repeated using different sources on the same variables.
This is useful as a robustness check of the analysis, and can even be used in what
historians refer to as 'source criticism' (a reflection on the quality and usefulness
of a source). This, again, is similar to the first use case, but it emphasizes an
additional role for the queries as so-called 'edit rules'.</p>
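      <p>The idea of repeating a query over different sources and using the outcome as an 'edit rule' can be sketched as follows. The fragment assumes that the results of the same query over two sources have already been collected as (person, source, value) tuples; the person identifiers, source names and values are invented for illustration and do not come from the Dwarsliggers collection.</p>
      <preformat>
```python
from collections import defaultdict

# Hypothetical observations: (person id, source name, recorded marital status).
observations = [
    ("p1", "civil_registry", "married"),
    ("p1", "personnel_file", "married"),
    ("p2", "civil_registry", "married"),
    ("p2", "personnel_file", "unmarried"),
    ("p3", "civil_registry", "widowed"),
]


def conflicting_values(rows):
    """Return {person: {source: value}} for persons whose sources disagree."""
    by_person = defaultdict(dict)
    for person, source, value in rows:
        by_person[person][source] = value
    # Keep only persons for whom the sources report differing values.
    return {
        person: values
        for person, values in by_person.items()
        if len(set(values.values())) != 1
    }


conflicts = conflicting_values(observations)
print(conflicts)
```
      </preformat>
      <p>Persons flagged in this way are candidates for closer inspection of the underlying sources; which source to trust remains a judgement for the historian.</p>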
      <p>Observations. To conclude, this use case shows that the QBer tool and
related infrastructure provide detailed insight into how the data is organized, linked
and analyzed. Furthermore, the data can be queried live. This ensures reusable
research activities, not just reusable data.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>The preceding sections presented QBer and the CSDH to address the limitations
of existing digital humanities data curation projects in facilitating 1) the long
tail of research data and 2) research at a broader scale, enabling cross-dataset
querying and reuse of queries. We argued that existing Linked Data publishing
and mapping tools do not meet the needs of scholars who are not technologically
versed (or interested).</p>
      <p>QBer and the CSDH enable individual scholars to publish and use their data
in a flexible manner. QBer allows researchers to publish their (small) datasets,
link them to existing vocabularies and other datasets, and thereby contribute to a
growing collection of interlinked datasets hosted by the CSDH. The CSDH offers
services for inspecting data, and (in combination with grlc) reusable querying
across multiple datasets. We illustrated these features by means of two use cases.
The first shows the ability of the Linked Data paradigm used in the CSDH to
significantly lower the effort needed to do comparative research (even when the
data was published as part of the same larger standardization effort). The second
use case shows how publishing data through QBer allows individual researchers
to have more grip on their data, to be more explicit regarding data interpretation
(coding) and, via the CSDH, to be able to answer more questions for free (e.g.
the mapping through HISCO to HISCAM).</p>
      <p>Of course, there still is room for expansion. First, to ensure uniqueness of
identifiers, historical 'codes' need to be mapped to URIs. This is technically trivial, but
historians are not used to these lengthy identifiers in their statistical analyses.
Secondly, formulating research questions as queries requires an understanding of
the structure of the data. Given the large numbers of triples involved, this can
be difficult. As noted above, standard APIs based on SPARQL query templates
should solve some of this problem, but offering a user-friendly data inspection
tool is high on our list. SPARQL templates allow us to solve another issue:
allowing for free-form querying can have a detrimental effect on the performance of
the CSDH. The use of templates enables more efficient use of caching strategies.</p>
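      <p>To make the mapping between short historical codes and URIs concrete, the following sketch mints a URI from a coding scheme and a code, and recovers the short code again so that it can still be used in a statistical analysis. The base namespace, the scheme name and the example code are hypothetical; the CSDH may well use a different URI layout.</p>
      <preformat>
```python
from urllib.parse import quote, unquote

# Hypothetical namespace; a real deployment would use its own base URI.
BASE = "http://example.org/code/"


def code_to_uri(scheme, code):
    """Mint a URI for a code within a named coding scheme."""
    return f"{BASE}{quote(scheme, safe='')}/{quote(code, safe='')}"


def uri_to_code(uri):
    """Recover the (scheme, code) pair from a URI minted by code_to_uri."""
    if not uri.startswith(BASE):
        raise ValueError(f"not a minted code URI: {uri}")
    scheme, code = uri[len(BASE):].split("/", 1)
    return unquote(scheme), unquote(code)


# A code containing a space and a slash, to show that escaping round-trips.
uri = code_to_uri("occupation", "12 a/b")
print(uri)
print(uri_to_code(uri))
```
      </preformat>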
      <p>But even without such improvements, we believe that the use cases show that
QBer and the CSDH already broaden the scope of supported workflows and data
in our ecosystem, and bring the benefits of Linked Data and the Semantic Web
to the fingertips of humanities scholars.</p>
      <p>Acknowledgements This work was funded by the CLARIAH project of the Dutch
Science Foundation (NWO) and the Dutch national programme COMMIT.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ashkpour</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meroño-Peñuela</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mandemakers</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>The Dutch historical censuses: Harmonization and RDF</article-title>
          .
          <source>Historical Methods: A Journal of Quantitative and Interdisciplinary History</source>
          <volume>48</volume>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>van Assem</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rijgersberg</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wigham</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Top</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Converting and annotating quantitative data tables</article-title>
          .
          In:
          <source>Proceedings of the International Semantic Web Conference (ISWC 2010)</source>
          . LNCS, vol.
          <volume>6496</volume>
          , pp.
          <fpage>16</fpage>
          –
          <lpage>31</lpage>
          . Springer (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Barker</surname>
            ,
            <given-names>D.J.:</given-names>
          </string-name>
          <article-title>The fetal and infant origins of adult disease</article-title>
          .
          <source>BMJ: British Medical Journal</source>
          <volume>301</volume>
          (
          <issue>6761</issue>
          ),
          <fpage>1111</fpage>
          (
          <year>1990</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bartels</surname>
            ,
            <given-names>L.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jackman</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>A generational model of political learning</article-title>
          .
          <source>Electoral Studies</source>
          <volume>33</volume>
          ,
          <fpage>7</fpage>
          –
          <lpage>18</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bolt</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Timmer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Zanden</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          :
          <article-title>GDP per capita since 1820</article-title>
          . In:
          <source>How Was Life? Global well-being since 1820</source>
          , pp.
          <fpage>57</fpage>
          –
          <lpage>72</lpage>
          .
          <publisher-name>Organisation for Economic Co-operation and Development</publisher-name>
          (Oct
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reynolds</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tennison</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>The RDF Data Cube Vocabulary</article-title>
          . Tech. rep., W3C (
          <year>2013</year>
          ), http://www.w3.org/TR/vocab-data-cube/
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. DERI:
          <article-title>RDF Refine - a Google Refine extension for exporting RDF</article-title>
          .
          <source>Tech. rep., Digital Enterprise Research Institute</source>
          (
          <year>2015</year>
          ), http://refine.deri.ie/
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Ferguson</surname>
            ,
            <given-names>A.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nielson</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cragin</surname>
            ,
            <given-names>M.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bandrowski</surname>
            ,
            <given-names>A.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martone</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          :
          <article-title>Big data from small data: data-sharing in the `long tail' of neuroscience</article-title>
          .
          <source>Nature Neuroscience</source>
          <volume>17</volume>
          (
          <issue>11</issue>
          ),
          <fpage>1442</fpage>
          –
          <lpage>1447</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Groth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gibson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velterop</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>The anatomy of a nanopublication</article-title>
          .
          <source>Information Services and Use</source>
          <volume>30</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>51</fpage>
          –
          <lpage>56</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Haigh</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>We have never been digital</article-title>
          .
          <source>Commun. ACM</source>
          <volume>57</volume>
          (
          <issue>9</issue>
          ),
          <fpage>24</fpage>
          –
          <lpage>28</lpage>
          (Sep
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Heath</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Linked Data: Evolving the Web into a Global Data Space</article-title>
          . Morgan and Claypool, 1st edn. (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Heckman</surname>
            ,
            <given-names>J.J.:</given-names>
          </string-name>
          <article-title>Skill formation and the economics of investing in disadvantaged children</article-title>
          .
          <source>Science</source>
          <volume>312</volume>
          (
          <issue>5782</issue>
          ),
          <fpage>1900</fpage>
          –
          <lpage>1902</lpage>
          (Jun
          <year>2006</year>
          ), http://www.sciencemag.org/content/312/5782/1900
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Hoekstra</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Groth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Linkitup: Link discovery for research data</article-title>
          .
          <source>AAAI Fall Symposium Series Technical Reports (FS-13-01)</source>
          ,
          <fpage>28</fpage>
          –
          <lpage>35</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Kalampokis</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nikolov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , et al.:
          <article-title>Exploiting Linked Data Cubes with OpenCube Toolkit</article-title>
          . In:
          <source>Posters and Demos Track, 13th International Semantic Web Conference (ISWC2014)</source>
          . vol.
          <volume>1272</volume>
          . CEUR-WS, Riva del Garda, Italy (
          <year>2014</year>
          ), http://ceur-ws.org/Vol-1272/paper_109.pdf
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Lambert</surname>
            ,
            <given-names>P.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zijdeman</surname>
            ,
            <given-names>R.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Leeuwen</surname>
            ,
            <given-names>M.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maas</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prandy</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>The construction of HISCAM: A stratification scale based on social interactions for historical comparative research</article-title>
          .
          <source>Historical Methods: A Journal of Quantitative and Interdisciplinary History</source>
          <volume>46</volume>
          (
          <issue>2</issue>
          ),
          <fpage>77</fpage>
          –
          <lpage>89</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Lebo</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCusker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>csv2rdf4lod</article-title>
          . Tech. rep., Tetherless World, RPI (
          <year>2012</year>
          ), https://github.com/timrdf/csv2rdf4lod-automation/wiki
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>van Leeuwen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maas</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miles</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>HISCO: Historical International Standard Classification of Occupations</article-title>
          . Leuven University Press (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Meroño-Peñuela</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>LSD dimensions: Use and reuse of linked statistical data</article-title>
          . In:
          <source>Knowledge Engineering and Knowledge Management (EKAW 2014)</source>
          . LNCS, vol.
          <volume>8982</volume>
          , pp.
          <fpage>159</fpage>
          –
          <lpage>163</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Meroño-Peñuela</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ashkpour</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rietveld</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoekstra</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schlobach</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Linked Humanities Data: The next frontier?</article-title>
          In:
          <source>2nd International Workshop on Linked Science (LISC2012), ISWC</source>
          . vol.
          <volume>951</volume>
          . CEUR-WS (
          <year>2012</year>
          ), http://ceur-ws.org/Vol-951/
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Meroño-Peñuela</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoekstra</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Using grlc to spice up GitHub repositories as Linked Data APIs</article-title>
          . In:
          <source>Proceedings of the Services and Applications over Linked APIs and Data workshop, ESWC 2016</source>
          (
          <year>2016</year>
          ), to appear
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Morris</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guidry</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Magdinie</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>OpenRefine: A free, open source, powerful tool for working with messy data</article-title>
          .
          <source>Tech. rep., The OpenRefine Development Team</source>
          (
          <year>2015</year>
          ), http://openrefine.org/
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Muñoz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hogan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mileo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>DRETa: Extracting RDF from Wikitables</article-title>
          .
          <source>In: Int. Semantic Web Conference, posters and demos</source>
          . pp.
          <fpage>98</fpage>
          –
          <lpage>92</lpage>
          .
          <string-name>
            <surname>CEUR-WS</surname>
          </string-name>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>van Ossenbruggen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hildebrand</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Boer</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Interactive vocabulary alignment</article-title>
          . In:
          <source>Research and Advanced Technology for Digital Libraries (TPDL 2011)</source>
          . LNCS, vol.
          <volume>6966</volume>
          , pp.
          <fpage>296</fpage>
          –
          <lpage>307</lpage>
          . Springer-Verlag, Berlin, Heidelberg (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Piwowar</surname>
            ,
            <given-names>H.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Day</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fridsma</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          :
          <article-title>Sharing detailed research data is associated with increased citation rate</article-title>
          .
          <source>PLoS ONE</source>
          <volume>2</volume>
          (
          <issue>3</issue>
          ), e308 (Jan
          <year>2007</year>
          ), http://dx.plos.org/10.1371/journal.pone.0000308
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Renckens</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Digital humanities verfrissen onze blik op bestaande data</article-title>
          .
          <source>E-Data &amp; Research</source>
          <volume>10</volume>
          (February
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Roman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nikolov</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , et al.:
          <article-title>DataGraft: One-stop-shop for Open Data management</article-title>
          .
          <source>Semantic Web – Interoperability, Usability, Applicability</source>
          (
          <year>2016</year>
          ), under review, http://www.semantic-web-journal.net/content/datagraft-one-stopshop-open-data-management
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Ruggles</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roberts</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sarkar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sobek</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>The North Atlantic Population project: Progress and prospects</article-title>
          .
          <source>Historical Methods: A Journal of Quantitative and Interdisciplinary History</source>
          <volume>44</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          –
          <lpage>6</lpage>
          (Jan
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Tenopir</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Allard</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Douglass</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aydinoglu</surname>
            ,
            <given-names>A.U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Read</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manoff</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frame</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Data sharing by scientists: Practices and perceptions</article-title>
          .
          <source>PLoS ONE</source>
          <volume>6</volume>
          (
          <issue>6</issue>
          ), e21101 (06
          <year>2011</year>
          ), http://dx.doi.org/10.1371%2Fjournal.pone.0021101
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Thomasson</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fishback</surname>
            ,
            <given-names>P.V.</given-names>
          </string-name>
          :
          <article-title>Hard times in the land of plenty: The effect on income and disability later in life for people born during the great depression</article-title>
          .
          <source>Explorations in Economic History</source>
          <volume>54</volume>
          ,
          <fpage>64</fpage>
          –
          <lpage>78</lpage>
          (Oct
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>