<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantically Enriched Historical Data. Drawing on the Example of the Digital Edition of the "Urfehdebücher der Stadt Basel"</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christopher Pollin</string-name>
          <email>christopher.pollin@uni-graz.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georg Vogeler</string-name>
          <email>georg.vogeler@uni-graz.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Graz, Centre for Information Modelling</institution>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>27</fpage>
      <lpage>32</lpage>
      <abstract>
        <p>Historical data is widely recognized as a complex type of data: it records multi-layered, context-sensitive entities and can often be represented as a graph. This paper describes the digital edition of the "Urfehdebücher der Stadt Basel" as an example of how semantic web technologies can offer comprehensive tools in response to the challenges that come with historical data. It introduces the FEDORA Commons based GAMS infrastructure, reports the workflow from XML/TEI<sup>1</sup> encoded historical documents to semantically enriched data in the form of XML/RDF, and describes the specific data model for the resource. Finally, the paper discusses how the data can be used beyond a standard web interface with reading and search functionalities, namely for analysis with network visualisation.</p>
      </abstract>
      <kwd-group>
        <kwd>GAMS</kwd>
        <kwd>historical data</kwd>
        <kwd>digital edition</kwd>
        <kwd>semantic enrichment</kwd>
        <kwd>Urfehde</kwd>
        <kwd>TEI</kwd>
        <kwd>RDF</kwd>
        <kwd>SKOS</kwd>
        <kwd>data visualisation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The High Level Expert Group on Scientific Data formulated their shared
vision for 2030: 'Our vision is a scientific e-infrastructure that supports seamless
access, use, re-use and trust of data. In a sense, the physical and technical
infrastructure becomes invisible and the data themselves become the infrastructure.'
[Neuroth et al. 2012]</p>
      <p>Scientific data is always related to the context of a scientific problem. Research
data in the humanities, including historical data, is tied to its discipline
and tends to be complex in a specific way. [Thaller 1989] points out
particular challenges concerning historical data: the meaning of historical terms, for
example 'Prussia', can vary with spatial and temporal context. This leads
[Meroño-Peñuela / Hoekstra 2014] to define historical data as the union
of a static, unique primary source and dynamic secondary sources, where the
latter point at the primary source in different time- and context-sensitive ways.
For this reason the authors recommend describing historical data as a graph
and connecting it to linked open data sources, using taxonomies or ontologies on the
one hand and dereferencing services inside a digital archive on the other.</p>
      <p>Given this vision and the multi-layered, context-sensitive nature of historical
data, semantic web technologies can offer comprehensive tools that address these
problems. The aim of this paper is to outline the process of semantic enrichment
of a historical dataset, the 'Urfehdebücher der Stadt Basel', as the primary source,
and its representation as a secondary source, the 'Urfehdebücher der Stadt Basel –
digitale Edition' [Burghartz / Calvi / Vogeler 2016]. Semantically enriching and
formalizing data with semantic web technologies could fulfil the vision of data
becoming its own infrastructure.</p>
      <p><sup>1</sup> http://www.tei-c.org, 21.7.2017</p>
      <p>The text of the 'Urfehdebücher der Stadt Basel – digitale Edition' was created
in a small-scale teaching project by Susanna Burghartz and her students at the
University of Basel, with particular contributions by Sonia Calvi and Anna Reiman.
The technical realization was developed by the Centre for Information Modelling
at the University of Graz. The aim of this low-budget, student-supported project is
primarily experimental: applying semantic web technologies to a historical source.
'Urfehde' can be roughly translated as 'oath of truce'. The purpose of the
so-called 'Urfehde' was to settle a dispute between two conflicting parties and to
oblige a sentenced offender to swear a unilateral oath not to take revenge on the
judge. This was legal practice in most of central Europe in the late Middle Ages and
the early modern period. The oaths were recorded in the so-called 'Urfehdebücher',
which have survived in many archives, as demonstrated by the data of the Index Librorum
Civitatum project.<sup>2</sup> The first Urfehdebuch X of the city of Basel (StadtA Basel
Ratsbücher O10) records 'Urfehde' oaths from 1563 to 1569. This source can serve
as an exemplary dataset for two reasons: its entries show a significant structure that
can be used in statistical analysis, and, with 625 entries, the dataset is large enough
to contribute to research on the cultural, social, and economic history of early
modern people.</p>
      <p>The task in realising this digital edition was therefore to combine established
and easy-to-use transcription workflows based on XML/TEI annotation with a
conversion to RDF, in order to prepare a basis for data analysis and data
publication. This raises the question of which advantages semantic web technologies can
offer to scholars regarding the retrieval, visualisation and analysis of historical
data in the humanities.</p>
      <sec id="sec-1-1">
        <title>Related Work</title>
        <p>Perhaps the first digital edition to make use of semantic web technologies in a
similar way was the edition of the Henry III fine rolls<sup>3</sup> [Ciula et al. 2008]. The
project combined a TEI transcription with a CIDOC-CRM based ontology
expressed in OWL. The RDF data of the project was not made openly accessible.
The Henry III fine rolls project follows the approach of building an extended
index for the digital representation of the primary source, as described by
[Poupeau 2006]. A successful example of this approach is 'Sandrart.net'. This
digital scholarly edition encodes its data in XML/TEI and makes it available
as Linked Open Data in RDF.<sup>4</sup> Similarly, the platform for historical
research SYMOGIH<sup>5</sup> is preparing a SPARQL endpoint. More recently, the 'Semantic
Blumenbach' project has explored new approaches to linking artefacts and
text [Wettlaufer et al. 2015]. It uses the 'scientific communication infrastructure'
WissKI<sup>6</sup> to implement semantic web methods for data acquisition, storage and
re-use.</p>
        <p><sup>2</sup> http://www.stadtbuecher.de/literatur/schlagwort/137667, 18.07.2017.</p>
        <p><sup>3</sup> http://www.finerollshenry3.org.uk, 12.7.2017</p>
        <p>All these projects follow the extended index approach. This, however, falls short
when it comes to the analysis of abstract concepts beyond the classical index
of named entities like places, persons, and objects. Historical texts like the
Urfehdebücher need additional modelling to become data sets for historical analysis,
in particular regarding the classification of criminal offence, punishment, and
social status of the people involved. In this respect 'The Proceedings of the Old Bailey'
project comes much closer to the Urfehdebücher. The project defines itself as
a searchable edition of criminal trials held at London's central criminal court.
XML/TEI markup of the digitized text makes it possible to search and analyse
the source.<sup>7</sup> The data set is accessible via an API<sup>8</sup>, but is not available as RDF
or via a SPARQL endpoint. There is therefore no directly comparable project in the
same research area as the 'Urfehdebücher'. Common standards have yet to be
established, and the challenges of interoperability are not yet solved.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>The Digital Edition</title>
      <p>The workflow of the Urfehdebücher project is embedded in GAMS<sup>9</sup>, which
is described by [Steiner / Stigler 2017]. GAMS defines itself as an asset
management system for the humanities and serves the administration,
publication and long-term preservation of digital resources. It is based on the
open source repository software FEDORA Commons. Using Cocoon services and
project-specific content models for varying data streams, scholarly data can be
stored and disseminated for public use. The data is represented as a readable web
site, as archival data structures in XML, and via various APIs. GAMS
implements a disseminator for RDF data via an RDF triplestore. Currently, the open
source software Blazegraph<sup>10</sup> is in use, which allows SPARQL queries and
full-text search in literals.</p>
      <p>Expert academics transcribed and encoded the source in XML/TEI, structuring
the text, marking up text-specific phenomena and normalizing places, persons
or concepts. Additionally, the TEI attribute ana was used to add specific
semantics to the applied TEI markup. ana is used because it is available globally
on TEI elements and allows multiple interpretations to be attached to the markup.
The XML/TEI source of the edition illustrates the markup.<sup>11</sup> The div element
with the attribute ana="#uf_Eintrag" defines
the content of the whole div as an 'Urfehde' entry representing a single case.
The semantics of the first entry in the XML/TEI can be summarized as follows:
the hireling 'Heinrich Peter' from 'Zürich' was judged as an offender because of
alcohol abuse on the 'kornmerkt' (grain market). This statement is encoded in
the XML/TEI using the attribute ana, for example ana="#uf_male" to annotate the
gender of a person. The values of the ana attribute are taken from a taxonomy
of categories defined by the colleagues in Basel according to their
methodological approach to the source.<sup>12</sup> Its hierarchical structure of concepts can
easily be converted into a SKOS resource. When the XML/TEI is ingested into the GAMS
infrastructure, a project-specific XSLT stylesheet transforms all semantically
enriched data into XML/RDF and writes the triples to the triplestore. The
XML/RDF shows the outcome of the transformation.<sup>13</sup> The assertions describe the
aforementioned 'Urfehde' entry and all its properties linking to other concepts
like uf:PersonOffender, where further properties refer either to literals or to
concepts that normalize data, like the place 'Zürich'.</p>
      <p><sup>4</sup> http://ta.sandrart.net/de, 12.07.2017</p>
      <p><sup>5</sup> http://symogih.org, 12.7.2017</p>
      <p><sup>6</sup> http://www.wiss-ki.eu/, 12.7.2017</p>
      <p><sup>7</sup> https://www.oldbaileyonline.org, 12.07.2017</p>
      <p><sup>8</sup> https://www.oldbaileyonline.org/static/API.jsp, 12.7.2017</p>
      <p><sup>9</sup> gams.uni-graz.at, 12.7.2017</p>
      <p><sup>10</sup> https://www.blazegraph.com, 12.07.2017</p>
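      <p>The step from ana annotations to triples can be illustrated with a minimal sketch. It is not the project's XSLT stylesheet: it uses Python instead of XSLT, the TEI fragment is invented, and the namespace URI, the category uf_Herkunft and the property names mentions and rdfs:label are assumptions made for illustration only.</p>
      <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Hypothetical namespace; the real URI of the project vocabulary is not given here.
UF = "http://example.org/ufbas#"

# Invented TEI fragment modelled on the entry summarized in the text.
# "#uf_Herkunft" is a made-up category used only for this illustration.
SAMPLE = """
<div xmlns="http://www.tei-c.org/ns/1.0" xml:id="e1" ana="#uf_Eintrag">
  <persName ana="#uf_PersonOffender #uf_male">Heinrich Peter</persName>
  <placeName ana="#uf_Herkunft">Zürich</placeName>
</div>
"""


def ana_to_triples(tei_xml):
    """Turn every ana pointer inside an entry div into simple triples."""
    root = ET.fromstring(tei_xml.strip())
    triples = []
    for entry in root.iter("{%s}div" % TEI_NS):
        # Only divs marked as an Urfehde entry become subjects.
        if "#uf_Eintrag" not in entry.get("ana", "").split():
            continue
        subject = UF + entry.get(XML_ID, "unknown")
        triples.append((subject, "rdf:type", UF + "Eintrag"))
        index = 0
        for element in entry.iter():
            pointers = element.get("ana", "").split()
            if element is entry or not pointers:
                continue
            index += 1
            # One node per annotated element, numbered in document order.
            node = "%s/%d" % (subject, index)
            text = "".join(element.itertext()).strip()
            triples.append((subject, UF + "mentions", node))
            triples.append((node, "rdfs:label", text))
            for pointer in pointers:
                # ana can carry several pointers, one per interpretation.
                category = pointer.replace("#uf_", "", 1)
                triples.append((node, "rdf:type", UF + category))
    return triples


if __name__ == "__main__":
    for triple in ana_to_triples(SAMPLE):
        print(triple)
```
]]></preformat>
      <p>Because ana holds a whitespace-separated list of pointers, a single element can receive several categories at once, which is what makes the attribute suitable for layered interpretation.</p>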
      <p>The extracted XML/RDF follows a simple RDFS schema<sup>14</sup> that has the entry
(Eintrag) at its core. The entry represents the case and has properties to identify the
offence and its classification, the persons named in the record and their role, the
type of punishment, and other properties connected directly to the entry and
the legal procedure, e.g. date of the oath (DatumUrfehde), date of the offence
(DatumTat), the notarial authentication of the entry (NotarialSubscription).
Full-text and advanced search functionalities are implemented using SPARQL and
the full-text capabilities of the Blazegraph triple store.<sup>15</sup> For this purpose GAMS
offers a query content model which returns XML data on demand. This can
subsequently be transformed to HTML to offer additional functionalities like
visualisations and data download.</p>
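      <p>A query against such a model might be assembled as in the following sketch. It is an illustration under stated assumptions, not the query used by the edition: the prefix URI and the property uf:text are hypothetical, only the names Eintrag and DatumUrfehde are taken from the schema described above, and a standard SPARQL regex filter stands in for the Blazegraph full-text index.</p>
      <preformat><![CDATA[
```python
# Hypothetical prefix URI; the real namespace of the schema is not stated here.
UF_PREFIX = "http://example.org/ufbas#"


def build_entry_query(pattern, date_from=None, date_to=None):
    """Build a SPARQL SELECT for entries whose text matches a regular expression.

    uf:text is an assumed property holding the entry text.
    """
    # Escape backslashes and double quotes for a SPARQL string literal.
    escaped = pattern.replace("\\", "\\\\").replace('"', '\\"')
    filters = ['FILTER regex(?text, "%s", "i")' % escaped]
    if date_from:
        filters.append('FILTER (?date >= "%s"^^xsd:date)' % date_from)
    if date_to:
        filters.append('FILTER (?date <= "%s"^^xsd:date)' % date_to)
    lines = [
        "PREFIX uf: <%s>" % UF_PREFIX,
        "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>",
        "SELECT ?entry ?date WHERE {",
        "  ?entry a uf:Eintrag ;",
        "         uf:DatumUrfehde ?date ;",
        "         uf:text ?text .",
    ]
    lines.extend("  " + item for item in filters)
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_entry_query("eefrou?wen", "1563-01-01", "1569-12-31"))
```
]]></preformat>
      <p>The date bounds are optional, so the same function covers both a plain pattern search and a search restricted to a period.</p>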
    </sec>
    <sec id="sec-3">
      <title>Results and Potentials</title>
      <p>The outcome is a semantically enriched digital edition that uses an RDF data
representation<sup>16</sup> for full-text and advanced search.<sup>17</sup> With the advanced search,
a user can employ regular expressions, constrain search results by date
and use the normalization of place names to query the data.
Regular expressions are particularly useful for finding orthographic
variants, e.g. eefrou?wen, which returns words like eefrouwen and eefrowen (for
'spouse'). An example of using normalized data is that a query for Kleinbasel
returns all entries connected to the place named mindren Basell in the text.
A navigation menu leads through the chronologically listed data. The user can
collect entries from search results or while browsing the text into a personal
data basket<sup>18</sup>, implemented with the local storage of the browser and simple
JavaScript. Collected entries can be exported as simple CSV to be processed
further in a spreadsheet application.</p>
      <p><sup>11</sup> gams.uni-graz.at/o:ufbas.1563/TEI_SOURCE, 12.09.2017</p>
      <p><sup>12</sup> gams.uni-graz.at/o:ufbas.kategorien/TEI_SOURCE, 12.07.2017.</p>
      <p><sup>13</sup> gams.uni-graz.at/o:ufbas.1563/RDF, 12.09.2017</p>
      <p><sup>14</sup> gams.uni-graz.at/o:ufbas.schema, 15.7.2017</p>
      <p><sup>15</sup> wiki.blazegraph.com/wiki/index.php/FullTextSearch, 12.7.2017</p>
      <p><sup>16</sup> gams.uni-graz.at/o:ufbas.1563/RDF, 13.07.2017.</p>
      <p><sup>17</sup> gams.uni-graz.at/query:ufbas.search/get, 13.07.2017.</p>
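      <p>The three search features mentioned here (orthographic variants via regular expressions, normalized place names, CSV export) can be tried out on a few records. The following is a small sketch with invented sample texts and hypothetical field names. It runs in plain Python rather than against the triple store or in the browser.</p>
      <preformat><![CDATA[
```python
import csv
import io
import re

# Invented sample records. "place" holds the normalized place name,
# "text" a made-up snippet in the spelling of the source.
ENTRIES = [
    {"id": "e1", "text": "eefrouwen", "place": "Basel"},
    {"id": "e2", "text": "eefrowen im mindren Basell", "place": "Kleinbasel"},
    {"id": "e3", "text": "kornmerkt", "place": "Basel"},
]


def search(entries, pattern=None, place=None):
    """Filter entries by a regular expression and/or a normalized place name."""
    regex = re.compile(pattern) if pattern else None
    hits = []
    for entry in entries:
        if regex and not regex.search(entry["text"]):
            continue
        if place and entry["place"] != place:
            continue
        hits.append(entry)
    return hits


def to_csv(entries):
    """Serialize collected entries as CSV text for a spreadsheet application."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "place", "text"])
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


if __name__ == "__main__":
    # "u?" makes the letter u optional, so both spellings match.
    variants = search(ENTRIES, pattern="eefrou?wen")
    print(to_csv(variants))
```
]]></preformat>
      <p>Searching by the normalized name Kleinbasel finds the second record even though its text only contains the historical spelling, which is the point of normalization.</p>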
      <p>Because the whole data set is defined as an RDF graph and the
data itself has network character, adequate forms of information visualization
become possible. An exemplary scholarly question regarding the 'Urfehdebücher' could
be whether female offenders of a specific type of crime were treated and punished
differently from male offenders. Visualizing the relations between offenders, places,
time, punishment or crime in the whole data set, or in parts of it, could open new
approaches to working with the source, or make it possible to identify at a glance
which category or question could be interesting. We conducted some experiments using
the d3.js<sup>19</sup> library to create force-directed graphs based on the result of the
search for a category.</p>
      <p>The figure shows a graph of the search by category uf:ThreatOfPunishment.<sup>20</sup>
The light green node in the center represents this category. Every dark blue node
refers to a case reported in the 'Urfehdebuch' that is connected to the node
with the value uf:male (large blue node). The light blue nodes represent cases
connected to women (uf:female, large yellow node). The other paths from the
case nodes represent dates (light blue), occupations (green), and places of
origin (orange). The gender nodes are obviously the major bridge nodes, but other
properties of the cases can serve as additional bridge nodes, e.g. when several
cases share the same date (1568-06-14), the same profession (taglöner) or the same
place (Zürich). The force-directed graph allows a first instantaneous interpretation:
the degree and centrality of the gender nodes push the node for female
offenders to the outer part of the graph. A threat of punishment was therefore applied
much more often to male offenders, and alcohol abuse was a problem recorded
mostly for men. Detailed research certainly has to establish these numbers
relative to the number of all cases. In the future, the graph visualization can assist
retrieval and discovery functionalities.</p>
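      <p>The reading of the graph rests on node degree, which can be made explicit with a short sketch. The cases below are invented, the field names are hypothetical, and the code only builds a node and link list of the kind a force layout typically consumes. It is not the d3.js code of the edition.</p>
      <preformat><![CDATA[
```python
from collections import Counter

# Invented sample cases; the real data set has 625 entries.
CASES = [
    {"id": "c1", "gender": "uf:male", "occupation": "taglöner", "origin": "Zürich"},
    {"id": "c2", "gender": "uf:male", "occupation": "taglöner", "origin": "Basel"},
    {"id": "c3", "gender": "uf:female", "occupation": None, "origin": "Zürich"},
]


def build_graph(cases, category):
    """Link the category to every case and every case to its property values."""
    nodes = {category}
    links = []
    for case in cases:
        nodes.add(case["id"])
        links.append((category, case["id"]))
        for key in ("gender", "occupation", "origin"):
            value = case.get(key)
            if value is not None:
                # Shared values become bridge nodes between cases.
                nodes.add(value)
                links.append((case["id"], value))
    return sorted(nodes), links


def degree(links):
    """Count how many links touch each node."""
    counts = Counter()
    for source, target in links:
        counts[source] += 1
        counts[target] += 1
    return counts


if __name__ == "__main__":
    nodes, links = build_graph(CASES, "uf:ThreatOfPunishment")
    for node, count in degree(links).most_common():
        print(node, count)
```
]]></preformat>
      <p>Comparing the degree of uf:male and uf:female gives the raw counts behind the visual impression. As noted above, these would still have to be set in relation to the number of all cases.</p>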
    </sec>
    <sec id="sec-4">
      <title>Conclusion and Further Work</title>
      <p>The example of the Urfehdebucher demonstrates that creating XML/TEI
transcription of a text prepared to be used as semantic web data o ers new
approaches for scholarly edition, ts to the graph-like understanding of historical
18 gams.uni-graz.at/context:ufbas?mode=datenkorb, 13.07.2017.
19 https://d3js.org, 21.07.2017.
20 gams.uni-graz.at/context:ufbas/StrafeStrafandrohung, 12.09.2017
data, and the data becomes more expressive and self-describing. The
transformation of the textual statements in RDF is made with easy annotation and little
programming e ort. The RDF dataset can be used as a fundamental database
technology in the online publication as well as for advanced research questions.
The data created can be queried and visualized in a way that it can be bene cial
for historical research. Finally the publication of this data with semantic web
technologies allows to make the data model, the taxonomy and the data itself
openly available in a standardized way as RDFs, SKOS and generic RDF data.
Aligning the data model and the taxonomy with other resources like the Old
Bailey project is envisioned future improvement and can be the rst step to a
common vocabulary.
5</p>
    </sec>
    <sec id="sec-5">
      <title>Appendix</title>
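      <list list-type="bullet">
        <list-item><p>TEI-Source: gams.uni-graz.at/o:ufbas.1563/TEI_SOURCE</p></list-item>
        <list-item><p>RDF-Source: gams.uni-graz.at/o:ufbas.1563/RDF</p></list-item>
        <list-item><p>Graph of uf:ThreatOfPunishment: gams.uni-graz.at/context:ufbas/StrafeStrafandrohung</p></list-item>
      </list>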
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Burghartz / Calvi / Vogeler 2016] Burghartz, Susanna / Calvi, Sonia / Vogeler, Georg:
          <source>Urfehdebücher der Stadt Basel – digitale Edition</source>
          , Graz
          <year>2016</year>
          , gams.uni-graz.at/ufbas.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Ciula et al. 2008] Ciula, Ariana / Spence, Paul / Veira, Jose Miguel:
          <article-title>Expressing complex associations in medieval historical documents. The Henry III Fine Rolls Project</article-title>
          ,
          <source>in: Literary and Linguistic Computing</source>
          <volume>23</volume>
          (
          <year>2008</year>
          ), p.
          <fpage>311</fpage>
          –
          <lpage>325</lpage>
          , DOI: 10.1093/llc/fqn018.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Meroño-Peñuela / Hoekstra 2014] Meroño-Peñuela, Albert / Hoekstra, Rinke:
          <article-title>What is linked historical data?</article-title>
          ,
          <source>in: International Conference on Knowledge Engineering and Knowledge Management</source>
          . Springer, Cham, p.
          <fpage>282</fpage>
          –
          <lpage>287</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Neuroth et al. 2012] Neuroth, Heike / et al.:
          <article-title>Langzeitarchivierung von Forschungsdaten</article-title>
          . Eine Bestandsaufnahme. Hülsbusch,
          <year>2012</year>
          , p.
          <fpage>15</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Poupeau 2006] Poupeau, Gautier:
          <article-title>De l'index nominum à l'ontologie. Comment mettre en lumière les réseaux sociaux dans les corpus historiques numériques?</article-title>
          ,
          <source>in: Digital Humanities 2006. The First ADHO International Conference: Conference Abstracts</source>
          . Université Paris-Sorbonne,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Steiner / Stigler 2017] Steiner, Elisabeth / Stigler, Johannes:
          <source>GAMS and Cirilo Client. Policies, documentation and tutorial</source>
          . Graz, 2014–
          <year>2017</year>
          , http://gams.uni-graz.at/docs.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Thaller 1989] Thaller, Manfred:
          <article-title>The Need for a Theory of Historical Computing</article-title>
          , in: Denley, Peter / et al.:
          <source>History and Computing II</source>
          , Manchester and New York,
          <year>1989</year>
          , p.
          <fpage>4</fpage>
          –
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Wettlaufer et al. 2015] Wettlaufer, Jörg / et al.:
          <article-title>Semantic Blumenbach. Exploration of Text–Object Relationships with Semantic Web Technology in the History of Science</article-title>
          ,
          <source>in: DSH Digital Scholarship in the Humanities</source>
          <volume>30</volume>
          , suppl. 1,
          <year>2015</year>
          , p.
          <fpage>187</fpage>
          –
          <lpage>198</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>