<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tabea Tietz</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jorg Waitelonis</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kanran Zhou</string-name>
          <email>akanran.zhou@student.kit.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Felgentreff</string-name>
          <email>paul.felgentreff@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nils Meyer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Weber</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harald Sack</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Baden-Wurttemberg State Archives</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>FIZ Karlsruhe</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Karlsruhe Institute of Technology, Institute AIFB</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Leibniz Institute for Information Infrastructure</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>yovisto GmbH</institution>
          ,
          <addr-line>Potsdam</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Archives today publish their cultural heritage data on the Web for exploration. However, for archive novices the traditional archival structures are often unintuitive and difficult to understand, which hampers data access and consumption. To tackle this problem, Linked Stage Graph was developed, a knowledge graph (KG) built on historical data about the Stuttgart State Theaters. The data was made available by the Baden-Württemberg State Archives for the Coding da Vinci hackathon. This demo paper contributes the KG, a SPARQL endpoint, named entity extraction and linking to existing authoritative KGs, as well as a dedicated user interface for exploration.</p>
      </abstract>
      <kwd-group>
        <kwd>cultural heritage</kwd>
        <kwd>linked data</kwd>
        <kwd>knowledge graph</kwd>
        <kwd>UI</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Digitizing cultural heritage has been a major task for galleries, libraries, archives
and museums (GLAM) for many years now. As a result, a number of Web-based
platforms have been developed with the goal of enabling researchers to access and
analyze the data scientifically and of allowing the general public to explore the
data. However, many archival web platforms organize their content in
a way that is familiar only to archive experts. Users who are unfamiliar with archival
practices for structuring information often find it challenging to access and explore
the provided content [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ].
      </p>
      <p>The Baden-Wurttemberg State Archives in Germany recognized this issue
and opened up their data to the Coding da Vinci initiative, the rst German
open cultural heritage hackathon. The initiative organizes several hack events a
year, bringing together GLAM institutions as data providers and computer
scientists, designers, digital humanists to develop creative and interesting
applications on the foundation of these cultural heritage data. The Baden-Wurttemberg
State Archives published a dataset about the Stuttgart State Theatres containing</p>
      <p>XML
(EAD-DDB)</p>
      <p>XML2RDF</p>
      <p>RDF
EAD2RDF RDF</p>
      <p>VIRTUOSO</p>
      <p>EXPLORATION:
LINKsElDodS.TfiAzG-kEaGrlRsAruPhHeV.dIeEWER</p>
      <p>SPARQL ENDPOINT</p>
      <p>
        VIKUS VIEWER
7.000 historical black and white photographs along with EAD-XML metadata
to be 'hacked' by the creatives. The photographs and metadata cover the
period from the 1890s to the 1940s. Especially in Germany, this time period was
marked by social and political upheavals in which democracy, freedom of speech
and creativity in particular were challenged. For this reason, the data provided
are still of enormous relevance today. For instance, the data reveal which theater
performances were allowed to be played in these di cult times and how certain
characters were displayed [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In order to consume and optimally share these
data, speci c requirements must be met. These include in particular
interoperability, availability and comprehensibility on di erent levels. Linked Data based
knowledge graphs (KG) have established as practical means to formally encode,
integrate and share data.
      </p>
      <p>In this demo paper, Linked Stage Graph<sup>6</sup> is presented, a KG developed during
the Coding da Vinci Süd 2019 hackathon<sup>7</sup> on the foundation of the
aforementioned archival data about the Stuttgart State Theaters. The goal of Linked Stage
Graph is to enable researchers as well as the general public to access, analyze and
explore the data in intuitive, interesting and useful ways. Along with the KG, the
presented prototype demo contributes a publicly available SPARQL endpoint<sup>8</sup>
to enable sophisticated queries for expert users, the extraction and linking of
named entities mentioned in the metadata to the Wikidata KG and the German
Integrated Authority File (GND)<sup>9</sup>, a timeline interface for data exploration and
lessons learned. As an additional feature, all 7,000 black and white photographs
were colorized using open source tools based on machine learning (ML). All code
of this demo is published and freely available on GitHub<sup>10</sup>.</p>
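      <p>As an illustration of how an expert user might address the public SPARQL endpoint, the following minimal Python sketch builds a GET request in the form defined by the SPARQL 1.1 Protocol. Only the endpoint URL is taken from this paper. The query is deliberately generic and assumes nothing about the vocabulary of the graph, the helper names build_query_url and fetch_json are hypothetical, and whether the endpoint is still reachable is not guaranteed.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint URL as given in the paper's footnote; availability is an assumption.
ENDPOINT = "http://slod.fiz-karlsruhe.de/sparql"

# Deliberately generic query: it assumes nothing about the graph's vocabulary.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL using the 'query' parameter of the SPARQL 1.1 Protocol."""
    return endpoint + "?" + urlencode({"query": query})


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Send the request and decode a SPARQL JSON result (needs network access)."""
    request = Request(url, headers={"Accept": "application/sparql-results+json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the URL is side-effect free; fetch_json(url) would perform the request.
url = build_query_url(ENDPOINT, QUERY)
```
]]></preformat>
      <p>Separating URL construction from the network call keeps the sketch testable offline; a real client would additionally handle HTTP errors and timeouts.</p>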
    </sec>
    <sec id="sec-2">
      <title>Linked Stage Graph</title>
      <p>This section presents the main contributions of the demo, Linked Stage Graph.</p>
      <p><sup>6</sup> http://slod.fiz-karlsruhe.de/</p>
      <p><sup>7</sup> https://codingdavinci.de/events/sued/</p>
      <p><sup>8</sup> http://slod.fiz-karlsruhe.de/sparql</p>
      <p><sup>9</sup> https://www.dnb.de/EN/Standardisierung/GND/gnd_node.html</p>
      <p><sup>10</sup> https://github.com/ISE-FIZKarlsruhe/LinkedStageGraph</p>
      <sec id="sec-2-1">
        <title>Dataset</title>
        <p>
          The project is based on an archival fonds of around 7,000 historical photographs
from the Stuttgart State Theaters. The photographs depict scenes and characters
from a wide range of productions, from opera to children's theater, dating from the
1890s to the 1940s. In 2009, the Ludwigsburg State Archives took the fonds into
their custody. It consisted of 572 sleeves containing prints, photographic plates,
nitrate films, photographic negatives and positive images. The archival
description captures (where possible) the title and author of the play as well as the directors,
choreographers and designers of each production. The provided dataset consists
of JPEGs and an EAD-XML file (cf. Fig. 1, left). The Encoded Archival
Description<sup>11</sup> (EAD) is a documentary XML standard for the description of archival
finding aids maintained by the Technical Subcommittee for Encoded Archival
Description of the Society of American Archivists together with the Library of
Congress. For users unfamiliar with an archive's content structure, it is difficult
to interact with EAD-encoded finding aids, because the archive's hierarchy often
has to be navigated to extract meaningful information [
          <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
          ]. Also, it has been
widely recognized that EAD complicates user interaction with the data because
operations like accessing a specific item on-the-fly are either impossible or
inefficient. Automated processing is another issue since the degree of freedom for
expressing information within EAD is too high [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. This work attempts to overcome
these shortcomings by transforming the EAD-XML into an RDF representation.
        </p>
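        <p>To make the navigation problem concrete, the following Python sketch walks the nested component elements of a finding aid and flattens them into a lookup table keyed by unit identifier. The XML fragment is hypothetical and heavily simplified: it uses the EAD element names archdesc, dsc, c, did, unitid and unittitle, but omits namespaces and all other metadata, and it is not taken from the dataset. The function name flatten_components is likewise an assumption made for illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EAD fragment (not taken from the dataset).
SAMPLE_EAD = """
<ead>
  <archdesc level="fonds">
    <did><unitid>F-1</unitid><unittitle>Theater photographs</unittitle></did>
    <dsc>
      <c level="series">
        <did><unitid>F-1/A</unitid><unittitle>Opera</unittitle></did>
        <c level="file">
          <did><unitid>F-1/A/1</unitid><unittitle>Example play</unittitle></did>
        </c>
      </c>
    </dsc>
  </archdesc>
</ead>
"""


def flatten_components(node, path=()):
    """Yield (unitid, title, ancestor titles) for every nested <c> below node."""
    for child in node.findall("c"):
        did = child.find("did")
        unitid = did.findtext("unitid", default="")
        title = did.findtext("unittitle", default="")
        yield unitid, title, path
        # Recurse with the current title appended to the ancestor path.
        yield from flatten_components(child, path + (title,))


root = ET.fromstring(SAMPLE_EAD)
records = list(flatten_components(root.find("archdesc/dsc")))

# A flat index allows direct access to an item without walking the hierarchy.
index = {unitid: (title, path) for unitid, title, path in records}
```
]]></preformat>
        <p>Once flattened, a single item can be retrieved by its identifier, which is the kind of direct access that the hierarchical encoding makes cumbersome.</p>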
      </sec>
      <sec id="sec-2-2">
        <title>Knowledge Graph</title>
        <p>Fig. 1 presents the workflow. Before further processing of the input XML file, some
adaptations were necessary, such as adjusting the XML namespaces, adding
repository code and language code attributes to the EAD unitid XML tags and
replacing the lb tag with the XML character reference &amp;#10;. The actual XML to RDF
conversion was performed with two different approaches. First, the generic ReDeFer
XML2RDF<sup>12</sup> converter was used. The second approach applied the EAD2RDF<sup>13</sup>
XSLT stylesheet to transform the provided XML to RDF. As expected, the two
methods produced different results. While the EAD2RDF stylesheet results were
incomplete (e.g. the information about the image files was not transformed),
the XML2RDF converter produced more complete results but also a vast number
of blank nodes, which are difficult to query and to navigate. The two
methods also created different IRIs to identify the actual subject of interest: the
archival unit. While XML2RDF used the archival identifier<sup>14</sup>, the EAD2RDF
transformation created the IRIs from the EAD unitid<sup>15</sup>. From a computer science
perspective, the first type of IRI was considered more stable, which is why it
was chosen as the identifier. The results were merged by mapping the archival
unit titles and the archival identifiers. Finally, the unwanted IRIs were removed.</p>
        <p>Many literals in the dataset contained unstructured information, e.g. the titles
also included a play's author name and the abstracts contained further
information about involved persons and roles. A script was created to extract
this information. This also involved defining the vocabulary to model the persons'
types of contribution and roles. The aim was to reuse existing vocabularies as
far as possible. Due to the clear definition of the domain, the ambiguity in the
data was rather low. This made it possible to map play, person and location names to
Wikidata and GND very quickly. For this purpose, a dictionary of potentially relevant
resources from the vocabularies was extracted and exact string matching was
performed. Finally, all information was deployed to an instance of the Virtuoso
triple store, which also provides a SPARQL endpoint.</p>
        <p><sup>11</sup> https://www.loc.gov/ead/</p>
        <p><sup>12</sup> http://rhizomik.net/html/redefer/</p>
        <p><sup>13</sup> http://data.archiveshub.ac.uk/ead2rdf/</p>
        <p><sup>14</sup> E.g. http://slod.fiz-karlsruhe.de/labw-2-2599382</p>
        <p><sup>15</sup> E.g. from 'Abt. Staatsarchiv Ludwigsburg, E 18 III Nr 6': http://slod.fiz-karlsruhe.de/id/archivalresource/abt.staatsarchivludwigsburg,e18iiibu161</p>
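        <p>The dictionary-based linking step described in this section could look roughly like the following Python sketch. It is not the script used for Linked Stage Graph: the function names, the sample labels and the placeholder identifiers are hypothetical, and the light normalization (whitespace and case) applied before the exact comparison is an assumption. Names that match more than one dictionary entry are left unlinked rather than guessed.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def normalize(label: str) -> str:
    """Collapse whitespace and case so trivially different spellings compare equal."""
    return " ".join(label.split()).casefold()


def build_dictionary(entries):
    """Map each normalized label to the set of identifiers that carry it."""
    dictionary = {}
    for label, identifier in entries:
        dictionary.setdefault(normalize(label), set()).add(identifier)
    return dictionary


def link_exact(names, dictionary):
    """Link a name only if exactly one dictionary entry matches it."""
    links = {}
    for name in names:
        candidates = dictionary.get(normalize(name), set())
        if len(candidates) == 1:
            links[name] = next(iter(candidates))
    return links


# Hypothetical label/identifier pairs; the identifiers are placeholders,
# not real Wikidata or GND identifiers.
entries = [
    ("Faust", "ex:play-1"),
    ("Stuttgart", "ex:place-1"),
    ("Salome", "ex:play-2"),
    ("Salome", "ex:person-1"),
]
dictionary = build_dictionary(entries)

# "Salome" is ambiguous and "Unknown" has no entry, so neither is linked.
links = link_exact(["faust", "Stuttgart ", "Salome", "Unknown"], dictionary)
```
]]></preformat>
        <p>Exact matching of this kind trades recall for precision, which fits a narrow domain with low ambiguity; spelling variants would require fuzzy matching or manual curation.</p>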
      </sec>
      <sec id="sec-2-3">
        <title>Exploration</title>
        <p>
          A variety of visual means was implemented and utilized to explore the archive
data. To bring the historical black and white images to life, they were
automatically colorized using an open source tool based on ML [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. To browse the RDF
data in a table view, an instance of the open source LodView RDF viewer was
adapted and deployed [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The Linked Stage Graph Viewer<sup>16</sup> was implemented
for this use case. The viewer is shown in Fig. 2. It presents a timeline
visualization that lets the user explore the rich and detailed images
prominently, without complex means of interaction, in order to lower the
technical barriers to engaging with the content. The user can scroll through the
images with an overview of the timeline on the right (1). One large representative
image for each performance is shown (2), with further thumbnails at the bottom
(3). Swiping left or right reveals further plays which took place during the same
year (4). Clicking on an image directs the user to the LodView interface, which
shows all data available for the play. In addition to the Linked Stage
Graph Viewer, the Vikus Viewer<sup>17</sup> was connected to the dataset as
well. The viewer was previously developed by [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and enables intuitive content
exploration in a timeline view as well as search and content clustering. During
the demo session, all described interfaces can be used to explore the dataset.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>In this paper, Linked Stage Graph is presented, a KG based on historical data
about the Stuttgart State Theaters. In addition to the KG, a SPARQL endpoint was
released, named entities mentioned in the metadata were extracted and linked to
existing KGs, and a user interface was developed. The demo was created during
the Coding da Vinci Süd hackathon and was awarded the prize for the "most
useful" application. Even though the presented demo implements a use case on
theater data only, it is generalizable to a broad range of archive domains since
the EAD-XML format is widely used in archives. Future work will involve
improving the underlying ontology, more exhaustive entity linking and
further means of visual exploration beyond timelines.</p>
      <p><sup>16</sup> http://slod.fiz-karlsruhe.de/#Viewer</p>
      <p><sup>17</sup> http://slod.fiz-karlsruhe.de/vikus/</p>
      <p>Acknowledgement. We would like to thank Coding da Vinci for connecting
cultural institutions with creatives to develop innovative applications.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silvello</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>From users to systems: Identifying and overcoming barriers to efficiently access archival data</article-title>
          .
          <source>In: ACHS@ JCDL</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Freund</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toms</surname>
            ,
            <given-names>E.G.</given-names>
          </string-name>
          :
          <article-title>Interacting with archival finding aids</article-title>
          .
          <source>Journal of the Association for Information Science and Technology</source>
          <volume>67</volume>
          (
          <issue>4</issue>
          ),
          <fpage>994</fpage>
          –
          <lpage>1008</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Glinka</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , Dork, M.:
          <article-title>Museum im display. visualisierung kultureller sammlungen (vikus)</article-title>
          .
          <source>Konferenzband zur 22. Berliner Veranstaltung der internationalen EVA-Serie: Electronic Media and Visual Arts 2015</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Halbach</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Judenrollen: Darstellungsformen im europäischen Theater von der Restauration bis zur Zwischenkriegszeit</article-title>
          , vol.
          <volume>70</volume>
          .
          <publisher-name>Walter de Gruyter</publisher-name>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hildebrandt</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pohlmann</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Omran</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Lodview: a computer program for the graphical evaluation of lod score results in exclusion mapping of human disease genes</article-title>
          .
          <source>Computers and Biomedical Research</source>
          <volume>26</volume>
          (
          <issue>6</issue>
          ),
          <fpage>592</fpage>
          –
          <lpage>599</lpage>
          (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Prom</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rishel</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwartz</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>A unified platform for archival description and access</article-title>
          .
          <source>In: Proceedings of the 7th ACM/IEEE Joint Conference on Digital Libraries</source>
          ,
          JCDL 2007. pp.
          <fpage>157</fpage>
          –
          <lpage>166</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>J.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Isola</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Geng</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>A.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Efros</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          :
          <article-title>Real-time user-guided image colorization with learned deep priors</article-title>
          .
          <source>ACM Transactions on Graphics (TOG) 9</source>
          (
          <issue>4</issue>
          ) (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>