<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Case study: towards a linked digital collection of Latvian cultural heritage</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Uldis Bojars</string-name>
          <email>uldis.bojars@lnb.lv</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Library of Latvia</institution>
          ,
          <addr-line>Mukusalas iela 3, Riga, LV-1423</addr-line>
          ,
          <country country="LV">Latvia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <fpage>21</fpage>
      <lpage>26</lpage>
      <abstract>
        <p>The paper describes a linked digital collection developed at the National Library of Latvia. It highlights the interlinked nature of digital objects and will help users discover new information by exploring links between the objects. The object linkage is enhanced by annotating contents of selected types of objects resulting in a network of links between annotated objects and entities mentioned in their annotations. The collection is aimed both at the general public and researchers, and will provide a Linked Data interface and publish open datasets that researchers may find useful. The paper also describes our experience in developing the system with the limited resources available.</p>
      </abstract>
      <kwd-group>
        <kwd>cultural heritage</kwd>
        <kwd>digital collections</kwd>
        <kwd>Linked Data</kwd>
        <kwd>Rainis</kwd>
        <kwd>Aspazija</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>This paper describes a case study based on the "Rainis and Aspazija 150" Linked
Digital Collection pilot project, currently in development at the National Library
of Latvia (NLL).<sup>1</sup></p>
      <p>The collection is devoted to the 150th anniversary (in 2015) of the famous Latvian poets
and politicians Rainis and Aspazija. Its goal is to collect a wide variety
of types of digital objects and to present these two great personalities in a wider
context. The pilot project lets users explore annotated
works of both poets, a small part of their correspondence and various related
materials (documents, photos, posters, audio and video files). The poets' mutual
correspondence spanned 35 years, covering essential stages of Latvian history and
cultural development, and has been included in the Latvian national register of
the UNESCO "Memory of the World" programme.</p>
      <p>The pilot project aims at variety and quality rather than quantity:
collecting different kinds of objects from multiple participating organizations
and describing and interlinking these objects as accurately as possible. The
current collection consists of 500 digital objects of various types: 84 literary
works, 158 letters along with their annotated transcriptions, 120 photos and
various archival documents, posters, audio and video recordings.</p>
      <sec id="sec-1-1">
        <title>Annotations</title>
        <p>The collection's objects may be linked to other related objects. Especially
interesting in this regard are annotations that describe letters and some of the
literary works in the "Rainis and Aspazija 150" collection. The letters that are
a part of the collection had previously been transcribed and commented on by
experts, which helped us to disambiguate relevant entities mentioned in annotations and
to link them to other information about these entities. Transcriptions of letters
were annotated manually to ensure that the annotations are as precise
as possible.</p>
        <p>As a result of this annotation process, new entities (such as people,
organizations and events) and links to these entities were added to the collection. Where
possible, we tried to find existing digital objects that are related to or represent
things mentioned in these annotations and to add these objects to the collection.
For example, some letters mentioned a portrait of Aspazija that was to be published on
the cover of a calendar. The expert annotating the letter found the portrait in
a digitised version of the calendar and added an annotation linking to it.</p>
        <p>Figure 1 shows the collection's page about a letter (from Rainis to Aspazija)
linked to its annotated transcription. Figure 2 shows a fragment of an
annotated letter that links to various entities mentioned in the letter: a concept for
Zeus (Greek deity), locations (Rainis' new and old work cabinet) and a person
(Otto von Bismarck). The collection also has webpages describing these entities,
including links back to other objects that reference (e.g. mention) them.</p>
        <p><sup>1</sup> At the time of writing this collection was a work in progress. It will be available at
http://runa.lnb.lv from June 2016.</p>
        <p>The entities mentioned in annotations are associated with URIs
representing other resources related to these entities. One of the challenges in developing
this system was how to maintain information about entities mentioned in
annotations. Due to the limitations described in later sections, we chose a simple
approach where annotations are stored as links in HTML files and the
information about entities is maintained in a table containing object type and related
URIs. We hope to replace it with a better system for maintaining information
about the entities of interest, such as BBC Things, a Linked Data system that
maintains information about all the things BBC systems may need to refer to.<sup>2</sup></p>
        <p><sup>2</sup> http://www.bbc.co.uk/things/about</p>
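        <p>The simple approach described in this section can be illustrated with a minimal Python sketch. It assumes that each annotation is an ordinary HTML link whose href is an entity URI, and that the entity table maps each URI to an object type and a list of related URIs. The HTML fragment, the example.org URIs and the names ENTITY_TABLE, AnnotationLinkParser and extract_annotations are hypothetical and are not taken from the actual system.</p>
        <preformat><![CDATA[
```python
from html.parser import HTMLParser

# Hypothetical entity table: entity URI -> object type and related URIs.
ENTITY_TABLE = {
    "http://example.org/person/Bismarck": {
        "type": "person",
        "related": ["http://example.org/external/bismarck-authority-record"],
    },
    "http://example.org/concept/Zeus": {
        "type": "concept",
        "related": [],
    },
}

# Hypothetical fragment of a transcribed letter; annotations are plain links.
TRANSCRIPTION = (
    '<p>... <a href="http://example.org/concept/Zeus">Zeus</a> ... '
    '<a href="http://example.org/person/Bismarck">Bismarck</a> ...</p>'
)


class AnnotationLinkParser(HTMLParser):
    """Collects (href, link text) pairs from anchor elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only text inside an anchor with an href is part of an annotation.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def extract_annotations(html):
    """Returns one record per annotated mention, joined with the entity table."""
    parser = AnnotationLinkParser()
    parser.feed(html)
    parser.close()
    records = []
    for href, text in parser.links:
        # Entities missing from the table are kept and marked as unknown.
        entity = ENTITY_TABLE.get(href, {"type": "unknown", "related": []})
        records.append({"uri": href, "text": text, "type": entity["type"],
                        "related": entity["related"]})
    return records


annotations = extract_annotations(TRANSCRIPTION)
for record in annotations:
    print(record["text"], record["type"], record["uri"])
```
]]></preformat>
        <p>Keeping the entity table separate from the annotated HTML means that related URIs can be added to an entity without touching the transcriptions that mention it.</p>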
      </sec>
      <sec id="sec-1-2">
        <title>Use Cases</title>
        <p>
          This section outlines some potential use cases for the collection. Regular visitors
(such as the general public) can use the collection just like any other collection,
with the added benefit that additional links between objects make it
possible to "travel" through the collection and discover new content. The collection
also provides visual tools: a timeline of the poets' works and a visualization
of links between objects. The experience of Centre Pompidou shows that users
may perceive a collection that consists of various types of objects interlinked with
one another as a completely new way of exploring data: "... users who were just
browsing the site out of curiosity liked the fact that they would get lost and
discover unexpected resources" [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>Advanced visitors (researchers, students) may engage in a more targeted
exploration, looking at particular kinds of objects and links. They may also be
interested in using the collection as raw data for further analysis (e.g. network
analysis of the link graph or the content analysis of digital objects). These
researchers might make use of all kinds of data and interfaces that the collection
provides (such as a Linked Data interface, open data dumps, queries over the
collection's data, and visualizations).</p>
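        <p>As a hedged illustration of the network analysis use case, the following Python sketch builds an undirected link graph from (source, target) pairs and ranks nodes by their number of links (degree). The link pairs, the identifiers and the function names are hypothetical; a real analysis would more likely load an exported link graph into a dedicated network analysis tool.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict

# Hypothetical links between collection objects and entities (source, target).
LINKS = [
    ("letter/01", "person/Aspazija"),
    ("letter/01", "person/Rainis"),
    ("letter/02", "person/Aspazija"),
    ("work/01", "person/Aspazija"),
]


def build_graph(links):
    """Builds an undirected adjacency map: node -> set of neighbouring nodes."""
    graph = defaultdict(set)
    for source, target in links:
        graph[source].add(target)
        graph[target].add(source)
    return graph


def rank_by_degree(graph):
    """Returns (node, degree) pairs, highest degree first, ties by node name."""
    return sorted(((node, len(neighbours)) for node, neighbours in graph.items()),
                  key=lambda item: (-item[1], item[0]))


graph = build_graph(LINKS)
ranking = rank_by_degree(graph)
print(ranking[0])
```
]]></preformat>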
        <p>During project demonstrations, users were especially interested in
visualizations that show a network of links around the selected object. Regular users
viewed it as a novel way of exploring the collection, while researchers were quick
to focus on some links in particular and to provide a larger context behind
these links (e.g. that a literary work has a link to a person because she was an
inspiration for the main character in this work).</p>
        <p>
          Computer applications (and their developers) may access the collection in
order to present or analyze its information in novel ways. They may use all the
datasets and programming interfaces that the collection provides (Linked Data
interface, APIs, open datasets). Open data catalogs may collect information
about available open datasets. To do so, they will need access to
machine-readable information about the open datasets provided by the collection, which can
be published using the Data Catalog (DCAT) vocabulary [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
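        <p>To make this more concrete, here is a minimal Python sketch that assembles a DCAT description of one open dataset as JSON-LD. The dataset URI, title, description and download URL are hypothetical placeholders, not the collection's actual datasets; only the terms dcat:Dataset, dcat:distribution, dcat:Distribution, dcat:downloadURL, dct:title and dct:description come from the published vocabularies.</p>
        <preformat><![CDATA[
```python
import json


def describe_dataset(dataset_uri, title, description, download_url):
    """Builds a JSON-LD document describing one dataset with DCAT terms."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": dataset_uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": download_url},
        },
    }


# Hypothetical dataset; all URIs below are placeholders.
dataset = describe_dataset(
    "http://example.org/dataset/link-graph",
    "Object link graph",
    "Links between objects and entities in the collection.",
    "http://example.org/dumps/link-graph.csv",
)
print(json.dumps(dataset, indent=2))
```
]]></preformat>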
        <p>Types of data that could be published by a digital collection such as "Rainis
and Aspazija 150" include the collection's object metadata including links
between objects (e.g. published as Linked Data), an object link graph (for use in
network analysis tools) and the contents of the collection's objects.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Describing the Collection as Linked Data</title>
      <p>The collection will be published as Linked Data. The main question here is not
how to publish the data but how to represent the information contained in the
collection as RDF. There are two main approaches: (a) developing your own
RDF vocabulary; or (b) reusing an existing RDF vocabulary or vocabularies.</p>
      <p>
        Both approaches are valid: the Swedish Union Catalogue (LIBRIS) reuses
existing vocabularies while datos.bne.es and Centre Pompidou have developed
their own vocabularies [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. The benefits of developing your own vocabulary
are that it can more precisely and fully reflect the information that the system
contains (especially important if the system stores data as RDF). The benefits
of reusing existing vocabularies are that other applications might make direct
use of the information published by the system.
      </p>
      <p>We decided to reuse existing vocabularies where possible and to express the
collection in RDF in two different ways: (1) using a combination of BIBO, FOAF
and Dublin Core, similar to the approach taken in LIBRIS; and (2) using the Schema.org
vocabulary. This two-pronged approach was chosen because BIBO, FOAF and
Dublin Core offer more precise properties for expressing the information in our
Linked Digital Collection (LDC), while Schema.org potentially offers more reach because it is used by major web
search engines.</p>
      <preformat># Prefix declarations are assumed; they are not part of the original listing.
# dc: is taken to denote DCMI Metadata Terms, which defines dc:references.
@prefix bibo: &lt;http://purl.org/ontology/bibo/&gt; .
@prefix dc:   &lt;http://purl.org/dc/terms/&gt; .
@prefix xsd:  &lt;http://www.w3.org/2001/XMLSchema#&gt; .

# Description of the letter
&lt;http://example.org/letter/letter_01_URI&gt;
    a bibo:Letter ;
    dc:creator &lt;http://example.org/person/Rainis&gt; ;
    bibo:recipient &lt;http://example.org/person/Aspazija&gt; ;
    bibo:place &lt;http://example.org/location/Riga&gt; ;
    dc:description "a letter from Rainis to Aspazija. Riga, 1894.02.04"@en ;
    dc:date "1894-02-04"^^xsd:date .

# Links to entities mentioned in the letter
&lt;http://example.org/letter/letter_01_URI&gt;
    dc:references &lt;http://example.org/person/Aspazija&gt; ,
                  &lt;http://example.org/work/LNC04-000331919&gt; .</preformat>
      <p>The code listing shows an RDF data example describing a letter and its links
to the entities mentioned in this letter. The example uses the BIBO, FOAF and
Dublin Core vocabularies and the Turtle RDF serialization. Data about the entities
mentioned in annotations contain links to other relevant Linked Data sources
such as VIAF, making the collection a part of a larger web of Linked Data.</p>
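      <p>The listing covers only the BIBO and Dublin Core variant. As a rough sketch only, the same letter might be expressed with Schema.org terms as JSON-LD, generated here with Python. The choice of the Schema.org Message type with the sender, recipient, dateSent and mentions properties is an assumption made for illustration and is not necessarily the mapping used in the collection; the function name is hypothetical and the example.org URIs mirror those in the listing.</p>
      <preformat><![CDATA[
```python
import json


def letter_as_schema_org(letter_uri, sender_uri, recipient_uri, date_sent, mentions):
    """Builds a JSON-LD description of a letter using Schema.org terms."""
    return {
        "@context": "https://schema.org",
        "@id": letter_uri,
        "@type": "Message",  # assumed type; the paper does not name one
        "sender": {"@id": sender_uri},
        "recipient": {"@id": recipient_uri},
        "dateSent": date_sent,
        "mentions": [{"@id": uri} for uri in mentions],
    }


letter = letter_as_schema_org(
    "http://example.org/letter/letter_01_URI",
    "http://example.org/person/Rainis",
    "http://example.org/person/Aspazija",
    "1894-02-04",
    ["http://example.org/person/Aspazija",
     "http://example.org/work/LNC04-000331919"],
)
print(json.dumps(letter, indent=2))
```
]]></preformat>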
    </sec>
    <sec id="sec-3">
      <title>Development: Constraints of the Pilot Project</title>
      <p>What sets this case study apart from more research-oriented projects is that
it had to deal with serious "real-life" constraints: the project was developed
using the library's internal resources (except for the data collection activity, which
involved other partner institutions) and did not have a separate budget or
research partners that would help develop a custom state-of-the-art solution.</p>
      <p>While these constraints seriously limited what could be developed in the
pilot project and what technologies could be used, they also gave insights into
the obstacles to developing novel applications with limited resources and
into how developers not familiar with the Semantic Web perceive Linked Data
principles and technologies. We hope that these observations will be useful to
other organizations that may be in a similar situation.</p>
      <p>The developers, not being familiar with the Semantic Web, prefer the
technologies that they understand and are comfortable with (e.g. MS SQL, .NET and
C#). On the other hand, they know the library's existing systems and are good
at integrating the pilot project with these systems. The system was developed
on a relational database rather than an RDF store. This was an acceptable choice
because the data we are working with have a well-defined structure and come
from existing library systems. The Linked Data interface is being built on top
of this system.</p>
      <p>We observed that developers found some Linked Data concepts easier to
understand than others: that it is important to link objects together, to reuse
the information available on other systems and, to some extent, the importance
of URIs as identifiers for things. They might not consider the RDF data model
an integral part of the system and may instead view Linked Data as just one of the interfaces
through which the system may publish its data.</p>
      <p>A key challenge to the adoption of Semantic Web technologies in cultural
heritage organizations that usually do not have sufficient research and development
capacity is how developers can start using Semantic Web technologies when they
spend most of their time on the existing legacy systems and technologies.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>The "Rainis and Aspazija 150" digital collection introduced in this paper brings
together various types of digital objects related to the topic of the collection
and links these objects to one another. Of particular interest are the letters
contained in the collection, transcribed and annotated with links to entities
mentioned there, thus creating a new layer of links between the collection's objects.
The information about the collection's objects and the links between them will be made
available as Linked Data.</p>
      <p>The collection is about well-known and active Latvian cultural personalities
and politicians and its contents will be interesting both to the general public and
to researchers who previously did not have such a collection readily available.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bermes</surname>
          </string-name>
          , E.:
          <article-title>Following the User's Flow in the Digital Pompidou</article-title>
          .
          <source>Linked Data and User Interaction</source>
          ,
          <volume>162</volume>
          ,
          <issue>19</issue>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Linked Data - Design Issues</article-title>
          .
          <source>W3C</source>
          (
          <year>2006</year>
          ).
          Retrieved March 4, 2016, https://www.w3.org/DesignIssues/LinkedData.html
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Maali</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erickson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Archer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Data Catalog Vocabulary (DCAT)</article-title>
          .
          <source>W3C Recommendation</source>
          (
          <year>2014</year>
          ), https://www.w3.org/TR/vocab-dcat/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Malmsten</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Making a library catalogue part of the Semantic Web</article-title>
          .
          <source>Proceedings of the 2008 International Conference on Dublin Core and Metadata Applications</source>
          . Dublin Core Metadata Initiative (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Vila-Suero</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villazon-Terrazas</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez-Perez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>datos.bne.es: A library linked dataset</article-title>
          .
          <source>Semantic Web</source>
          ,
          <volume>4</volume>
          (
          <issue>3</issue>
          ),
          <fpage>307</fpage>
          -
          <lpage>313</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>