      Linked Data for Digital Humanities Scholars and
    Researchers: “Rainis and Aspazija” (RunA) Collection
           Uldis Bojārs[0000-0001-7444-565X], Anita Rašmane and Anita Goldberga

            National Library of Latvia, Mukusalas iela 3, Riga, LV-1423, Latvia
        [uldis.bojars, anita.rasmane, anita.goldberga]@lnb.lv

        Abstract. This paper presents the Linked Digital Collection “Rainis and
        Aspazija” that demonstrates the use of Linked Data in Digital Humanities1. This
        collection offers interlinked digital objects and data from several memory insti-
        tutions and private repositories related to two Latvian poets of the period of Na-
        tional Awakening – Rainis and Aspazija. The paper also describes the semantic
        annotation tool developed for cultural heritage needs that was used to create
        annotations in this collection. Digital object annotations are the source of links
        between the collection’s digital objects and the entities mentioned in these
        annotations. Information about the collection’s objects and entities is published
        as Linked Open Data, facilitating reuse of this information.

        Keywords: Cultural Heritage, Linked Data, Digital Collection, Text Annotation.

1       Introduction

Metadata of objects in digital collections provides a means of finding resources
mainly by their external description, pre-categorization and physical parameters. Only
a small part of the metadata values deals with the content of a resource. Scholars and
researchers in Digital Humanities have to deal with large amounts of text and collections
of documents, whose content has to be analyzed and processed in order to extract the
precise meaning of the mentioned entities and their interconnections. This depth of
content description is needed for a fruitful research and scholarship environment, and
it can be achieved by enriching written texts with annotations: marked text fragments
linked to identified entities, which in turn become additional metadata for a resource
and can link to additional internal and external data.
   In 2016, the National Library of Latvia (NLL) together with the National Archives
of Latvia, the Institute of Literature, Folklore and Art of the University of Latvia, the
Association of Memorial Museums, and the Literature and Music Museum published
“Rainis un Aspazija” (RunA) – the first digital cross-sectoral cultural heritage pilot
collection in Linked Data form in Latvia. RunA highlights the NLL’s efforts in developing
new knowledge bases for memory institutions and researchers. During 2018-2019, a special
semantic annotation tool and its entity datastore were developed by the NLL to enhance
RunA document annotations [2].

1 https://runa.lnb.lv/en/

   This paper introduces a major new release of the collection that is substantially
different from the initial pilot project described in [4]. In particular, it has a new
look and feel and additional digital objects, and annotations in the collection are now
created and managed using a custom-built tool for annotating cultural heritage content. An important
difference from the initial release is the entity database that is part of the annotation
tool and that allows users to collect and manage information about the entities men-
tioned in annotations.

2      Semantic Annotation Tool

Content for the “Rainis and Aspazija” collection is prepared using a custom semantic
annotation tool. This tool allows users to annotate textual content, mark the mentions
of important entities and provide additional information about these entities using the
entity database that is a part of this tool [2]. Users of the annotation tool include domain
experts and digital humanities students.
    Cultural heritage content and especially historical documents may be particularly
difficult for automatic entity recognition and linking because the relevant tools need to
know the specific context of the documents (e.g. personal correspondence and people
involved in it) and entities that are likely to be mentioned in these documents [3]. In
order to create high quality annotations, the annotation tool and the RunA annotation
process use manual annotation. This is due to the semantically and lexically free and
unstructured use of words in personal written texts where many entities are not named
directly, but are implied by pronouns (“he”, “she”) or generic names (“city”, “sister”).
In many cases, the person annotating the document is able to identify entities from con-
text. In other cases, this may be done after examining the context of a group of docu-
ments. Another issue is the need to disambiguate between different meanings of the
same text fragment. As an example, the annotated compendium to one of Aspazija’s
works in RunA uses the word “Aspazija” to refer to 4 different entities (person
Aspazija, her work “Aspazija”, etc.) which should be correctly identified when anno-
tating this document.
    The annotation tool supports three core types of annotations: simple annotations
that may link to named entities, structural annotations that mark up portions of the
document that have a special meaning within the context of the document (e.g. a direct
citation of other published material), and composite annotations for more complex use
cases (e.g. for representing an event described in a document with mentions of place,
time and participants, all marked and identified in their own annotations).
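
The three annotation types can be pictured as one nested data structure. The sketch below is illustrative only: the class name, field names, entity identifiers and the sample sentence are assumptions made for this example and do not describe the internal data model of the annotation tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Annotation:
    """A marked text fragment (hypothetical model, not the tool's schema)."""
    start: int                        # character offset where the fragment begins
    end: int                          # character offset just past the fragment
    kind: str                         # "simple", "structural" or "composite"
    annotation_class: str             # e.g. "person", "place", "citation", "event"
    entity_id: Optional[str] = None   # simple annotations may link to an entity
    parts: List["Annotation"] = field(default_factory=list)  # for composites

    def text(self, document: str) -> str:
        """Return the annotated fragment of the document."""
        return document[self.start:self.end]

    def entity_ids(self) -> Set[str]:
        """Collect entity ids from this annotation and all nested parts."""
        ids = {self.entity_id} if self.entity_id else set()
        for part in self.parts:
            ids |= part.entity_ids()
        return ids


def mark(document, fragment, kind, annotation_class, entity_id=None, parts=None):
    """Create an annotation for the first occurrence of `fragment`."""
    start = document.index(fragment)
    return Annotation(start, start + len(fragment), kind, annotation_class,
                      entity_id, parts or [])


# Invented sample sentence and invented entity ids, for illustration only.
doc = "Aspazija wrote to Rainis from the city."
sender = mark(doc, "Aspazija", "simple", "person", "entity:aspazija")
recipient = mark(doc, "Rainis", "simple", "person", "entity:rainis")
place = mark(doc, "the city", "simple", "place", "entity:unresolved-city")
event = mark(doc, doc, "composite", "event", parts=[sender, recipient, place])
```

Here the composite annotation does not link to an entity itself; it groups the simple annotations for the participants and the place, each of which keeps its own entity link.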

Fig. 1. Text annotation tool showing an annotation about a person (Aspazija).

Figure 1 is a screenshot of the annotation tool showing annotations in the English com-
pendium to Rainis’ play “The Golden Horse”. The left side of the screen contains text
with annotations while the right side shows information about the selected annotation
(Aspazija) which contains a reference to the entity database record about Aspazija. In-
formation about the annotation includes its type (simple annotation), status (com-
pleted), annotation class (person) and information about the entity associated with the
annotation. Annotations may also contain user comments.
   New annotations can be added by highlighting text fragments and entering infor-
mation about the annotation. As a part of the workflow, users may choose an existing
entity or create a new entity that the annotation will refer to.
   Information about the entities referenced from annotations is maintained in a dedi-
cated entity database that supports links between entities and can point to additional
information about these entities (e.g., to Linked Data resources such as Wikidata2). The
database stores data extracted from individual annotations as well as data added by
researchers, so that it can be reused. This allows experts to build a knowledge base about
the entities referenced from annotations while annotating documents. Figure 2 shows
an example of entity database information about a person – Friedrich Reinhold
   The entity information can evolve as the document annotation task progresses as it
is possible to enhance the entity data later on. For example, a user may create an entry
for an entity that needs further research and leave comments about what is known about
the entity and what is not. The entity record can later be extended with additional in-
formation (e.g. identifiers for this entity in other authoritative data sources) when it
becomes available.
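
A minimal sketch of such an evolving entity record is given below. The record layout, the method names and the placeholder identifier are assumptions for illustration; they are not taken from the entity database of the annotation tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EntityRecord:
    """Hypothetical entity record that can be created as a stub and extended later."""
    entity_id: str
    label: str
    entity_class: str                                    # e.g. "person", "place"
    comments: List[str] = field(default_factory=list)    # researcher notes
    external_ids: Dict[str, str] = field(default_factory=dict)  # source -> identifier

    def add_comment(self, note: str) -> None:
        """Record what is known (or not yet known) about the entity."""
        self.comments.append(note)

    def add_external_id(self, source: str, identifier: str) -> None:
        """Attach an identifier from another authoritative data source."""
        self.external_ids[source] = identifier

    def needs_research(self) -> bool:
        """Treat a record without any external identifier as still open."""
        return not self.external_ids


# Step 1: create a stub while annotating and note the open question.
record = EntityRecord("entity:example-person", "unidentified correspondent", "person")
record.add_comment("Mentioned in one letter; identity not yet established.")
open_before = record.needs_research()

# Step 2: extend the record later, once an identifier becomes available.
# "Q-PLACEHOLDER" is a made-up value, not a real Wikidata identifier.
record.add_external_id("wikidata", "Q-PLACEHOLDER")
open_after = record.needs_research()
```

The rule used in `needs_research` is a deliberate simplification; a real workflow would more likely rely on an explicit status field set by the annotator.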

2 https://www.wikidata.org/

Fig. 2. Annotator’s entity database information about a person.

The annotation tool can export annotated documents as rich web applications that con-
tain the document along with annotations and information about the entities that anno-
tations refer to. Exported annotated documents are used by the “Rainis and Aspazija”
collection for embedding into collection object pages.
    Further information about the annotation tool and related work can be found in [2].

3       Linked Digital Collection “Rainis and Aspazija”

“Rainis and Aspazija” is a digital collection that contains diverse, multi-format content
about two Latvian poets of the period of National Awakening – Rainis (1865-1929) and
Aspazija (1865-1943) – supplied by multiple cultural heritage organizations. When cre-
ating the collection, we aimed at including a wide range of content types and digital
objects. The collection includes3:
     • first editions of Rainis’ and Aspazija’s literary works along with their anno-
         tated compendiums (abstracts) in English and Latvian (85 works and 158 an-
         notated compendiums);
     • personal correspondence with annotations (395);
     • archival documents (23);
     • photos (516), audio and video recordings (37);
     • posters (22), presentations (7) and cartoons (32).

3 A summary of content types in the collection: https://runa.lnb.lv/en/par-kolekciju/

This is the second major release of the collection. Compared with the initial pilot project
[4] this release:
      • builds on a new annotation system that is integrated into the collection;
      • contains literary work compendiums in English and Latvian;
      • has additional content (extensive annotated biographies of both poets, more
           than 100 photos from the National Archives of Latvia, cartoons, etc.);
      • has an English localization of the user interface;
      • has a new look and feel (improved search, responsive web design, etc.);
      • provides an interface for mobile devices.

The collection contains detailed biography pages for Rainis4 and Aspazija5 with
summaries of collection objects (by object type) related to each person. The biography page
for Aspazija is shown in Figure 3.

                   Fig. 3. Aspazija’s biography page in the digital collection.

In terms of links in the collection, the most interesting types of content are (1) personal
correspondence that has been meticulously transcribed and annotated, and (2) annotated
compendiums (abstracts) to the poets’ literary works. The text of both of these content
types was annotated with mentions of named entities using the annotation tool
described above. Information about these entities is a part of the collection and is shown
on the collection’s entity pages. The collection allows users to search for both digital objects
and entities.

4 Rainis: https://runa.lnb.lv/en/110023/
5 Aspazija: https://runa.lnb.lv/en/142651/

   Figure 4 shows a digital object page containing a compendium to Rainis’ play “The
Golden Horse”6 (in Latvian: Zelta zirgs). The top part of the page contains object
metadata including links to other objects (in this case: the first edition of this work).
The bottom part contains annotated digital object text exported from the annotation tool
and a list of entities that appear in these annotations. By clicking the links, users can
view the collection’s pages about these entities.
   Annotations are highlighted in different colors based on the type of the entity they
refer to. Most of them are simple annotations that identify the entity represented by the
annotated text fragment. The collection’s entity pages contain links back to the documents
that mention the entity. As a result, the collection becomes a Linked Digital Collection
that contains a network of objects (annotated textual content) and entities linked to one
another. Users may view the network of links by clicking the “Data network” button.

                   Fig. 4. Annotated document view in the digital collection.

Entity pages are part of the collection and contain information about the entity along
with links to other entities and to additional resources about this entity on the web.
These links may refer to web pages or to Linked Data URI identifiers. Entity pages also
show information imported from Wikipedia – abstracts and images representing the

6 https://runa.lnb.lv/en/objects/729593/

   Person entity pages contain links to all documents in which a person is mentioned
directly (“Doriņa”) or indirectly (“beloved sister”), provided these documents contain
annotations pointing to the entity. This creates a group of related texts that are imme-
diately available to researchers who otherwise would need to examine all collection
documents. Place names which depend on language and vary over time (“Pēterpils”,
“Петроград”, “Saint Petersburg”) are another example of the difficulty that awaits re-
searchers when examining cultural heritage documents. This difficulty is resolved by
adding place name annotations.
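
The effect of such annotations can be sketched as an index from entities to the documents that mention them. In the example below the document identifiers and entity identifiers are invented for illustration; only the quoted surface forms come from the text above.

```python
from collections import defaultdict


def build_entity_index(annotations):
    """Group annotations by entity.

    `annotations` is an iterable of (document_id, surface_text, entity_id)
    tuples. Returns two dicts keyed by entity id: the set of documents that
    mention the entity and the set of surface forms used to refer to it.
    """
    documents = defaultdict(set)
    surface_forms = defaultdict(set)
    for document_id, surface_text, entity_id in annotations:
        documents[entity_id].add(document_id)
        surface_forms[entity_id].add(surface_text)
    return documents, surface_forms


# Hypothetical annotations: three letters name the same city differently,
# and a person is mentioned both directly and indirectly.
sample = [
    ("letter-1", "Pēterpils", "place:saint-petersburg"),
    ("letter-2", "Петроград", "place:saint-petersburg"),
    ("letter-3", "Saint Petersburg", "place:saint-petersburg"),
    ("letter-1", "Doriņa", "person:dorina"),
    ("letter-3", "beloved sister", "person:dorina"),
]

documents, surface_forms = build_entity_index(sample)
```

A plain text search for “Saint Petersburg” would find only one of the three letters, whereas the lookup by entity returns all of them, regardless of the language or period of the place name.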
   The collection system also provides a map7 and a timeline view8. The map shows
the places that were important in the lives of both poets. Information for the map is
collected from documents annotated with mentions of place names. The timeline view
shows events in the poets’ lives in the context of important worldwide events.
   An obstacle that needed to be overcome when creating this collection was that dif-
ferent participants (libraries, archives, museums) use different standards and data for-
mats to represent their information. Currently, we cannot directly import metadata from
museums and archives. The metadata is first converted to a spreadsheet instead, for
import into NLL’s digital object management system (DOM). Perhaps an agreement
between memory institutions on a single data model for named entities and a common
cultural heritage entity register in which the various forms of entities are controlled
could solve this problem in the future.
   The popularity of RunA has grown 42% from ~28 thousand page visits in 2019 to
~40 thousand visits in 2020. The information in the RunA collection is still being ex-
panded with additional correspondence and excerpts of Rainis’ diaries. Students from
the Faculty of Humanities, University of Latvia are involved in annotating additional
correspondence of the poets. Students and researchers can use RunA as a digital
resource and discover previously unknown relationships between the collection’s objects
and entities. This knowledge base is also being integrated into study courses at the
Faculty of Humanities, University of Latvia.

4       Linked Data

Machine-readable information about all RunA objects and entities is published according
to Linked Data principles in the Turtle RDF and RDF/XML formats [1]: the collection’s
resources (objects and entities) have HTTP URI identifiers, and the system responds to
requests for these URIs by sending back structured RDF data that contains links to
other resources. Linked Data can also be retrieved by appending the corresponding
extension to the URI:
     • .ttl for the Turtle RDF format (e.g. https://runa.lnb.lv/729593.ttl );
     • .rdf for the RDF/XML format (e.g. https://runa.lnb.lv/729593.rdf ).
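
The extension-based access described above could be scripted along the following lines. This is a sketch using only the Python standard library; the function names, the User-Agent string and the assumption that responses are UTF-8 encoded are ours and are not documented behaviour of the RunA system.

```python
from urllib.request import Request, urlopen

# File extensions for the two serializations named above.
EXTENSIONS = {"turtle": ".ttl", "rdfxml": ".rdf"}


def rdf_url(resource_uri: str, fmt: str = "turtle") -> str:
    """Build the download URL by appending the format extension to the URI."""
    return resource_uri.rstrip("/") + EXTENSIONS[fmt]


def fetch_rdf(resource_uri: str, fmt: str = "turtle", timeout: float = 10.0) -> str:
    """Download the RDF description of a resource as text.

    Assumes the response body is UTF-8 encoded; performs a network request.
    """
    request = Request(rdf_url(resource_uri, fmt),
                      headers={"User-Agent": "runa-example-client"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


turtle_url = rdf_url("https://runa.lnb.lv/729593")
rdfxml_url = rdf_url("https://runa.lnb.lv/729593", "rdfxml")
```

Calling `fetch_rdf("https://runa.lnb.lv/729593")` would then request the Turtle description of the example resource.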

7 https://runa.lnb.lv/en/map/
8 https://runa.lnb.lv/en/darbi-un-notikumi/

Links to RDF data are also displayed on the collection’s webpages. In order to facilitate
the harvesting of the collection’s machine-readable data, the system provides an XML
sitemap with URIs of its objects and entities9.
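
A harvester could start from this sitemap. The sketch below assumes that the sitemap follows the standard sitemaps.org format (a `urlset` of `url`/`loc` elements); the sample document is made up for the example, and its second URI is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the standard sitemap protocol (assumed to be used by the site).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_uris(xml_text):
    """Return the URIs listed in <loc> elements of a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip()
            for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
            if loc.text]


# Made-up sitemap fragment; the real sitemap is not reproduced here.
sample_sitemap = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://runa.lnb.lv/729593</loc></url>
  <url><loc>https://runa.lnb.lv/000000</loc></url>
</urlset>
"""

uris = sitemap_uris(sample_sitemap)
# Each URI can then be turned into a Turtle download URL.
turtle_urls = [uri + ".ttl" for uri in uris]
```

If the sitemap turns out to be a sitemap index that points to further sitemap files, the same parsing step would need to be applied to each listed file in turn.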
   An example of the collection’s RDF data can be found in Appendix 1. It describes an
annotated compendium to Rainis’ play “The Golden Horse” in Turtle RDF format. The
vocabularies used for representing the collection’s resources in RDF are Dublin Core10,
FOAF11, Bibo12 and Schema.org. Notable properties in the RDF data of the collection include:
     • bibo:annotates property represents links between annotated documents and
          the original digital objects (e.g. handwritten letters);
     • dct:hasPart property points to the files contained in the digital object (e.g. a
          scanned image of a letter);
     • dct:isPartOf and schema:isPartOf properties point to NLL’s digital object
          management system (DOM) collection that the resource is a part of (e.g. a
          digital object collection about Aspazija);
     • owl:sameAs and schema:sameAs represent links to the same resource in DOM;
     • schema:mentions property represents links between objects and entities.
Visitors of the collection can benefit from links between its objects and entities that
provide a new way for exploring the collection. For example, they may want to work
with a subset of documents that mention a particular entity. Advanced users can make
use of collection’s Linked Open Data by automatically collecting this data and using it
for further analysis.
   Related work includes the Amsterdam Museum Linked Data project [5], which publishes
objects’ metadata as Linked Data but, unlike RunA, does not annotate the content of
digital objects. Another example is a Singapore National Library Board project [6] in
which connections between entities are automatically extracted and disambiguated, and
links are established.

5       Conclusion

In this paper we described the Linked Digital Collection “Rainis and Aspazija” and the
text annotation tool used to enrich the collection with links between its objects and
entities. These links are added to the collection by annotating textual content of the
collection’s objects. The resulting knowledge base is available as open data and is being
integrated into the education process at the University of Latvia.
   The annotation tool was developed based on NLL’s needs for the annotation of cul-
tural heritage information. An integral part of the tool is its entity database where users
can add and maintain information about the entities mentioned in annotations. Entity
information may contain external links to webpages about the entity and Linked Data
identifiers (URIs) for information about this entity in other Linked Data resources.

9 https://runa.lnb.lv/sitemap.xml
10 https://www.dublincore.org/specifications/dublin-core/dcmi-terms/
11 http://xmlns.com/foaf/spec/
12 http://www.bibliontology.com/

   RunA allows researchers to explore the digital collection in new ways by following
the links between the collection’s objects and entities. This collection is published as
Linked Open Data, creating an opportunity for reuse of this cultural heritage knowledge


This research was supported by the Latvian Council of Science Project Nr. lzp-2019/1-
0365 “Latvian Memory Institution Data in the Digital Space: Connecting Cultural

Appendix 1 – Linked Data of the Compendium of “The Golden Horse” by Rainis

  a bibo:Document ;
  dct:isPartOf  ;
  dct:isPartOf  ;
  dct:title "Rainis’ play ‘The Golden Horse’ (‘Zelta zirgs’) (1909). Compendium"@en ;
  dc:type "Teksts" ;
  dct:creator  ;
  dct:language  ;
  dct:subject  ;
  foaf:depiction  ;
  dct:hasPart  ;
  dct:date "2015" ;
  dct:source  ;
  owl:sameAs  .

  a schema:CreativeWork ;
  schema:isPartOf  ;
  schema:isPartOf  ;
  schema:author  ;
  schema:description "Rainis’ play ‘The Golden Horse’ (‘Zelta zirgs’) (1909). Compendium"@en ;
  schema:thumbnailUrl  ;
  schema:datePublished "2015" ;
  schema:sameAs  .

  bibo:annotates  .

  schema:mentions  ;
  schema:mentions  ;
# ...
  schema:mentions  .