<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards an Ontology for Describing Archival Resources</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Laura Pandolfo</string-name>
          <email>laura.pandolfo@uniss.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Pulina</string-name>
          <email>lpulina@uniss.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marek Zielinski</string-name>
          <email>MZielinski@pilsudski.org</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>POLCOMING, Università di Sassari</institution>
          ,
          <addr-line>Viale Mancini n. 5</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Pilsudski Institute of America</institution>
          ,
          <addr-line>138 Greenpoint Avenue, Brooklyn, NY 11222</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>111</fpage>
      <lpage>116</lpage>
      <abstract>
        <p>Digital libraries and archives are emerging around the world in response to the need to store, organize, and publish large collections of resources on the Web. Managing this information, however, poses challenges that traditional data management and information browsing do not address. Semantic Web technologies can improve digital libraries and archives by facilitating metadata storage and adding semantic capabilities, which increase the quality of the information retrieval process. In this paper we present arkivo, an ontology designed to model the archival description of historical document collections.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The Web has changed the way people search for and discover information,
giving them direct access to millions of documents. Online repositories, such as
digital libraries, support users' exploration of large document collections and,
in the case of digital historical archives, also facilitate access to original and
rare documents. Digital archives now face challenges that traditional data
management and information browsing do not address. Semantic Web (SW) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] technologies provide ways to address
these challenges by offering solutions to represent, organize, and retrieve
this kind of data. In particular, ontologies play a key role by providing a
shared vocabulary that can be used to describe domains, annotate documents,
and promote interoperability and consistency between different sources [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ].
      </p>
      <p>
        In the context of digital libraries and archives, some of the most widely used
metadata standards and ontologies include the Dublin Core Metadata Initiative (DCMI) [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], CIDOC
Conceptual Reference Model (CRM) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], MAchine-Readable Cataloging (MARC),
Metadata Object Description Schema (MODS) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and Encoded Archival
Description (EAD) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. However, none of these can exhaustively support both the
representation of the archival arrangement structure and the annotation of
historical data embedded within the documents, the importance of which has been
highlighted in, e.g., [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        To address these needs, in this paper we introduce arkivo<fn><label>1</label><p>Arkivo is the translation of "Archive" in Esperanto.</p></fn>, an ontology
designed to accommodate the description of historical archival documents. It supports
archive workers by encompassing both the hierarchical structure of archival
collections and the rich metadata created during archive digitization, such as
historical elements. The aim of arkivo is not only to provide a reference schema
for publishing Linked Data [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] about historical archival documents, but also to
describe the historical elements contained in these documents, for example by
representing relationships between people, places, and events.
In this paper, we also describe the usage of arkivo in the context of the
historical archive held by the Jozef Pilsudski Institute of America, which houses
a rich collection of historical sources covering the period from 1863 to the
present day.
      </p>
      <p>The paper is organized as follows. In Section 2, we briefly describe the arkivo
ontology and its design process, while in Section 3 we show the usage of arkivo in the
context of the digitized collections of the Jozef Pilsudski Institute of America.
We conclude the paper in Section 4 with some final remarks and future work.</p>
    </sec>
    <sec id="sec-2">
      <title>The ARKIVO Ontology</title>
      <p>
        The ontology development process can follow different strategies
and methodologies; see, e.g., [
        <xref ref-type="bibr" rid="ref16 ref7">16, 7</xref>
        ]. arkivo has been developed according to a
top-down strategy, which consists of first identifying the most abstract concepts
of the domain and then specializing them into more specific concepts. In the following, we
report the main phases of the development process of arkivo, which were
carried out with the support of domain experts.
      </p>
      <p>Requirements Specification and Knowledge Acquisition. In this phase, we
considered different scenarios, use cases, and end users, focusing on
archival management practices and the most common methods used by archives for
storing and cataloging materials. Moreover, we analyzed the best practices used by
archive workers in the metadata collection process. This phase allowed us to
identify the main concepts needed to represent the domain of interest.</p>
      <p>Conceptualization and Formalization. In light of the knowledge gained in the
previous phase, we drew up a glossary of terms that captures the proper
terminology used in the archival domain. The conceptualization
resulting from this activity structures the domain knowledge, in terms
of concepts and relations, in order to meet the pre-established requirements. In
particular, we defined a taxonomy for describing the archival arrangement
levels, from the concept of collection, which can contain items or other collections
such as fonds, to the concept of single item, which typically is the smallest indivisible
unit.</p>
      <p>
        Integration. Some of the concepts resulting from the conceptualization phase
can be represented by reusing existing standard metadata and vocabularies. For
this purpose, we integrated arkivo with several core ontologies and
vocabularies. In detail, DCMI, FOAF<fn><label>2</label><p>http://www.foaf-project.org/</p></fn>, and schema.org [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] were used to model
general information related to documents, organizations, places, and persons. We
also referred to the BIBO<sup>3</sup> ontology in order to have a detailed classification of
documents. In order to link a place name to its current geographical location, we
used Geonames<sup>4</sup>. Finally, we integrated the LODE [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] ontology to model events and
their properties.
      </p>
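      <p>To make the reuse of external vocabularies more concrete, the following Turtle fragment sketches how an event might be described with LODE alongside FOAF and schema.org terms. It is only an illustrative sketch: the ex: namespace and the three individuals are hypothetical placeholders that do not come from the archive, and the choice of lode:atPlace and lode:involvedAgent is an assumption about which LODE properties would be used.</p>
      <preformat>
@prefix lode:   &lt;http://linkedevents.org/ontology/&gt; .
@prefix foaf:   &lt;http://xmlns.com/foaf/0.1/&gt; .
@prefix schema: &lt;http://schema.org/&gt; .
# Hypothetical namespace and individuals, used for illustration only.
@prefix ex:     &lt;http://example.org/archive/&gt; .

ex:ExamplePerson a foaf:Person .
ex:ExamplePlace  a schema:Place .

# An event, the place where it happened, and an agent involved in it.
ex:ExampleEvent a lode:Event ;
    lode:atPlace       ex:ExamplePlace ;
    lode:involvedAgent ex:ExamplePerson .
</preformat>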
      <p>
        Implementation. arkivo has been developed using the OWL 2 language [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] with
the Protégé [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] editor. The ontology is composed of 43 classes, 24 object
properties, and 34 data properties. In the following, we highlight some of the main
classes and properties of the arkivo ontology. Note that we use the prefixes of the core
ontologies, namely dc for Dublin Core, foaf for FOAF, schema for
schema.org, and bibo for the BIBO ontology, while the empty prefix is used for
the original classes and properties of arkivo.
      </p>
      <list list-type="bullet">
        <list-item><p>bibo:Collection is the class that represents a set of documents or collections. This class has several subclasses, including :File and :Fonds. The former describes a file, namely an organized unit of items grouped together, while the latter denotes the whole of the records organically created and/or accumulated by a particular person, family, or corporate body in the course of that creator's activities and functions.</p></list-item>
        <list-item><p>:Date is the class containing dates mentioned in an item.</p></list-item>
        <list-item><p>foaf:Organization is used to describe an organization related to bibliographic items or to events.</p></list-item>
        <list-item><p>foaf:Person represents people related to a bibliographic item or to a specific event.</p></list-item>
        <list-item><p>:Item represents the archival item, in other words the smallest intellectually indivisible archival unit. This class has several subclasses, such as bibo:Article, :Document, and bibo:Letter.</p></list-item>
        <list-item><p>dc:creator is the relationship that shows who has created a specific item, connecting individuals in the :Item class to individuals in the foaf:Agent class.</p></list-item>
        <list-item><p>dc:created indicates the date when an individual of the class :Item was created.</p></list-item>
        <list-item><p>schema:isPartOf indicates that an individual in the class :Item is part of a collection, by linking that individual to another in the class bibo:Collection.</p></list-item>
        <list-item><p>schema:mentions indicates that an instance of foaf:Person and/or schema:Place is mentioned in an individual of the class bibo:Collection.</p></list-item>
        <list-item><p>:isSectionOf connects instances of :File to instances of :Fonds.</p></list-item>
        <list-item><p>:repository connects instances of the foaf:Organization class to instances of the bibo:Collection class, in order to describe that an organization can be a repository of collections or items.</p></list-item>
      </list>
      <p>The arkivo ontology is licensed under a Creative Commons Attribution 3.0
Unported License and can be downloaded at http://purl.org/arkivo. For more
details about the full list of classes and properties, see also the documentation at
https://github.com/ArkivoTeam/ARKIVO.</p>
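      <p>As an illustration of how some of the classes and properties described in this section fit together, the following Turtle fragment sketches a small part of the schema. It is a minimal sketch rather than an excerpt from the published ontology: the IRI bound to the empty prefix is a placeholder assumption, and the domain and range axioms are only one plausible reading of the textual descriptions.</p>
      <preformat>
@prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt; .
@prefix owl:  &lt;http://www.w3.org/2002/07/owl#&gt; .
@prefix bibo: &lt;http://purl.org/ontology/bibo/&gt; .
# Assumption: placeholder namespace for the empty prefix, not taken from the ontology.
@prefix :     &lt;http://example.org/arkivo#&gt; .

# Archival arrangement levels: files and fonds are kinds of collection.
:File  a owl:Class ; rdfs:subClassOf bibo:Collection .
:Fonds a owl:Class ; rdfs:subClassOf bibo:Collection .

# The item is the smallest archival unit; documents, letters, and articles are kinds of item.
:Item        a owl:Class .
:Document    a owl:Class ; rdfs:subClassOf :Item .
bibo:Letter  rdfs:subClassOf :Item .
bibo:Article rdfs:subClassOf :Item .

# A file is a section of a fonds.
:isSectionOf a owl:ObjectProperty ;
    rdfs:domain :File ;
    rdfs:range  :Fonds .
</preformat>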
      <fn-group>
        <fn><label>3</label><p>http://bibliontology.com/</p></fn>
        <fn><label>4</label><p>http://www.geonames.org/ontology</p></fn>
      </fn-group>
    </sec>
    <sec id="sec-3">
      <title>Case Study: the Jozef Pilsudski Digital Archival Collections</title>
      <p>The Jozef Pilsudski Institute of America<sup>5</sup> was established in 1943 in New York
City for the purpose of continuing the work of the Institute for Research of
Modern History of Poland, established in Warsaw in 1923. The Polish State
was re-established in 1918 in the aftermath of the Great War and, after several
regional wars and uprisings, its borders were settled in 1922. Soon after, a group
of historians and officers began to travel around the country to collect archival
documentation. At the beginning of World War II, part of the archives was
evacuated and ended up in Washington, eventually forming the seed of the Institute's
archival collections, which grew over time through donations from politicians, officers,
and organizations of prewar Poland and the Polish diaspora. Today, the Institute
holds some 240 linear meters of archives, namely 2 million pages, covering mostly
the Polish, European, and American history of the late 19th and 20th centuries. The
collection includes documents, photographs, films, posters, periodicals, books,
personal memoirs of diplomats and of political and military leaders, as well as
a collection of paintings by Polish and European masters. For the last nine years,
the archival collections have been undergoing digitization and have gradually been put online.</p>
      <p>The main objective of historical research is to understand the past
through the study of historical sources, such as documents stored in archives.
In this context, researchers are mainly interested in detecting facts (e.g.,
people, places, events) cited in the documents in order to analyze them, discover
relationships, and draw inferences. Unlike, e.g., EAD, the arkivo ontology provides
elements to represent both the hierarchical structure of archival documents and
the historical data expressed in them.</p>
      <p>As an example, in the following we report the description (in Turtle)
of one of the documents stored in the Jozef Pilsudski Institute archive, namely
the "Letter to comrades in London". This document was written by Pilsudski
in 1898, and it mentions different people and places, as depicted in
Figure 1. The prefix declarations are included so that the listing parses on its own;
the IRI bound to the empty prefix is a placeholder, not the published namespace.</p>
      <preformat>
# Prefix declarations; the IRI bound to the empty prefix is a placeholder assumption.
@prefix :       &lt;http://example.org/arkivo#&gt; .
@prefix bibo:   &lt;http://purl.org/ontology/bibo/&gt; .
@prefix dc:     &lt;http://purl.org/dc/elements/1.1/&gt; .
@prefix foaf:   &lt;http://xmlns.com/foaf/0.1/&gt; .
@prefix schema: &lt;http://schema.org/&gt; .

# Types of the individuals involved
:LetterToComradesInLondon a bibo:Letter .
:A701.001.012 a :File .
:A701.001 a :Fonds .
:PilsudskiInstitute a foaf:Organization .
:PilsudskiJosef a foaf:Person .
:JedrzejowskiBoleslaw a foaf:Person .
:MalinowskiAleksander a foaf:Person .
:Sachalin a schema:Place .
:Bialystok a schema:Place .

# Archival arrangement: letter, file, fonds, repository
:LetterToComradesInLondon schema:isPartOf :A701.001.012 .
:A701.001.012 :isSectionOf :A701.001 .
:A701.001 :repository :PilsudskiInstitute .

# Authorship and the people and places mentioned in the letter
:LetterToComradesInLondon dc:creator :PilsudskiJosef .
:LetterToComradesInLondon schema:mentions :JedrzejowskiBoleslaw .
:LetterToComradesInLondon schema:mentions :MalinowskiAleksander .
:LetterToComradesInLondon schema:mentions :Sachalin .
:LetterToComradesInLondon schema:mentions :Bialystok .
</preformat>
      <p>Finally, we report that the version of arkivo currently used for the Jozef
Pilsudski archival collections stores about 270,000 triples and is populated
by more than 130,000 individuals. In detail, there are 13,326 individuals related
to items, 15,678 titles, 6,458 authors, 29,280 persons mentioned, 47,185 places,
and 28,039 dates.</p>
      <fn-group>
        <fn><label>5</label><p>http://www.pilsudski.org/</p></fn>
      </fn-group>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Future Work</title>
      <p>In this paper we briefly presented arkivo, an ontology designed to model the
archival description of historical document collections. We also showed
the current usage of arkivo in the context of the historical archive of the Jozef
Pilsudski Institute of America. Currently, we are working on the realization of
an ontology-based digital archive.</p>
      <p>
        Future work will include the implementation of automated and adaptive
ontology population processes exploiting the techniques presented in [
        <xref ref-type="bibr" rid="ref11 ref12">12, 11</xref>
        ], as
well as the investigation of user interfaces that let users
explore the relationships that arise from encountering a
single item or file in the archive. Such interfaces should help users find unexpected
and hidden knowledge accumulated both in the archive and on the Web.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><given-names>Giovanni</given-names> <surname>Adorni</surname></string-name>,
          <string-name><given-names>Marco</given-names> <surname>Maratea</surname></string-name>,
          <string-name><given-names>Laura</given-names> <surname>Pandolfo</surname></string-name>, and
          <string-name><given-names>Luca</given-names> <surname>Pulina</surname></string-name>.
          <article-title>An ontology for historical research documents</article-title>.
          In <source>International Conference on Web Reasoning and Rule Systems</source>, pages
          <fpage>11</fpage>–<lpage>18</lpage>. Springer,
          <year>2015</year>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><given-names>Tim</given-names> <surname>Berners-Lee</surname></string-name>,
          <string-name><given-names>James</given-names> <surname>Hendler</surname></string-name>,
          <string-name><given-names>Ora</given-names> <surname>Lassila</surname></string-name>, et al.
          <article-title>The semantic web</article-title>.
          <source>Scientific American</source>,
          <volume>284</volume>(<issue>5</issue>):<fpage>28</fpage>–<lpage>37</lpage>,
          <year>2001</year>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><given-names>Christian</given-names> <surname>Bizer</surname></string-name>,
          <string-name><given-names>Tom</given-names> <surname>Heath</surname></string-name>, and
          <string-name><given-names>Tim</given-names> <surname>Berners-Lee</surname></string-name>.
          <article-title>Linked data - the story so far</article-title>.
          <source>Semantic services, interoperability and web applications: emerging concepts</source>, pages
          <fpage>205</fpage>–<lpage>227</lpage>,
          <year>2009</year>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><given-names>Martin</given-names> <surname>Doerr</surname></string-name>.
          <article-title>The CIDOC conceptual reference module: an ontological approach to semantic interoperability of metadata</article-title>.
          <source>AI Magazine</source>,
          <volume>24</volume>(<issue>3</issue>):<fpage>75</fpage>,
          <year>2003</year>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
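          5.
          <string-name><given-names>John H</given-names> <surname>Gennari</surname></string-name>,
          <string-name><given-names>Mark A</given-names> <surname>Musen</surname></string-name>,
          <string-name><given-names>Ray W</given-names> <surname>Fergerson</surname></string-name>,
          <string-name><given-names>William E</given-names> <surname>Grosso</surname></string-name>,
          <string-name><given-names>Monica</given-names> <surname>Crubézy</surname></string-name>,
          <string-name><given-names>Henrik</given-names> <surname>Eriksson</surname></string-name>,
          <string-name><given-names>Natalya F</given-names> <surname>Noy</surname></string-name>, and
          <string-name><given-names>Samson W</given-names> <surname>Tu</surname></string-name>.
          <article-title>The evolution of Protégé: an environment for knowledge-based systems development</article-title>.
          <source>International Journal of Human-Computer Studies</source>,
          <volume>58</volume>(<issue>1</issue>):<fpage>89</fpage>–<lpage>123</lpage>,
          <year>2003</year>.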
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
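          6.
          <string-name><given-names>Bernardo</given-names> <surname>Cuenca Grau</surname></string-name>,
          <string-name><given-names>Ian</given-names> <surname>Horrocks</surname></string-name>,
          <string-name><given-names>Boris</given-names> <surname>Motik</surname></string-name>,
          <string-name><given-names>Bijan</given-names> <surname>Parsia</surname></string-name>,
          <string-name><given-names>Peter</given-names> <surname>Patel-Schneider</surname></string-name>, and
          <string-name><given-names>Ulrike</given-names> <surname>Sattler</surname></string-name>.
          <article-title>OWL 2: The next step for OWL</article-title>.
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>,
          <volume>6</volume>(<issue>4</issue>):<fpage>309</fpage>–<lpage>322</lpage>,
          <year>2008</year>.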
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
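          7.
          <string-name><given-names>Stephan</given-names> <surname>Grimm</surname></string-name>,
          <string-name><given-names>Andreas</given-names> <surname>Abecker</surname></string-name>,
          <string-name><given-names>Johanna</given-names> <surname>Völker</surname></string-name>, and
          <string-name><given-names>Rudi</given-names> <surname>Studer</surname></string-name>.
          <article-title>Ontologies and the semantic web</article-title>.
          In <source>Handbook of Semantic Web Technologies</source>, pages
          <fpage>507</fpage>–<lpage>579</lpage>. Springer,
          <year>2011</year>.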
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><given-names>Rebecca S</given-names> <surname>Guenther</surname></string-name>.
          <article-title>MODS: the metadata object description schema</article-title>.
          <source>Portal: Libraries and the Academy</source>,
          <volume>3</volume>(<issue>1</issue>):<fpage>137</fpage>–<lpage>150</lpage>,
          <year>2003</year>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name><given-names>Sebastian</given-names> <surname>Kruk</surname></string-name>,
          <string-name><given-names>Bernhard</given-names> <surname>Haslhofer</surname></string-name>,
          <string-name>P Piotr</string-name>,
          <string-name><given-names>Adam</given-names> <surname>Westerski</surname></string-name>, and
          <string-name><given-names>Tomasz</given-names> <surname>Woroniecki</surname></string-name>.
          <article-title>The role of ontologies in semantic digital libraries</article-title>.
          In <source>European Networked Knowledge Organization Systems (NKOS) Workshop</source>,
          <year>2006</year>.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name><given-names>Sebastian Ryszard</given-names> <surname>Kruk</surname></string-name> and
          <string-name><given-names>Bill</given-names> <surname>McDaniel</surname></string-name>.
          <source>Semantic digital libraries</source>. Springer,
          <year>2009</year>.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name><given-names>Laura</given-names> <surname>Pandolfo</surname></string-name> and
          <string-name><given-names>Luca</given-names> <surname>Pulina</surname></string-name>.
          <article-title>Adnoto: A self-adaptive system for automatic ontology-based annotation of unstructured documents</article-title>.
          To appear in <source>Proc. of the 30th International Conference on Industrial, Engineering, Other Applications of Applied Intelligent Systems</source>. Springer,
          <year>2017</year>.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name><given-names>Laura</given-names> <surname>Pandolfo</surname></string-name>,
          <string-name><given-names>Luca</given-names> <surname>Pulina</surname></string-name>, and
          <string-name><given-names>Giovanni</given-names> <surname>Adorni</surname></string-name>.
          <article-title>A framework for automatic population of ontology-based digital libraries</article-title>.
          In <source>AI*IA 2016 Advances in Artificial Intelligence</source>, pages
          <fpage>406</fpage>–<lpage>417</lpage>. Springer,
          <year>2016</year>.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name><given-names>Peter F</given-names> <surname>Patel-Schneider</surname></string-name>.
          <article-title>Analyzing schema.org</article-title>.
          In <source>International Semantic Web Conference</source>, pages
          <fpage>261</fpage>–<lpage>276</lpage>. Springer,
          <year>2014</year>.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name><given-names>Daniel V</given-names> <surname>Pitti</surname></string-name>.
          <article-title>Encoded archival description: An introduction and overview</article-title>.
          <year>1999</year>.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name><given-names>Ryan</given-names> <surname>Shaw</surname></string-name>,
          <string-name><given-names>Raphaël</given-names> <surname>Troncy</surname></string-name>, and
          <string-name><given-names>Lynda</given-names> <surname>Hardman</surname></string-name>.
          <article-title>LODE: Linking open descriptions of events</article-title>.
          In <source>Asian Semantic Web Conference</source>, pages
          <fpage>153</fpage>–<lpage>167</lpage>. Springer,
          <year>2009</year>.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name><given-names>Mike</given-names> <surname>Uschold</surname></string-name> and
          <string-name><given-names>Michael</given-names> <surname>Gruninger</surname></string-name>.
          <article-title>Ontologies: Principles, methods and applications</article-title>.
          <source>The Knowledge Engineering Review</source>,
          <volume>11</volume>(<issue>02</issue>):<fpage>93</fpage>–<lpage>136</lpage>,
          <year>1996</year>.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name><given-names>Stuart L</given-names> <surname>Weibel</surname></string-name> and
          <string-name><given-names>Traugott</given-names> <surname>Koch</surname></string-name>.
          <article-title>The Dublin Core Metadata Initiative</article-title>.
          <source>D-Lib Magazine</source>,
          <volume>6</volume>(<issue>12</issue>):1082-9873,
          <year>2000</year>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>