<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>RESTORE: Smart Access to Digital Heritage and Memory</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Emiliano Degl'Innocenti</string-name>
          <email>Emiliano.deglinnocenti@cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessia Spadi</string-name>
          <email>spadi@ovi.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federica Spinelli</string-name>
          <email>spinelli@ovi.cnr.it</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CNR-OVI</institution>
          ,
          <addr-line>via di Castello 46, 50141 Firenze</addr-line>
          ,
          <country country="IT">Italia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents the first results of the RESTORE (smaRt accESs TO digital heRitage and mEmory) project. RESTORE is a 2 years project - cofinanced by the Regione Toscana - whose main goal is the recovery, integration and accessibility of data and digital objects produced in the last twenty years by the partners involved, in order to build a knowledge base on the history of the city of Prato and its institutions, the development of its economic and entrepreneurial system, the role of women in the development of the city welfare network. The reference context for scientific and infrastructural development activities is the collaboration (through the CNR-OVI institute) with DARIAH-ERIC (ESFRI Landmark for the humanities and social sciences) and E-RIHS (ESFRI project for the heritage science) Research Infrastructures, as well as the European Open Science Cloud (EOSC).</p>
      </abstract>
      <kwd-group>
        <kwd>Data and information lifecycle</kwd>
        <kwd>Annotated textual corpora</kwd>
        <kwd>Collections</kwd>
        <kwd>Data repositories and archives</kwd>
        <kwd>Methodologies and technologies for the production</kwd>
        <kwd>Representation</kwd>
        <kwd>Preservation and enhancement of cultural heritage</kwd>
        <kwd>Models and tools for semantic representation and Knowledge management in the humanities</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>The ecosystem of digital resources concerning cultural heritage is large and very rich,
but it suffers from a very high fragmentation level, so high that it threatens to damage
its value: quality content - produced by libraries, archives and research centers - is often
difficult to find and scarcely accessible. Moreover, it is also characterized by a very
low level of interoperability, since these subjects work in very specific operational and
scientific contexts, or without any relation with other authorities.</p>
      <p>
        During the last decades the trend to generate large amounts of digital information
accelerates from year to year. Martin Hilbert identified the year 2002 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] as a turning
—————
Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0). This volume is published and copyrighted by
its editors. IRCDL 2021, February 18-19, 2021, Padua, Italy.
point in the accumulation of knowledge on a global level (i.e. “the beginning of the
digital age”1: for the first time in the history of humanity there is more data produced
and stored in the digital domain than in the analogue counterpart). A similar abundance
and heterogeneity of information is also in Memory and Cultural institutions (i.e.:
museums, libraries, archives, etc.). This situation indicates the need for a careful
re-evaluation of the criteria to be used for the selection, structuring, enrichment and publication
of high quality, scientifically validated data, services and tools which otherwise would
remain trapped in the information systems of its creators: often poorly interoperable
and, in some cases, made completely inaccessible by the lack of reference standards. A
vast part of this huge amount of information available online is often fragmented and
characterized by a data-silo logic. Given the high degree of specialization, these
resources are often isolated from other scientific contexts and disconnected from other
similar resources. A truly sustainable and interoperable digital ecosystem for cultural
heritage resources should allow the reconstruction, at least in the digital domain, of a
unified semantic dimension.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Lack of interoperability: the case of RESTORE</title>
      <p>RESTORE2 is a two years project, co-funded by the Regione Toscana, that features
Research Centers (CNR-Opera del Vocabolario Italiano3), Cultural Institutions (State
Archives of Prato4, Municipal Museum of Palazzo Pretorio5, Tuscany Archives and
Libraries Superintendence6) and an SME (SPACE SpA7). The project aims at providing
a multi-level reconstruction of the social reality of the city of Prato, starting from the
Middle Ages and arriving - possibly - up to the XX Century. This will be possible
through the creation of an information system populated with a relevant portion of the
city's documentary heritage preserved in the memory institutions that joined the project.
The RESTORE platform will enhance remote consulting of the partners’ resources
through the use of Linked Open Data (LOD)8 and by bringing to the users advanced
tools to search and browse the project’s knowledge base.</p>
      <p>The State Archives of Prato preserves a vast asset of documentary resources about
social, religious, economic life and history of Prato. These resources will constitute the
first and more relevant building block of the RESTORE knowledge base, that will be
integrated with other relevant assets, starting with the city’s artistic heritage belonging
to the Municipal Museum of Palazzo Pretorio.</p>
      <p>The Datini9 and the Ospedale Misericordia e Dolce10 funds are among the oldest
collections preserved by the State Archives that also produced their digital archival
description, digital images of the documents, transcription of texts and a collection of
lemmatized texts, in collaboration with CNR-OVI.</p>
      <p>The Palazzo Pretorio Museum holds a collection of works of art coming from or
strictly related to the Misericordia and Dolce Hospital and Datini’s house.</p>
      <p>Most of the archival heritage relevant for the project has been managed during the
last 20 years with different digital platforms and tools. Due to the technological
progress the interoperability and accessibility of such resources is now severely limited.
As an example, the archival descriptions, images and transcriptions of the letters
exchanged between Datini and his wife, published on CD-ROM about 20 years ago11 is
no more accessible with modern browsers and operating systems. Furthermore, the
archival description of the Datini and that of the Misericordia e Dolce Hospital funds, are
published on two different websites; the description of the other collections of the State
Archives are published on a third, not interoperable platform. The lemmatization of
Datini’s correspondence is available on (yet) another stand alone website. To complete
this landscape, no connection at all exists with the digital resources related to the
Museum’s items.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Building the RESTORE datastore</title>
      <p>
        In order to improve the situation, fostering resources interoperability, increasing
efficiency, transparency, rapid dissemination of resources and facilitating reuse, it is
crucial to change the paradigm from (legacy) institutional data silos to linked open data.
RESTORE aims at building a knowledge base populated with curated, scientifically
reliable, information - made findable, accessible, interoperable and reusable, according
to the FAIR principles12 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Resources can be exposed through a wide variety of channels, which differ according
to the use of platforms and user interfaces with different qualities and characteristics.
The panorama of tools for data management is extremely broad: several publishing
9 The Datini archive includes the administration documents and correspondence of the merchant
Francesco di Marco Datini (1335-1410) that testifies, through his vast activity in the industrial,
commercial and banking fields, a cross-section of the economy and social life of the entire
Mediterranean basin. The digitization project began in 1999 and can be consulted on the
website: http://datini.archiviodistato.prato.it/il-progetto
10 The Misericordia e Dolce Hospital is a charitable organisation that has cared for wayfarers,
poor and abandoned children since the 13th century (cf.
https://sias.archivi.beniculturali.it/cgi-bin/pagina.pl?TipoPag=comparc&amp;Chiave=432072&amp;RicVM=indice&amp;RicProgetto=as%2dprato&amp;RicSez=fondi&amp;RicPag=2&amp;RicTipoScheda=ca). Digital resources related
to this fund can be consulted on the website:
http://www.archiviodistato.prato.it/accedi-e-consulta/aspoSt005/tree
11 Per la tua Margherita: lettere di una donna del '300 al marito mercante: Margherita Datini a</p>
      <p>Francesco di Marco, 1384-1401; published in 2002.
12 cfr. Fig. 1 (workflow diagram).
systems, including traditional institutional repositories (such as archives in Universities
and research institutes), subject repositories (i.e. archives of academic publications in a
particular subject area), or open access archives (designed to host miscellaneous
resources)13 provide the means to upload and publish datasets with a general, disciplinary
or institutional purpose but, in general, the support for the FAIR principles is still
limited and can be improved.</p>
      <p>Based on the specific needs of the project’s stakeholders, considered in different
usage scenarios, we have identified a set of basic functional needs and selected a few
core software components. As an example, in the following paragraph, we briefly
analyze the selection process for the project’s data storage software. We initially
considered CKAN, Dataverse, and Fedora as suitable candidates for the RESTORE data
warehouse establishment: the aforementioned solutions were analyzed focusing on their
ability to manage digital objects with related metadata according to the project needs
and supporting basic FAIR requirements.</p>
      <p>The candidate tools were evaluated considering some general but fundamental
criteria, such as:
• supported file types;
• availability of metadata management tools;
• APIs availability, to foster data accessibility and reusability;
• User Experience, taking in consideration that the target users (GLAMs data
providers) could have limited time, resources and/or know-how to spend on
a new software;
• Community support;
• Available technical documentation;
• Open Source License</p>
      <p>In principle, all the tools analyzed would have allowed us to achieve the basic goals
we have set, allowing data and metadata management and dissemination through well
documented APIs and being supported by a large community of users, as well as
offering a series of extensible features for data processing and visualization, with open
source licenses.</p>
      <p>The reasons for choosing CKAN over Fedora and Dataverse are briefly explained in
the following paragraph and summarized in Table 1, below.
13 Examples of these are Zenodo and Figshare, allowing all kinds of disciplines and different
kinds of research content, such as tables, images, videos and datasets among their documents.
User support
Documentation
Metadata
API
Development
funded by</p>
      <p>Wide supporting community</p>
      <p>Extensive and community maintained comprehensive documentation
Supports unrestricted
metadata</p>
      <p>Description/citation,
domain-specific or custom
fields
Yes</p>
      <p>Follow with
domainlevel metadata
schemas
Government</p>
      <p>Grants; Institutional</p>
      <p>Organizations
CKAN is written in python and offers a simple, nice looking, and very friendly user
interface. It comes packed with many features14, including tools for data harvesting,
visualization and preview, RDF metadata representation for each dataset, as well as
full-text, faceted and fuzzy search capabilities. A distinctive feature for CKAN is the
availability of advanced APIs, allowing a variety of operations on the stored
information, including: search, creation, delete and update of datasets, resources, and other
objects15. CKAN is also widely extensible: custom themes and/or functionalities could
be added through python packages containing one or more plugin classes, some
provided by default, many others available in the CKAN extensions repository16.
Furthermore, CKAN is widely used in the field of open government data17, where it has become
a standard software package.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Creating the RESTORE knowledge base</title>
      <p>
        In the first phase of the project the resources provided by the partners and the problems
related to their integration were analyzed to elaborate an efficient data management
workflow (i.e.: ingestion, mapping, normalization and integration, etc.) and to represent
the partners’ resources into the RESTORE knowledge base. The conceptual model is
defined taking into account the characteristics of each partner’s digital objects, as well
as the main research questions and information production workflows and contexts [
        <xref ref-type="bibr" rid="ref3 ref4">3,
4</xref>
        ].
      </p>
      <p>A preliminary model of the whole RESTORE digital data lifecycle is provided below
(Fig. 1).
14 https://ckan.org/features
15 https://docs.ckan.org/en/2.9/api
16 https://extensions.ckan.org
17 https://ckan.org/case-studies</p>
      <p>Based on the initial analysis of the partners’ resources, an iterative methodology has
been implemented to elaborate the RESTORE conceptual model (which is still under
development). Here follows a brief outline of the most relevant steps in the process of
identification of the classes and properties of the RESTORE ontology, based on the
CIDOC Conceptual Reference Model18.</p>
      <p>During the first phase of modeling, high-level entities and the relationships have
been identified, through the analysis of existing data structures. Subsequently, other
entities and relationships have been identified to support the implementation of test use
cases coming from target users categories (i.e.: research communities, citizens, cultural
and creative industry operators). This process has been performed in close cooperation
with domain experts working in the project’s partners institutions, that provided
additional feedback and validated the model.</p>
      <p>
        For the syntactic and semantic mapping of the most relevant schemas and standards
used in each field (i.e.: TEI19 for texts; EDM20 in addition to MAG21, MODS and
METS22 for contents produced by libraries; EAD23 and EAC24 for archives; CIDOC
CRM for content produced in other areas of the Digital Cultural Heritage, etc.) and to
18 CIDOC Conceptual Reference Model (CRM): http://www.cidoc-crm.org/
19 Text Encoding Initiative: https://tei-c.org/
20 Europeana Data Model: https://pro.europeana.eu/page/edm-documentation
21 Administrative and management metadata: https://www.iccu.sbn.it/export/sites/iccu/docu
menti/manuale.html
22 https://www.loc.gov/standards/mods/presentations/mets-mods-morgan-ala07
23 Encoded Archival Description: https://www.loc.gov/ead/
24 Encoded Archival Context: https://eac.staatsbibliothek-berlin.de
perform the integration of the partners’ datasets [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] we used the Karma Data Integration
tool25.
5
      </p>
    </sec>
    <sec id="sec-5">
      <title>Plans for the future</title>
      <p>Capitalizing on the work being done in the first phase of the project, the main goals for
the second year of RESTORE will be represented by the consolidation of the ICT
infrastructure and the extension of the project’s strategy, in particular by: i) finalizing the
architecture of the ICT components, ii) improving the strategies for the acquisition,
mapping and modelling of the data provided by the partners; iii) developing procedures
and tools for knowledge extraction and representation based on the information stored
in the RESTORE knowledge base. On this latter point we will work in close
collaboration with the SPACE SpA company, that will bring into the project extensive
knowledge on the development of innovative tools and services for the Cultural
Heritage sector (i.e.: virtual museums, smart thematic itineraries, etc.).</p>
      <p>Eventually the RESTORE technological platform will be opened to host collections
provided by other memory institutions seeking to preserve relevant digital assets and
collections related to Prato and its society and extended in terms of functionalities and
types of digital resources managed. The solutions produced by the project will possibly
become a model to be replicated in the context of similar cultural institutions. From this
point of view, the reference points will be constituted by the ESFRI26 infrastructures
active in the field of humanities (DARIAH-ERIC27) and heritage science (E-RIHS28),
whose involvement will guarantee full compliance with the European guidelines on the
subject. of accessibility, interoperability, reuse and creation of added value for society,
starting from the results achieved by the research.
25 https://usc-isi-i2.github.io/karma/
26 https://www.esfri.eu/
27 https://www.dariah.eu/
28 http://www.e-rihs.eu/
and Synergies - PARTHENOS
https://doi.org/10.5281/ZENODO.2662530.</p>
      <p>Project</p>
      <p>Zenodo</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Hilbert</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>How much information is there in the “information society”</article-title>
          <source>? Significance</source>
          <volume>9</volume>
          (
          <issue>4</issue>
          ),
          <fpage>8</fpage>
          -
          <lpage>12</lpage>
          (
          <year>2012</year>
          ). https://doi.org/10.1111/j.17409713.
          <year>2012</year>
          .
          <volume>00584</volume>
          .x.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.D.</surname>
          </string-name>
          et al.:
          <article-title>The FAIR Guiding Principles for scientific data management and stewardship</article-title>
          .
          <source>Sci Data</source>
          .
          <volume>3</volume>
          ,
          <issue>160018</issue>
          (
          <year>2016</year>
          ). https://doi.org/10.1038/sdata.
          <year>2016</year>
          .
          <volume>18</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Gangemi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Presutti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Ontology Design Patterns</article-title>
          . In: Staab,
          <string-name>
            <given-names>S.</given-names>
            and
            <surname>Studer</surname>
          </string-name>
          , R. (eds.) Handbook on Ontologies. pp.
          <fpage>221</fpage>
          -
          <lpage>243</lpage>
          Springer, Berlin, Heidelberg (
          <year>2009</year>
          ). https://doi.org/10.1007/978-3-
          <fpage>540</fpage>
          -92673-3_
          <fpage>10</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gangemi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Presutti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Content ontology design patterns as practical building blocks for web ontologies</article-title>
          .
          <source>In: Conceptual Modeling-ER</source>
          , pp.
          <fpage>128</fpage>
          -
          <lpage>141</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Degl'Innocenti</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          et al.
          <source>: Report on the Assessment of Data Policies and Standardization. Pooling Activities</source>
          , Resources and Tools for Heritage E-research Networking, Optimization
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>