<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Data Curation in Cultural Heritage Institutions: Two Case Studies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Klaus Kempf</string-name>
          <email>klauskempf@gmx.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anna Maria Tammaro</string-name>
          <email>annamaria.tammaro@unipr.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Casati</string-name>
          <email>s.casati@museogalileo.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bayerische Staatsbibliothek</institution>
          ,
          <addr-line>Munich</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Museum Galileo Digital Library</institution>
          ,
          <addr-line>Florence</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Parma</institution>
          ,
          <addr-line>Parma</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper analyzes the data curation practices of two digital libraries: the digital library of the Bayerische Staatsbibliothek and the digital library of the Museo Galileo. Four lines of data curation activity are analyzed: Access, Workflow, Data representation and Re-use. The paper closes with some considerations on data curation and on the problems that digital libraries still need to address.</p>
      </abstract>
      <kwd-group>
        <kwd>Data curation</kwd>
        <kwd>Bayerische Staatsbibliothek</kwd>
        <kwd>Museum Galileo Digital Library</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In recent decades, several projects have been carried out to digitize the cultural heritage held by
archives, libraries and museums (LAM), highlighting the importance of data curation standards and good
practices. In the present post-pandemic recovery phase, in which digitization projects are being funded
to extend democratic access to cultural heritage, it is very useful to examine how the data curation
practices of LAM institutions have developed so far, and to consider the approaches adopted, their
theoretical frameworks and the gaps that remain.</p>
      <p>Rather than merely presenting the technical challenges, we intend to analyze the organizational and
social challenges, not as separate considerations but as integral parts of the whole. For this purpose
we examine two case studies: the Bayerische Staatsbibliothek (BSB) and the Galileo Museum Digital Library
(MGDL). The two case studies were chosen because they represent two pioneering digital library
experiences that adopted different models of digitization: the mass digitization of the BSB's cultural
heritage holdings, and the MGDL's digitization of a specialized collection distributed across different
libraries. The aims pursued in the two cases were similar: to give scholars broad access to collections
that were otherwise hard to reach, or to meet the needs of a specific research program.</p>
      <p>The Bayerische Staatsbibliothek in Munich is the central library of the Free State of Bavaria and one
of the most important universal libraries in Europe. The BSB was a pioneer in mass digitization: at the
end of the 1990s it opened an in-house digitization center, the Munich DigitiZation Center (MDZ), to
manage the entire workflow from production to preservation, and it later entered into a large-scale
collaboration with Google Books.</p>
      <p>The Digital Library of the Museo Galileo was created in 2004 as a specialized library for the history
of science. One of the first digital library projects carried out in Italy with the support of the
Ministry of Cultural Heritage, the MGDL integrates various archives of texts, images and 3D objects from
the Galileo Museum and other partner libraries. The Digital Library has set up an in-house center to
support the creation, management and preservation of digital content.</p>
      <p>The paper aims to summarize several current and emerging trends in data curation in heritage
institutions, with a strong emphasis on technologies as tools for acquiring, transforming and exhibiting
digital representations on multiple levels, together with their organizational and social
implications.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Data curation</title>
      <p>
        Data curation includes "all the processes required for principled and controlled data creation,
maintenance and management, along with the ability to add value to data" (see Wikipedia and [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]).
      </p>
      <p>Data curation is a broad term for the processes and activities involved in organizing and integrating
data collected from various sources, enriching the data, and publishing and presenting them, so that
their value is maintained over time and they remain available for reuse and preservation. In the modern
era of big data, data curation has become more prominent, particularly for software that processes
high-volume, complex data. In the sciences, data curation may refer to research data management and to
the extraction of important information from scientific texts, following the FAIR principles (Findable,
Accessible, Interoperable, Reusable).</p>
      <p>
        In cultural heritage institutions, the transition from predominantly analogue to predominantly digital
acquisitions requires significant changes in thinking and practices [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>The organization of cultural heritage institutions in the digital environment can be addressed
through four lines of data curation activity:
• Access: discovery and data retrieval;
• Workflow: maintenance and improvement of data quality;
• Data representation: adding value to data;
• Re-use: re-use of data, including preservation.</p>
    </sec>
    <sec id="sec-3">
      <title>2.1. Access</title>
      <p>Data curation enables and improves the accessibility and traceability of data. For example, by using
semantic web technologies it can offer researchers integrated research tools across different,
heterogeneous datasets. It also enriches the user interface through improved presentation and display
techniques.</p>
      <sec id="sec-3-1">
        <title>Case studies:</title>
        <p>BSB</p>
        <p>The BSB follows the principle that the quality of access and data retrieval is fundamentally
determined at the production stage, by producing digital objects in the best possible quality and by
adding a complete set of metadata. To enhance the visibility and accessibility of its digital assets, the
Bayerische Staatsbibliothek makes metadata and/or the object data themselves visible and accessible not
only via the local OPAC but also through regional, national and global catalogs and various portals,
such as:
• Deutsche Digitale Bibliothek<sup>1</sup>
• Europeana<sup>2</sup>
• World Digital Library<sup>3</sup>
• bavarikon<sup>4</sup></p>
        <p>In addition, the BSB contributes the data and metadata of its digitized copyright-free holdings to a
nationwide network of so-called specialized information services for individual academic disciplines
("Fachinformationsdienste"). Online (virtual) exhibitions offer a further means of access.</p>
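        <p>Making records available to regional, national and global catalogs and portals generally means exporting them in a shared metadata format. The text does not say which formats or protocols the BSB uses for this. As an illustration only, the following Python sketch assumes simple Dublin Core (oai_dc), the format that OAI-PMH repositories are required to support for harvesting, and serializes a record with the standard library. The function name and the record values are hypothetical.</p>
        <preformat>
```python
"""Sketch: serializing a record as simple Dublin Core (oai_dc).

Assumption: the record and its values are hypothetical; this only
illustrates the kind of export that portal aggregation relies on.
"""
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Use the conventional prefixes when the XML is serialized.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def to_oai_dc(record: dict[str, list[str]]) -> str:
    """Return an oai_dc XML string; each element may repeat."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in record.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    example = {
        "title": ["Example title"],
        "creator": ["Example author"],
        "identifier": ["https://example.org/object/0001"],  # hypothetical
        "rights": ["Public domain"],
    }
    print(to_oai_dc(example))
```
        </preformat>
        <p>Whether a given portal harvests such records or expects a richer format depends on that portal. The sketch only shows the general shape of the export.</p>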
        <p>1 https://www.deutsche-digitale-bibliothek.de
2 https://www.europeana.eu
3 https://www.loc.gov/collections/world-digital-library/about-this-collection/
4 https://www.bavarikon.de</p>
        <sec id="sec-3-1-1">
          <title>MGDL</title>
          <p>
            The Museum Galileo Digital Library [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ] allows free consultation of its content, except for publications still covered by copyright,
which can be consulted only on the Library's premises or remotely with an authorization and the
corresponding password. The Digital Library also offers a free low-resolution download service. Some
digital collections are also available through the following portals:
• Europeana<sup>5</sup>
• Internet Culturale<sup>6</sup>
          </p>
          <p>The reading system offers a dual navigation mode: the reader can either "browse" the text or use a
structured index. For some works, the structured index links to related resources for in-depth study of
the topics. The index adds value to the publication and, in the case of stripped works, it is generated
automatically. The digital library interface also provides further controls for displaying and searching
documents in textual form. To improve readability, the reader can zoom in on the page; a gallery of
illustrations can also be browsed.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>2.2. Workflow</title>
      <p>According to the OAIS model, the workflow includes all activities and tools from the initial creation
of the digital object to its final presentation on the portal or digital library platform. In data
curation, the workflow is iterative and automated: each digitized object follows an automated workflow,
which reduces time and costs. The system that controls the workflow, the Digital Asset Management System,
supports the entire production process through a modular architecture that can draw data from different
service providers. One important aspect of the workflow concerns the quality of the data and its
enrichment, both of which facilitate reuse.</p>
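      <p>The staged, automated workflow described above can be illustrated with a minimal Python sketch. It uses hypothetical stage functions and loosely borrows the OAIS information packages (Submission, Archival and Dissemination Information Packages: SIP, AIP and DIP). It is not the actual workflow of either institution. Each stage is a plain function, so modules can be added to or removed from the pipeline without changing the other stages.</p>
      <preformat>
```python
"""Minimal sketch of a modular, automated digitization workflow.

Assumption: stage names, fields and rules are hypothetical; package
labels loosely follow OAIS (SIP -> AIP -> DIP).
"""
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DigitalObject:
    identifier: str
    images: list[str]
    metadata: dict[str, str] = field(default_factory=dict)
    packages: list[str] = field(default_factory=lambda: ["SIP"])
    log: list[str] = field(default_factory=list)


Stage = Callable[[DigitalObject], DigitalObject]


def quality_check(obj: DigitalObject) -> DigitalObject:
    # Hypothetical rule: an object without images is rejected at ingest.
    if not obj.images:
        raise ValueError(f"{obj.identifier}: no images to ingest")
    obj.log.append("quality_check: passed")
    return obj


def enrich_metadata(obj: DigitalObject) -> DigitalObject:
    # Placeholder enrichment: derive the extent from the number of scans.
    obj.metadata["extent"] = f"{len(obj.images)} image(s)"
    obj.log.append("enrich_metadata: extent added")
    return obj


def archive(obj: DigitalObject) -> DigitalObject:
    # The archival package is kept for long-term preservation.
    obj.packages.append("AIP")
    obj.log.append("archive: AIP stored")
    return obj


def publish(obj: DigitalObject) -> DigitalObject:
    # A dissemination package is derived for the portal; the AIP remains.
    obj.packages.append("DIP")
    obj.log.append("publish: DIP delivered")
    return obj


def run_workflow(obj: DigitalObject, stages: list[Stage]) -> DigitalObject:
    """Apply the stages in order; a stage raises an error to stop the object."""
    for stage in stages:
        obj = stage(obj)
    return obj


PIPELINE: list[Stage] = [quality_check, enrich_metadata, archive, publish]

if __name__ == "__main__":
    book = DigitalObject("demo-0001", ["p001.tif", "p002.tif"])
    done = run_workflow(book, PIPELINE)
    print(done.packages)  # ['SIP', 'AIP', 'DIP']
    print(done.metadata)  # {'extent': '2 image(s)'}
```
      </preformat>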
      <sec id="sec-4-1">
        <title>Case studies:</title>
        <p>BSB</p>
        <p>The Bayerische Staatsbibliothek has a data curation policy that includes resource management for
optimizing the deployment of staff. Given the large number of digital objects (over 2 million volumes)
and, above all, the necessary migration of numerous sub-collections that until now have relied on more or
less individual software solutions, efficiency can only be improved through standardization and strictly
flow-oriented quality control of, as far as possible, all steps of digital production. Each original is
scanned only once, in the best possible quality, and high-quality, high-resolution scanning requires
permanent and systematic quality control, for which the "Metamorfoze"<sup>7</sup> tool is used.</p>
        <sec id="sec-4-1-1">
          <title>MGDL</title>
          <p>The Digital Library Management System is a web application developed in-house by the Museum
Galileo Digital Library for use on the intranet. It manages all stages of the workflow, from image
acquisition to publication in the TECA Digitale, the application through which the digital resources are
accessed. The STORAGE component is the most critical part of the system in terms of reliability and
sustainability.</p>
          <p>The workflow adopted for the Galileo Digital Library requires collaboration among computer
scientists, systems engineers and librarians, who design the service together with scholars and
researchers in a participatory approach. This collaboration has an impact on the acquisition policy,
which pays more attention to the quality of content than to quantity (e.g. in the selection of objects
and in specific training needs).</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>2.3. Data representation</title>
      <p>Digitization concerns the digital representation of cultural heritage objects. The data packages
must therefore be modeled not only for texts and images but also for other types of objects, such as 3D
models. Value can be added to digital objects, for example, through in-depth indexing and the
incorporation of structured semantic data as Linked Open Data (LOD), and also by creating new contexts
and developing new, original services. Interoperability requires harmonizing the different conceptual
models used for particular types of digital objects, at different levels of granularity.</p>
      <p>5 https://www.europeana.eu
6 https://www.internetculturale.it
7 The Metamorfoze Preservation Imaging Guidelines, developed by the National Library of the Netherlands,
are a tool for the production and preservation of images.</p>
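      <p>The Linked Open Data enrichment mentioned at the start of this section rests on expressing each statement about an object as an RDF triple (subject, predicate, object) that uses shared vocabularies and resolvable URIs. The Python sketch below is not the conversion actually used by either library. It maps a hypothetical catalog record to Dublin Core terms and serializes the resulting triples as JSON-LD, one of the standard RDF serializations. The base URI, the field names and the mapping are assumptions.</p>
      <preformat>
```python
"""Sketch: turning a catalog record into RDF, serialized as JSON-LD.

Assumptions: the base URI, record fields and mapping to Dublin Core
terms are hypothetical.
"""
import json

DCTERMS = "http://purl.org/dc/terms/"
BASE = "https://data.example.org/object/"  # hypothetical namespace

# Hypothetical mapping from local field names to Dublin Core terms.
FIELD_MAP = {
    "title": "title",
    "creator": "creator",
    "date": "date",
    "kind": "type",
}


def record_to_triples(record: dict[str, str]) -> list[tuple[str, str, str]]:
    """Express each mapped field as a (subject, predicate, object) triple."""
    subject = BASE + record["id"]
    return [
        (subject, DCTERMS + FIELD_MAP[name], value)
        for name, value in record.items()
        if name in FIELD_MAP  # unmapped fields are not exported
    ]


def triples_to_jsonld(triples: list[tuple[str, str, str]]) -> str:
    """Group triples by subject into a minimal JSON-LD document."""
    nodes: dict[str, dict[str, list[object]]] = {}
    for subject, predicate, obj in triples:
        # Values that look like URIs become links; the rest stay literals.
        value = {"@id": obj} if obj.startswith(("http://", "https://")) else obj
        nodes.setdefault(subject, {}).setdefault(predicate, []).append(value)
    graph = [{"@id": s, **props} for s, props in nodes.items()]
    return json.dumps({"@graph": graph}, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    record = {
        "id": "0001",
        "title": "Example instrument",
        "creator": "https://authority.example.org/person/42",  # hypothetical
        "date": "1600",
        "kind": "PhysicalObject",
        "shelfmark": "A.1",  # not mapped, so not exported
    }
    print(triples_to_jsonld(record_to_triples(record)))
```
      </preformat>
      <p>Values that are themselves URIs, such as a link to an authority record, are emitted as links rather than strings. This is what allows a dataset to connect to other Linked Open Data sources.</p>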
      <sec id="sec-5-1">
        <title>Case studies:</title>
        <p>BSB</p>
        <p>To realize the full potential of access to and representation of information, the Bayerische
Staatsbibliothek adds as complete a set of metadata as possible (technical and administrative metadata,
bibliographic and structural metadata, including a persistent identifier) and uses authority control
services wherever and whenever available. The volumes scanned by Google are checked and their quality
improved, and corrections to digital images and/or metadata are made on an ongoing basis. Electronic
tables of contents (ToCs) are created and new collection contexts are generated (examples: German
Reichstag minutes + GND, ADB/NDB).</p>
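        <p>Adding "a set of metadata as complete as possible" implies that completeness can be checked before an object is published. The following minimal Python sketch reports the missing fields for each category. The field names are hypothetical groupings that mirror the categories named above; they are not the BSB's actual schema.</p>
        <preformat>
```python
"""Sketch: checking a digitized-object record for metadata completeness.

Assumption: the field names are hypothetical groupings mirroring the
categories named in the text, not an actual library schema.
"""

REQUIRED_FIELDS: dict[str, tuple[str, ...]] = {
    "technical-administrative": ("image_format", "resolution_ppi", "rights"),
    "bibliographic-structural": ("title", "creator", "date", "structure"),
    "identifier": ("persistent_id",),
}


def missing_fields(record: dict[str, object]) -> dict[str, list[str]]:
    """Return, per category, the required fields that are absent or empty."""
    report: dict[str, list[str]] = {}
    for category, fields in REQUIRED_FIELDS.items():
        absent = [f for f in fields if not record.get(f)]
        if absent:
            report[category] = absent
    return report


def completeness(record: dict[str, object]) -> float:
    """Share of required fields that are filled, from 0.0 to 1.0."""
    total = sum(len(fields) for fields in REQUIRED_FIELDS.values())
    missing = sum(len(v) for v in missing_fields(record).values())
    return (total - missing) / total


if __name__ == "__main__":
    record = {
        "title": "Example title",
        "creator": "Example author",
        "date": "1600",
        "image_format": "TIFF",
        "resolution_ppi": 400,
        "persistent_id": "urn:nbn:example-0001",  # hypothetical identifier
    }
    print(missing_fields(record))
    # {'technical-administrative': ['rights'], 'bibliographic-structural': ['structure']}
    print(round(completeness(record), 2))  # 0.75
```
        </preformat>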
        <p>An essential aspect of data curation is a quality policy with continuous quality control during
the production of digital images: image resolution and sharpness, as well as color management, are key
parameters.</p>
        <p>In this context, the in-house reproduction equipment (scanners and digital cameras) is constantly
renewed, and further quality assurance measures are taken, among them the systematic use of
Metamorfoze.</p>
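        <p>Part of the continuous quality control described above can be automated. The following Python sketch is only an illustration and rests on three assumptions. First, the measurements (sampling rate and the CIELAB values of a scanned color target) are taken as given. Second, the color difference uses the simple CIE76 ΔE formula. Third, the thresholds are hypothetical placeholders, not the tolerances defined in the Metamorfoze guidelines.</p>
        <preformat>
```python
"""Sketch: flagging scans that fail simple, hypothetical quality thresholds.

Assumptions: measurements come from an external analysis of a color
target scanned with each batch; thresholds are placeholders, not the
Metamorfoze tolerances.
"""
import math
from dataclasses import dataclass

Lab = tuple[float, float, float]  # CIELAB coordinates (L*, a*, b*)


def delta_e_cie76(measured: Lab, reference: Lab) -> float:
    """Euclidean color difference in CIELAB space (CIE76)."""
    return math.dist(measured, reference)


@dataclass
class ScanMeasurement:
    identifier: str
    ppi: int                        # sampling rate in pixels per inch
    patches: list[tuple[Lab, Lab]]  # (measured, reference) target patches


def check_scan(scan: ScanMeasurement, min_ppi: int = 300,
               max_mean_delta_e: float = 4.0) -> list[str]:
    """Return the problems found; an empty list means the scan passes."""
    problems: list[str] = []
    if min_ppi > scan.ppi:
        problems.append(f"resolution {scan.ppi} ppi is below {min_ppi} ppi")
    if not scan.patches:
        problems.append("no color target patches measured")
    else:
        mean_de = sum(delta_e_cie76(m, r) for m, r in scan.patches) / len(scan.patches)
        if mean_de > max_mean_delta_e:
            problems.append(f"mean color difference {mean_de:.2f} exceeds {max_mean_delta_e}")
    return problems


if __name__ == "__main__":
    good = ScanMeasurement("demo-0001", 400, [((52.0, 10.0, -5.0), (50.0, 10.0, -5.0))])
    coarse = ScanMeasurement("demo-0002", 150, [((60.0, 10.0, -5.0), (50.0, 10.0, -5.0))])
    print(check_scan(good))    # []
    print(check_scan(coarse))  # two problems: resolution and color difference
```
        </preformat>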
        <sec id="sec-5-1-1">
          <title>MGDL</title>
          <p>The Galileo Digital Library is a new-generation thematic digital library that collects texts,
images, documents, bibliographic references, chronological repertories, lexicons, thematic indexes,
catalogs of objects and experiments, research aids and more on every aspect of Galileo's life, cultural
activity and reception. The MGDL consists of two systems: Galileo's personal library and
Galileo//thek@<sup>8</sup>.</p>
          <p>Other systems, such as Sinapsi<sup>9</sup>, are used to integrate further collections of digital
resources, giving access to Leonardo's Library, the iconographic collection of portraits of the members
of the Georgofili Academy and the Bibliotheca Perspectivae (in progress).</p>
          <p>
            The Galileo Digital Library has also created many born-digital resources, produced by its
Multimedia Laboratory and through "Wiki projects" [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]. Part of the MGDL Cumulative Database, entitled "Galileo Museum database: tools, books,
photographs, documents", has been selected for conversion into LOD, and therefore into RDF (Resource
Description Framework) [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ]. Following the principles of Linked Open Data, the MINERV@<sup>10</sup> project added an
MGDL dataset to Datahub (Open Knowledge Foundation) and to OpenData (Regione Toscana).
          </p>
          <p>8 https://galileoteca.museogalileo.it
9 http://www.progettosinapsi.it/soluzioni/
10 https://www.museogalileo.it/en/news-archive/121-news-archive-2015/1590-museo-galileo-dataset-in-datahub-okf-and-inopendatatuscany-b-en.html</p>
        </sec>
      </sec>
    </sec>
    <sec>
      <title>2.4. Re-use</title>
      <p>Data curation is essential for the preservation of digital data. It also helps to detect errors,
aggregate documentation and ensure that data can be reused, and in some cases it adds further features
and files. Reuse depends on the design and evaluation of human-computer interfaces, which calls for an
interdisciplinary research approach.</p>
      <sec>
        <title>Case studies:</title>
        <sec id="sec-5-1-2">
          <title>BSB</title>
          <p>The Bayerische Staatsbibliothek offers new services, free of charge as always, such as "Data for
scientific research" (Daten für die Forschung / DaFo), and provides most of its historical,
copyright-free digitized holdings via the IIIF (International Image Interoperability Framework)
standard.</p>
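          <p>For images, providing holdings through IIIF means that any client can request a given region, size, rotation and quality of an image through a predictable URL pattern defined by the IIIF Image API. The minimal Python sketch below builds such URLs. It assumes version 3.0 of the Image API and a hypothetical server address; the BSB's actual endpoints are not given in the text.</p>
          <preformat>
```python
"""Sketch: building IIIF Image API 3.0 request URLs.

Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
Assumption: the base URL and identifiers are hypothetical.
"""
from urllib.parse import quote

QUALITIES = {"default", "color", "gray", "bitonal"}


def _prefix(base: str, identifier: str) -> str:
    # Identifiers must be percent-encoded, including any "/" characters.
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}"


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Return the URL of an image derivative, e.g. a thumbnail or a detail."""
    if quality not in QUALITIES:
        raise ValueError(f"unsupported quality: {quality}")
    return f"{_prefix(base, identifier)}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base: str, identifier: str) -> str:
    """Return the URL of the image's info.json description."""
    return f"{_prefix(base, identifier)}/info.json"


if __name__ == "__main__":
    base = "https://iiif.example.org/image"  # hypothetical server
    page = "collection/page-0005"            # hypothetical identifier
    print(iiif_info_url(base, page))
    # https://iiif.example.org/image/collection%2Fpage-0005/info.json
    print(iiif_image_url(base, page, size="!200,200"))
    # https://iiif.example.org/image/collection%2Fpage-0005/full/!200,200/0/default.jpg
```
          </preformat>
          <p>A IIIF-aware viewer performs the same construction itself after reading info.json. This is what lets images exposed in this way be reused across platforms without custom integration.</p>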
        </sec>
        <sec id="sec-5-1-3">
          <title>MGDL</title>
          <p>The Galileo Digital Library has paid great attention to interoperability and preservation, but this
aspect requires further research. Two projects, one with Wiki and one with Google, are working on the
reuse of digital objects [<xref ref-type="bibr" rid="ref3">3</xref>].</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>3. Conclusion</title>
      <p>Digitization is an important step towards digital transformation and has a huge impact on the
organization of cultural heritage institutions and their traditional procedures. In conclusion, we
highlight the following issues on which more research is needed:
• Data curation is an ongoing, or rather never-ending, process that keeps facing new challenges due to
changing technologies, costs and changing, even growing, user needs.
• To realize the full potential of access to the digital library, the challenge is to get to know users
better and to build a participatory approach and transdisciplinary collaboration.
• An essential part of solving re-use problems is not only collaboration between data-holding
institutions but also close interaction with users.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Casati</surname>
          </string-name>
          ,
          <article-title>La Biblioteca digitale del Museo Galileo</article-title>
          ,
          <source>Biblioteche oggi</source>
          , January-February (
          <year>2015</year>
          ), pp.
          <fpage>45</fpage>
          -
          <lpage>51</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Casati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Rolfo</surname>
          </string-name>
          ,
          <article-title>Online la nuova versione della Galileo//thek@</article-title>
          ,
          <source>Galilaeana: journal of Galilean studies</source>
          ,
          <volume>13</volume>
          (
          <year>2016</year>
          ), pp.
          <fpage>181</fpage>
          -
          <lpage>186</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Casati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rotoli</surname>
          </string-name>
          ,
          <article-title>La Biblioteca digitale del Museo Galileo e il progetto GLAM</article-title>
          ,
          <source>Biblioteche oggi</source>
          , July-August (
          <year>2017</year>
          ), pp.
          <fpage>33</fpage>
          -
          <lpage>36</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Casati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pocci</surname>
          </string-name>
          ,
          <article-title>Le collezioni digitali tematiche del Museo Galileo: esperienze e nuove prospettive</article-title>
          , in:
          <source>Storie d'autore, storie di persone: fondi speciali tra conservazione e valorizzazione</source>
          , edited by
          <string-name>
            <given-names>F.</given-names>
            <surname>Ghersetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Martorano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Zonca</surname>
          </string-name>
          , Roma, AIB (
          <year>2020</year>
          ), pp.
          <fpage>273</fpage>
          -
          <lpage>280</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Casati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Butini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Viazzi</surname>
          </string-name>
          ,
          <article-title>Redazione e uso di mappe strutturali: un esempio di cooperazione fra biblioteche digitali: la biblioteca digitale del Museo Galileo e la Biblioteca europea di informazione e cultura</article-title>
          ,
          <source>Digitalia</source>
          (
          <year>2018</year>
          ), pp.
          <fpage>51</fpage>
          -
          <lpage>63</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Gerth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sieverling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Trognitz</surname>
          </string-name>
          ,
          <article-title>Data Curation: How and Why. A Showcase with Re-use Scenarios</article-title>
          ,
          <source>Studies in Digital Heritage</source>
          ,
          <volume>1</volume>
          (
          <issue>2</issue>
          ) (
          <year>2017</year>
          ), pp.
          <fpage>182</fpage>
          -
          <lpage>193</lpage>
          .
          <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.14434/sdh.v1i2.23235">https://doi.org/10.14434/sdh.v1i2.23235</ext-link>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kempf</surname>
          </string-name>
          ,
          <article-title>Data curation oder (Retro-)Digitalisierung ist mehr als die Produktion digitaler Daten</article-title>
          ,
          <source>o-bib. Das offene Bibliotheksjournal</source>
          ,
          <volume>2</volume>
          (
          <issue>4</issue>
          ) (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kempf</surname>
          </string-name>
          ,
          <article-title>Curated content come un aspetto centrale della politica delle raccolte nell'epoca digitale</article-title>
          , in:
          <source>La biblioteca che cresce. Contenuti e servizi tra frammentazione e integrazione. Milano, 14-15 marzo 2019. Relazioni del Convegno</source>
          , Milano, Editrice Bibliografica (
          <year>2019</year>
          ), pp.
          <fpage>140</fpage>
          -
          <lpage>149</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Big Data Curation</article-title>
          , in:
          <source>20th International Conference on Management of Data (COMAD)</source>
          , Hyderabad, India, December 17-19 (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>