=Paper= {{Paper |id=Vol-3160/paper9 |storemode=property |title=A Pilot of Smart Digital Library Used-Centered: The Project SMARTER |pdfUrl=https://ceur-ws.org/Vol-3160/paper9.pdf |volume=Vol-3160 |authors=Nicola Barbuti,Stefano Ferilli,Tommaso Caldarola |dblpUrl=https://dblp.org/rec/conf/ircdl/BarbutiFC22 }} ==A Pilot of Smart Digital Library Used-Centered: The Project SMARTER== https://ceur-ws.org/Vol-3160/paper9.pdf
    A Pilot of Smart Digital Library Used-Centered: The Project
    SMARTER
    Nicola Barbuti 1, Stefano Ferilli 1, Tommaso Caldarola 2
1
    University of Bari Aldo Moro, Piazza Umberto I, n. 1, Bari, Index, Italy
2
    D.A.BI.MUS. Ltd., Piazza Umberto I, n. 1, Bari, Index, Italy


                      Abstract
                      The paper presents the results of the national PoC project SMARTER, aimed at prototyping a
                      smart DL for the management, interaction, and preservation of digitized and born-digital
                      resources, related to ancient printed and manuscript artefacts. The research focused on the
                      development of an innovative metadata schema structured by integrating the languages of the
                      semantic web with conceptual ontologies, and on the experimentation of the applicability of
                      the ICRPad intelligent recognition system (Pat. UIBM n. 0001407881) to large collections of
                      digital objects, with the aim of making them interoperable with each other and, at the same
                      time, usable also through direct interaction with the contents of the metadata. For the
                      interaction with the digital collections, a set of innovative methods and technologies for
                      management and display on the web has been designed, which allow users to interact with the
                      digitized content through advanced tools.

                      Keywords
                      Smart Digital Library, SMARTER DL, Graph DB, ICRPad

    1. State-of-the-art
        The digitization of cultural heritage (CH) and the creation of digital libraries (DL) are fields of
    renewed and growing interest, especially related to galleries, libraries, archives, and museums (GLAM).
    The recent PNRR also provides for substantial financing in these fields, aiming to regenerate the
    relationships and interactions between citizens and the CH by bringing digitized collections online.
        The newborn Central Institute for the Digitization of Cultural Heritage – Digital Library has recently
    started to rethink the entire ecosystem of the digitization of heritage, starting from the redefinition of
    the processes of creating and publishing digital objects1.
        Although there is a twenty-year tradition of scientific studies on this topic [1] [2], even the recent
    scientific literature is focused on standardized models and processes, especially relating to data
    management and user interaction [3] [4]. The CH digitization methodology and practices used are still
    heterogeneous and dissimilar [5] [6]. The indexing of digital objects with metadata is only focused on
    the description of the original artifacts represented in the layouts of the data [7]2. Users experience the
    digital collections through passive consultation of pre-packaged data, with scarce interaction, almost limited
    to downloading the PDF files of the digital objects.
        So, today the network hosts several DLs that are improperly considered Digital
    Cultural Heritage (DCH) only because their digital objects reproduce cultural artifacts in their layouts.
    Instead, since little attention is paid to information about the digital objects themselves, which are original
    creations and require specific descriptive criteria, these DLs basically contain digital twins of the

    IRCDL 2022: 18th Italian Research Conference on Digital Libraries, February 24–25, 2022, Padova, Italy
       nicola.barbuti@uniba.it (N. Barbuti)
       0000-0003-0817-4235 (N. Barbuti)
                   ©️ 2022 Copyright for this paper by its authors.
                   Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                   CEUR Workshop Proceedings (CEUR-WS.org)
    1
      https://digitallibrary.cultura.gov.it/notizie/le-nuove-digital-libraries-siano-imperfette-ma-stimolanti-e-coerenti/ (last consultation: 7 October
    2021).
    2
      See the Italian standard MAG and international standards Dublin Core, METS, MODS, xDams.
heritage [7], which cannot be defined as cultural entities. This carelessness heavily affects the quality
of digital objects, whose metadata always lack information on their life cycle, and their preservation
over time, which is still an unsolved issue of digitization [8] [9].
   In this scenario, some recent DL projects arise as interesting attempts to evolve towards models for
the enhancement of digital cultural objects, as the resources are linked by LODs3 based on the RDF
model4. But once again, the descriptions are focused only on the artifacts represented in the layouts,
with scarce information on the digital objects.

2. Motivation
    The SMARTER DL (SDL) project outlined in this paper started in 2019 in response to these
scenarios. The goal of the project was to study a pilot of an interactive smart DL model for the management
of, and advanced interaction with, digital resources related to the GLAM heritage, evolving the
architecture of a DL designed and tested in 2014 [10].
    The research addresses a double perspective of innovation:
     • prototyping an SDL model for managing digital resources related to cultural heritage with an
         innovative metadata schema, studying triples based on conceptual abstractions related among
         themselves, with the goal of going beyond the current standards and generating a model of description of
         digital objects that can be dynamically implemented over time;
     • testing the tools of a patented system for the intelligent recognition of ancient handwritten and
         printed texts represented in the digital objects, ICRPad [11] [12], to extract and automatically
         index the hypertext, improving the users’ interaction with the objects by querying and retrieving
         information from the layout.
    The metadata schema has been designed developing formal ontologies that extend the classes,
relations and attributes typically used in the cultural sphere. The schema expands the context
information of the original artifacts both through conceptual relationships that connect it to other
resources, and by collecting and making available information on provenance representative of the data
life cycle [13]. The goal has been to outline a metadata structure that dynamically adds cultural content
to digital artifacts, in order to evolve current databases into knowledge bases, in which the descriptions
of the original artifacts are expanded by the conceptual relationships, and at the same time they continue
to be related to the physical and contextual descriptions of the data.
    The use of Artificial Intelligence (AI) solutions was also a design-driving choice. Several kinds of
AI techniques were involved: semantic technologies to describe and manage the library items, user
modeling and profiling to tailor the behavior of the library to the single users, semantic information
retrieval to overcome limitations of lexical approaches, machine learning and data mining to extract
knowledge from documents and users, to group documents and users, etc.
    The dynamic recording of information in the metadata schema aligns to the perspective of making
the SMARTER DL available in the long term, giving digital resources the function of records consistent
with the FAIR principles [14] [15] which, over time, can assume the value of the new DCH.

3. The SMARTER DL (SDL)
3.1. Metadata Schema
    The innovative metadata management approach includes a technological platform providing both
facilities for data storage and manipulation and an ontological layer opening new possibilities of
automated reasoning for best serving the needs and purposes of its different kinds of users (managers,
librarians, end users, etc.). This platform is named GraphBRAIN [16].
    The technological platform relies on a Graph DataBase. For GraphBRAIN we adopted Neo4j [17].
Compared to traditional Relational DBs, more oriented toward batch processing of structured
information, Graph DBs boost performance of instance-oriented data processing, making it possible to navigate
through data items efficiently and effectively by exploiting different kinds of binary associations
3
    See the recent Catalogo Generale dei Beni Culturali of the ICCD https://catalogo.beniculturali.it/ (last consultation: 7 October 2021).
4
    https://it.wikipedia.org/wiki/Resource_Description_Framework (last consultation: 7 October 2021).
between them. Neo4j is based on the so-called Labeled Property Graph (LPG) model, where nodes and
arcs in a graph may be labeled (usually the label represents the type of the instances) and associated to
sets of attribute-value pairs. This is a very powerful data representation model, successfully adopted by
big players in the industry. Differently from traditional DBs, Neo4j does not work based on a scheme
for the data to be stored. Any label and any set of attributes can be associated to each node or arc, and
nodes or arcs of the same type may involve completely different attributes.
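    As a minimal illustration of the LPG model just described, the following Python sketch represents nodes and arcs that carry a label and a free set of attribute-value pairs, with no scheme enforced. It is not the Neo4j API: the class names, field names and sample data are hypothetical and serve only to show the data model.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    label: str                                   # type of the instance, e.g. "Document"
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Arc:
    label: str                                   # type of the relationship
    source: Node
    target: Node
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Graph:
    nodes: List[Node] = field(default_factory=list)
    arcs: List[Arc] = field(default_factory=list)

    def add_node(self, label: str, **props: Any) -> Node:
        node = Node(label, dict(props))
        self.nodes.append(node)
        return node

    def add_arc(self, label: str, source: Node, target: Node, **props: Any) -> Arc:
        arc = Arc(label, source, target, dict(props))
        self.arcs.append(arc)
        return arc


g = Graph()
# Two nodes of the same type with completely different attributes: nothing forbids it.
ms = g.add_node("Document", title="Illustrative manuscript", century=15)
scan = g.add_node("Document", file_format="TIFF", resolution_dpi=600)
# Arcs can carry attribute-value pairs too.
rel = g.add_arc("DIGITIZED_AS", ms, scan, year=2019)
```

    The two "Document" nodes share a label but no attribute, which is exactly the freedom that a superimposed scheme is later meant to constrain.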
    Since schemes are extremely important to determine a meaningful structure in DBs, and to help data
designers and managers properly organize their operations, our platform allows
schemes in the form of ontologies to be superimposed on the Graph DB, so that only information that
is compliant with the scheme can be added. An additional facility of GraphBRAIN is allowing several
schemes to be superimposed on the same graph, to express different domain-dependent perspectives on
the same data.
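    The idea of superimposing one or more schemes on a schema-less graph can be sketched as follows. This is an assumption-laden toy, not the GraphBRAIN formalism: here a scheme is simply a mapping from labels to allowed attribute names, and a node is accepted when at least one of the superimposed schemes admits it. The scheme contents and the acceptance rule are hypothetical.

```python
from typing import Any, Dict, List, Set, Tuple

Scheme = Dict[str, Set[str]]

# Hypothetical schemes: allowed attributes per node label.
GLAM_SCHEME: Scheme = {
    "Document": {"title", "century"},
    "Person": {"name"},
}
TOURISM_SCHEME: Scheme = {
    "Place": {"name", "region"},
    "Person": {"name", "role"},
}


def is_compliant(scheme: Scheme, label: str, props: Dict[str, Any]) -> bool:
    """A node is compliant if its label is declared and it uses only declared attributes."""
    return label in scheme and set(props) <= scheme[label]


def add_node(graph: List[Tuple[str, Dict[str, Any]]], schemes: List[Scheme],
             label: str, **props: Any) -> None:
    """Add the node only if at least one superimposed scheme accepts it (assumed rule)."""
    if not any(is_compliant(s, label, props) for s in schemes):
        raise ValueError(f"{label} with attributes {sorted(props)} violates every scheme")
    graph.append((label, props))


graph: List[Tuple[str, Dict[str, Any]]] = []
schemes = [GLAM_SCHEME, TOURISM_SCHEME]
add_node(graph, schemes, "Document", title="Illustrative manuscript")
add_node(graph, schemes, "Place", name="Bari", region="Apulia")
try:
    add_node(graph, schemes, "Document", colour="red")   # undeclared attribute
    rejected = False
except ValueError:
    rejected = True
```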
    While much research is available on ontologies in Computer Science, especially in the Knowledge
Representation and Reasoning branch of Artificial Intelligence, the current standard formalism for
representing ontologies has some idiosyncrasies with respect to the underlying LPG model. In fact,
the standard formalism consists of (subject, predicate, object) triples (usually expressed in RDF format)
where the components are atomic. This is partly incompatible with the LPG model (e.g., properties
cannot be attached to relations, unless particular workarounds are adopted).
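    To make the mismatch concrete, the sketch below shows one common kind of workaround: an LPG arc that carries properties is turned into a resource of its own, so that each attribute value has a subject to attach to. The predicate names ("source", "type", "target") and identifiers are hypothetical placeholders, not an RDF vocabulary and not the mapping defined for GraphBRAIN.

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, Any]


def arc_to_triples(arc_id: str, source: str, label: str, target: str,
                   props: Dict[str, Any]) -> List[Triple]:
    """Translate one LPG arc into atomic triples.

    Without properties a single triple suffices. With properties, the arc
    becomes a resource itself (reification-style workaround) and every
    attribute value is stated about that resource.
    """
    if not props:
        return [(source, label, target)]
    triples: List[Triple] = [
        (arc_id, "source", source),
        (arc_id, "type", label),
        (arc_id, "target", target),
    ]
    triples += [(arc_id, key, value) for key, value in sorted(props.items())]
    return triples


plain = arc_to_triples("a1", "doc:1", "wasWrittenBy", "person:7", {})
rich = arc_to_triples("a2", "doc:1", "wasDigitizedBy", "org:3", {"year": 2019})
```

    A single property on the arc turns one triple into four, which gives a feel for why a formalism closer to the LPG model is convenient.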
    To bridge the gap between RDF and LPG, we defined a specific formalism for GraphBRAIN
schemes [18], that can fully exploit the power and flexibility of LPGs while still allowing a relevant
and useful mapping to RDF and its ontological declinations such as OWL.
    By applying the ontological schemes to the data in the graph DB, we obtain a so-called Knowledge
Graph, that allows advanced exploitation of the information by expanding the possibilities of standard
graphDB Manipulation Languages (Cypher for Neo4j) with actual high-level logical reasoning that can
infer information not explicitly expressed in the database. In particular, we envision the joint use and
cooperation of various kinds of inference strategies: ontological, deductive, abductive, argumentative,
etc.
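    As a toy example of inferring information not explicitly stored, the following sketch applies a single deductive rule (transitivity of one relation) by forward chaining. It only illustrates the principle; the facts, the relation name and the rule are hypothetical, and the reasoning strategies envisioned above are far richer.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]


def infer_transitive(facts: Set[Fact], relation: str) -> Set[Fact]:
    """Forward-chain the rule r(a, b) and r(b, c) => r(a, c) until nothing new appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(known):
            for (b2, r2, c) in list(known):
                if r1 == r2 == relation and b == b2 and (a, relation, c) not in known:
                    known.add((a, relation, c))
                    changed = True
    return known


facts = {
    ("Bari", "locatedIn", "Apulia"),
    ("Apulia", "locatedIn", "Italy"),
    ("doc:1", "printedIn", "Bari"),
}
closed = infer_transitive(facts, "locatedIn")
# ("Bari", "locatedIn", "Italy") is now available although it was never stored.
```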
    Given these premises, for the SDL we started the design of a GraphBRAIN schema that goes beyond
the standard bibliographic metadata currently in use, expanding them in several directions. The most
important ones are storing the entire lifecycle of the cultural objects in the library, and recording several
kinds of contextual information. The former is important to fully capture and exploit the peculiarities
of digital cultural heritage, that is to be considered a cultural object by itself, not just a representation
of some physical item. The latter is important to allow an exploitation of the information that is not
limited to single records independent of each other, but provides several kinds of interconnections
between different items, direct (e.g., having the same contributor, or period, or publisher) or indirect
(e.g., citing persons who lived in the same place in the same period). The former would make it possible to preserve
the history and unity of an object along the years and through different interpretations and exploitations,
which is yet more difficult for digital items that do not have a physical identification. The latter would
support researchers, scholars, managers, or simple users in their activities, suggesting relevant items to
consult or even proposing non-obvious research directions worth investigation.
    Especially relevant to this ‘holistic’ and contextual perspective is the possibility of combining, in
GraphBRAIN, different schemes. So, for instance, using together the GLAM, Tourism and Food
schemes might support the specific consultation and exploitation of the library’s content for touristic
purposes, joining the cultural aspects and the more experiential ones connected to the folklore and
traditions of the place being visited. Also, depending on the different schemes that are combined,
different connections can be found between objects in and around the library collection.
    The defining ontology provides for a set of concepts and relationships that go beyond what has been
proposed so far, both in the description of analog artifacts and in the LOD context. One part is aimed at
describing the life cycle of digital objects, including the activities and actors who are part of it in various
capacities. Other elements expand the range of context information used today. Elements are provided
to describe and manage not only GLAM assets, but also users and their characteristics, to allow AI
technologies to adapt the behavior of the system to the specific needs and purposes of each user.
    The schema is undergoing further development and expansion, and to date contains information
relating to the GLAM, tourism and history of computer science domains [18]. Although still under
construction, the ontology includes 61 classes and 161 relationships. The knowledge base currently amounts to 336483 class
instances, described by a total of 1875571 attribute values, and 496564 relationship instances, described
by a total of 41301 attribute values. Figure 1 shows a section of the graph connecting entity instances
via relationship instances. Different colors of the nodes represent different classes.




Figure 1. Section of the graph

3.2. The SMARTER DL (SDL) prototyping
   The definition of the new metadata scheme made it possible to focus the design of the SDL pilot in
a user-centered perspective, providing a set of methodologies and technologies for managing and
displaying resources on the web aimed at encouraging advanced user interaction with the collections.
   From a functional point of view, the project focused on two main lines of activity:
    1. analysis, choice, optical acquisition, metadata solutions both of the semantic web and of
        conceptual ontologies, conservation of digital contents relating to documentary cultural
        heritage;
    2. enhancement of the contents available through display on the web with innovative interactive
        consultation and search functions, usable with normal browsers and prepared for use through
        special apps on mobile technologies.
   The DL prototyped in the 2014 project, whose architecture was developed using the open source
DLMS dSpace, represented the basis of piloting, outlined according to the following steps:
    1. analysis and choice of digital objects in relation to different types of ancient textual content
        represented in the layout (manuscripts and printed);
    2. testing the integrability of the functionality of the ICRPad application in the DL and
       interoperability with metadata;
    3. analysis of AI solutions for the conceptual description of digital resources and the interactive
       and dynamic use of contents;
    4. design of the prototype;
    5. analysis and evaluation of the prototyping effectiveness.
   The architecture and functionality have been enriched as shown in the following paragraphs.

3.2.1. The SDL Architecture
    In the design of the SDL the three-level architecture of the previous model has been preserved,
adding some further subsystems.
    The Application level includes the access tools both to the back-end area and to the front-end. The
back-end allows the upload and modification of digital content and associated metadata, the
management of digital services and of users who access them. The front-end allows the visualization,
rendering and use of digital objects and services through a web interface that can combine all the
required digital formats, also offering interaction with multimedia and multi-channel resources thanks
to collaborative tagging. Web-responsive tools can enable adaptive viewing, rendering and interaction
via mobile devices. Monitoring tools make it possible to manage reports on user interaction with the
system, in order to better define user needs and requests for the inclusion of new content or the
activation of digital services. Interoperability tools manage the interfaces that allow the exchange of
metadata through the OAIS, OAI-PMH, OAI-ORE and Z39.50 protocols. Furthermore, this level has been
extended to manage digital objects indexed with Open Data (OD) and Linked Open Data
(LOD), and with the new schema described above.
    The Management Logic Level includes modules for implementing system functionality and basic
tools for configuration and logging. The level can provide the information retrieval by both simple and
advanced query; the management of collaborative tagging to improve user interaction with digital
objects; the access management through authentication and advanced user profiling tools; advanced
user interaction applying ICRPad plugins for intelligent recognition and text extraction from the layout
of objects; the management of indexing and description of digital objects with the schema we are
implementing; the possibility of georeferencing digitized resources through interoperability with the
main online open source platforms (e.g., Open Street Map), and of searching for contents based on
spatial queries.
    The Storage Level manages the organization of digital objects and their metadata, the user
information and associated permissions, and the status of the approval flow when someone inserts digital
objects into the DL. The level provides for the management of authentication, access and user profiling
through specific tools. User management tools will verify the authentication, authorize access, and profile
the user. The authorized user can also propose the insertion of digital objects in the system
through a dedicated function in the user interface. The proposal will be analyzed by the content manager
tool, which will validate it and activate the process for publishing it in the collection. Authorized users
can also have access to monitoring data. For the harvesting operation during the fruition of the contents,
interoperability and metadata converter tools can be provided, which can also be used for the physical
transfer of metadata in batch mode and for exposure as OD and LOD.
    All content and user data are managed through the storage API modules.
    The following paragraphs outline the designed modules.

3.2.2. Fruition
   Fruition is the module that can provide the Web interface for the distribution and the interactive use
of multimedia digital collections and services.
   In the SDL, this module has been improved for supporting the set of multimedia digital objects, also
providing multi-channel management and data protection. The collections and services can be described
with adequate information consistent with the standards and good practices for managing digital
resources of CH. Moreover, the descriptive tags associated with each object allow easy
information retrieval not only by querying author, date, and topic of the original artifacts, but also by
other elements relating to usable formats, dimensions, resolutions and, in perspective, the textual content
of the digital objects.
    The module will support the management of the following multiple classes of digital objects, with
related formats and functions:
     1. text files, e.g., TXT, PDF, OOXML, ODF, DjVu formats;
     2. 2D high resolution and uncompressed formats (TIFF, GIF, FITS, PNG, BMP); tools for the
         basic post-processing of images can be integrated, aiming to produce from each object other
         formats with different resolutions by dynamic resizing to the required resolution, and to
         generate lossy JPEG versions for exposure online; a bidirectional conversion between the storage formats
         TIFF and FITS, and export to JPEG can also be provided; the ICRPad tools for
         graphic matching and intelligent recognition will allow real-time interaction both with the
         layout of the digital object, and with the hypertext extracted from its textual content, which
         can also be indexed with the metadata schema we are implementing;
     3. most used audio formats (e.g., WAV, MP3, ABS, MPA); web interfaces for fruition via
         streaming will integrate open-source libraries to optimize the quality of audio reproduction even
         on mobile technology; tools are provided for podcasting audio contents;
     4. several audio-video formats (e.g., MPEG3, MPEG4, H26x, Quick Time, AVI, streaming
         formats); the module is prepared for integrating open-source libraries that will allow video
         playback of supported formats also on mobile technology; on the back end, the integration of
         specific open-source libraries will make it possible to insert captions, subtitles and alternative content into
         the digital objects; these can be also modified by changing the compression and extracting video
         sequences.
    Tools have been designed for grouping digital objects into albums and alternative contents.
    In future perspective, this module will also manage formats for interactive 3D contents, such as, for
example, virtual routes, or panoramic images, etc., by integrating libraries for the 3D rendering and the
user interaction.
    Another innovation designed is the implementation of specific tools to support the interaction of
Fragile Subjects with the digital collections by mobile devices. Open-source libraries providing
text-to-speech will be integrated.

3.2.3. Recognition Subsystem (SSR)
   The Recognition Subsystem (SSR) is a software module of the SDL that allows you to perform
Intelligent Character Recognition (ICR) functions on documents in electronic format for handwritten
and printed text. The main features offered by the SSR module are listed below.

   Languages
   The system can extract text from images or documents in mixed languages; however, a
specific language can be given as input. For printed text, recognition is enabled for
more than one hundred languages, while for handwritten text the functionality is provided only for the
Latin languages.

    Natural order of recognition
    For several ancient and modern languages, e.g. Latin, an optional input
parameter makes the recognition respect the natural reading order of the documents, from left
to right, while preserving the ordering by columns.

   Range of pages
   To extract a portion of text from a single-page or multi-page document, or from a collection of
documents, you can specify a range within the layout of an image, so as to obtain exactly the occurrences
you searched for out of the overall set.

   Supported Document Types
   The supported file formats are: JPEG, PNG, TIFF, BMP and PDF. For PDFs and TIFFs, documents
containing at most 1000 pages are allowed and the size of each file must not exceed 50 MB with
minimum and maximum dimensions of 50 x 50 pixels and 10000 x 10000 pixels respectively.
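   The constraints above lend themselves to a simple pre-submission check. The sketch below is only an illustration under stated assumptions: the function name and error messages are hypothetical, the alternative extensions ".jpg" and ".tif" are added for convenience, "50 MB" is read as 50 MiB, and the size and dimension limits are applied to every supported format although the text states them for PDFs and TIFFs.

```python
import os

ALLOWED_EXTENSIONS = {".jpeg", ".jpg", ".png", ".tiff", ".tif", ".bmp", ".pdf"}
MAX_PAGES = 1000
MAX_BYTES = 50 * 1024 * 1024          # assumption: 50 MB read as 50 MiB
MIN_SIDE, MAX_SIDE = 50, 10000        # pixels


def validate(filename, size_bytes, width, height, pages=1):
    """Return the list of violated constraints (empty if the document is acceptable)."""
    errors = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        errors.append("unsupported format")
    if pages > MAX_PAGES:
        errors.append("too many pages")
    if size_bytes > MAX_BYTES:
        errors.append("file too large")
    if not (MIN_SIDE <= width <= MAX_SIDE and MIN_SIDE <= height <= MAX_SIDE):
        errors.append("dimensions out of range")
    return errors


ok = validate("codex.tiff", 12_000_000, 4000, 6000, pages=3)
bad = validate("codex.gif", 12_000_000, 40, 6000)
```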

   The main technical requirements of the SSR module are listed below.

   Asynchronism
   The POST (input) operation is asynchronous with respect to the GET (read) operation: the
call to the input API returns an identifier to be passed to the reading API.
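   The submit-then-read pattern can be sketched with an in-memory stand-in for the service. Nothing here reflects the real SSR endpoints: the class, its method names, the status values and the example URL are all hypothetical; only the principle (the input call returns an identifier, the read call is made later with it) comes from the text.

```python
import uuid
from typing import Dict, Optional


class RecognitionService:
    """In-memory stand-in for the SSR: submit() plays the POST, read() the GET."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Optional[dict]] = {}

    def submit(self, document_uri: str) -> str:
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = None            # accepted, result not ready yet
        return job_id                        # identifier to be used for reading

    def complete(self, job_id: str, result: dict) -> None:
        """Called by the (simulated) background worker when recognition ends."""
        self._jobs[job_id] = result

    def read(self, job_id: str) -> dict:
        result = self._jobs[job_id]
        if result is None:
            return {"status": "running"}
        return {"status": "succeeded", "result": result}


service = RecognitionService()
job = service.submit("https://repository.example/doc-1.pdf")
first = service.read(job)                    # polled before the worker has finished
service.complete(job, {"pages": []})
second = service.read(job)                   # polled again after completion
```

   A real client would poll the read call at intervals until the result is available.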

    Output format
    The read operation returns information in JSON format decorated with the following information,
in order to be able to apply, on the client side, the functional logic appropriately:
    - lines: list of text lines for each page
    - words: list of words for each Line
    - region: a BoundingBox for a Line or Word, giving the (X, Y) coordinates of the box that
       encloses the element within the page
    - text: the text contained in the Line or Word
    - score: the recognition confidence value, between 0 and 1, for each element
    - width: the value of the width of the single page
    - height: the height value of the single page
    - page: page number.
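
    A client-side use of these fields might look like the sketch below, which keeps only the words whose score reaches a threshold. The field names come from the list above, but the nesting (a top-level "pages" array, words nested inside lines, regions given as corner points) and the sample values are assumptions made for illustration.

```python
import json

SAMPLE = """
{
  "pages": [
    {
      "page": 1, "width": 2480, "height": 3508,
      "lines": [
        {
          "text": "In nomine Domini", "score": 0.93,
          "region": [[100, 200], [900, 200], [900, 260], [100, 260]],
          "words": [
            {"text": "In", "score": 0.98,
             "region": [[100, 200], [160, 200], [160, 260], [100, 260]]},
            {"text": "nomine", "score": 0.95,
             "region": [[180, 200], [480, 200], [480, 260], [180, 260]]},
            {"text": "Domini", "score": 0.41,
             "region": [[500, 200], [900, 200], [900, 260], [500, 260]]}
          ]
        }
      ]
    }
  ]
}
"""


def confident_words(payload: dict, threshold: float):
    """Yield (page number, word text) for every word whose score reaches the threshold."""
    for page in payload["pages"]:
        for line in page["lines"]:
            for word in line["words"]:
                if word["score"] >= threshold:
                    yield page["page"], word["text"]


payload = json.loads(SAMPLE)
kept = list(confident_words(payload, threshold=0.9))
```

    Low-score words could instead be highlighted in the interface through their region, so that a user can check them against the layout.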

   Storage
   The document repository must be reachable via HTTPS by the service performing the
recognition. Even if it is read-only, it is advisable to protect the repository, as is done, for example,
for the storage hosted by the main Cloud providers.

3.2.4. Customization
    This module has also been improved to better manage user access to the DL, and to
monitor user interaction with innovative advanced services and digital collections. Each user will
be identified by login and password.
    Compared to the previous one, this version implements advanced tools to profile user behavior during
interaction with collections and digital resources, through the application of advanced AI techniques.
From the data analysis, specific information about each user can be inferred, such as, for example,
particular interests, interaction preferences, purposes, etc. Using the AI tools, users and contents can be
automatically grouped into clusters based on the inferred information, making it possible to customize the
services in relation to the different clusters and their levels of interaction, and to associate specific
modules and functions with each cluster.
    To this end, the use of groups is envisaged, consisting of explicit lists of users. Through groups, the
different types of access to the system can be profiled: each user is associated with a profile
consistent with the defined role, and information can be extracted that will be used to aggregate groups into
clusters, also in order to customize the services for different needs. Anyone who belongs to a group will
get the privileges granted to it.
    The DL will try to automatically infer which cluster a group belongs to, and to propose specific
methods and tools for each cluster, based on the respective characteristics and needs.
    Clusters can be identified based on a classification, e.g.:
    1. Common user
    2. Researcher
    3. Paleographer
    4. Historian
    5. Archivist
    6. Bibliologist
    7. ...
    A collection, as well as a document, can be associated with multiple groups and multiple clusters
identified as relevant. It will also be possible to cluster contents and users to automatically identify new
bottom-up aggregations, based on similarity by description or by content of the instances.
    If users interact with one or more objects of a collection by proposing customized implementations,
a flow-check will be activated, that will allow one or more reviewers to make sure that the
implementations are consistent with the other contents of the collection. The control will track all the
changes that digital objects may undergo from the moment the user proposes the implementation, and
it will prevent multiple users from simultaneously performing operations on the same object that could
change its status.
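    The two duties of the flow-check (tracking every change and preventing concurrent operations on the same object) can be sketched as follows. This is a hypothetical, single-process illustration: the class, its methods and the event names are assumptions, and a real DL would need persistent, concurrency-safe storage.

```python
class ReviewFlow:
    """Tracks proposed changes and lets only one user at a time hold an object."""

    def __init__(self) -> None:
        self._locks = {}      # object id -> user whose proposal is pending
        self.history = []     # (object id, actor, event) in chronological order

    def propose(self, object_id: str, user: str) -> bool:
        holder = self._locks.get(object_id)
        if holder is not None and holder != user:
            self.history.append((object_id, user, "rejected: locked"))
            return False
        self._locks[object_id] = user
        self.history.append((object_id, user, "proposed"))
        return True

    def review(self, object_id: str, reviewer: str, approved: bool) -> None:
        outcome = "approved" if approved else "refused"
        self.history.append((object_id, reviewer, outcome))
        self._locks.pop(object_id, None)      # the object is free again


flow = ReviewFlow()
ok_first = flow.propose("obj-1", "alice")
ok_second = flow.propose("obj-1", "bob")      # blocked while alice's proposal is pending
flow.review("obj-1", "reviewer-1", approved=True)
ok_third = flow.propose("obj-1", "bob")       # allowed after the review closed the flow
```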
    The profiling and customizing data will be managed with metadata, with a view to recording and
preserving information on the provenance and life cycle of the resources, favoring their historicization.
Monitoring tools have been designed to store and analyze information on the interaction,
in order to obtain statistics and reports to be used according to several aggregations, among them:
    • number of user accesses;
    • complete report of all actions performed;
    • number of digital objects used;
    • number of searches carried out;
    • number of OAI requests;
    • most frequent searches;
    • log information;
    • processing information.
    The results of the statistical analysis can be collected in reports and used to identify the user needs.
They may also be made available through the user interface in public mode or reserved for the
administrators of the DL.
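    A few of the aggregations listed above can be computed from an interaction log as in the following sketch. The log format (tuples of user, action, detail), the action names and the report keys are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical interaction log: (user, action, detail).
LOG = [
    ("u1", "login", ""),
    ("u1", "search", "incunabula"),
    ("u2", "login", ""),
    ("u2", "search", "incunabula"),
    ("u2", "view", "obj-7"),
    ("u1", "search", "bari"),
    ("u1", "oai_request", "ListRecords"),
]


def build_report(log):
    """Aggregate the log into a few of the statistics listed in the text."""
    actions = Counter(action for _, action, _ in log)
    searches = Counter(detail for _, action, detail in log if action == "search")
    return {
        "user_accesses": actions["login"],
        "searches": actions["search"],
        "oai_requests": actions["oai_request"],
        "objects_used": len({detail for _, action, detail in log if action == "view"}),
        "most_frequent_searches": searches.most_common(1),
    }


report = build_report(LOG)
```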
    Tagging features have been designed to improve user interaction, so that each user can assign one or more
labels to different contents. The tools for inserting and displaying tags will be included in the responsive
web interface.

3.2.5. Data management
    The organization of data in the SDL has been prototyped based on a model that, in addition to
allowing its population, will facilitate the indexing and description of digital objects through the
metadata schema we are implementing. With a view to interlinking the SDL with other digital
libraries and collections already indexed, the metadata schema includes elements consistent with the
most important standards in use, such as MAG, Dublin Core, METS, MODS, RDF.
    Tools that support the interoperability of the schema with the OAI-PMH, OAI-ORE, OAIS and
Z39.50 protocols have been designed, in order to enable bulk transfer of data to and from the SDL,
both through back-office and through web query. In particular, the OAI-PMH protocol guarantees
interoperability between the various providers to expose and collect metadata. This implementation will
ensure the exchange of contents with the major national and international digital libraries (e.g.,
Biblioteca Apostolica Vaticana, Catalogo Generale dei Beni Culturali, EUROPEANA, Library of
Congress, World Digital Library).
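    For readers unfamiliar with OAI-PMH, a harvesting request is an ordinary HTTP request whose query string carries a verb and its arguments. The sketch below builds a ListRecords request for Dublin Core records (the `oai_dc` metadata prefix, which every OAI-PMH repository must support). The endpoint URL and the set name are hypothetical, since the SDL endpoint is not given in the text.

```python
from urllib.parse import urlencode, urlparse, parse_qs

BASE_URL = "https://sdl.example.org/oai"      # hypothetical endpoint


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request as an HTTP GET URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec              # selective harvesting by set
    if from_date:
        params["from"] = from_date            # UTC date, e.g. "2021-10-07"
    return f"{base_url}?{urlencode(params)}"


url = list_records_url(BASE_URL, set_spec="manuscripts", from_date="2021-10-07")
query = parse_qs(urlparse(url).query)
```

    The repository answers with an XML document containing the records, possibly followed by a resumption token for the next batch.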
    The extraction of hypertexts from digital objects by ICRPad tools will also allow for indexing and
description of textual content, greatly expanding the potential for user interaction. Two modules have
been designed to be plugged into the SDL: Document Repository Service module (DSR) and
Recognition Service module (RS). The DSR module will expose the document repository to the RS module
through a URI (cloud or on premises). The RS module is composed of a set of APIs that can be invoked
in order to perform recognition on documents stored in a repository.
    The ICR tools can be plugged into or unplugged from the system via simple configuration, using a plug-in
architecture. The model consists of two types of components, a core system and plug-in modules.
Application logic is divided between independent plug-in modules and the basic core system, providing
extensibility, flexibility, and isolation of application features and custom processing logic. Plug-ins can
be added or removed from the core anytime. One plug-in’s addition or removal from the core does not
affect the other plug-ins.
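    The core/plug-in split can be sketched as a small registry: the core knows nothing about the features except how to call them, so adding or removing one leaves the others untouched. The class, the plug-in names and the lambda handlers are hypothetical stand-ins, not the actual ICRPad integration.

```python
from typing import Callable, Dict


class Core:
    """Core system holding a registry of independent plug-ins."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Callable[[str], str]] = {}

    def plug(self, name: str, handler: Callable[[str], str]) -> None:
        self._plugins[name] = handler

    def unplug(self, name: str) -> None:
        self._plugins.pop(name, None)

    def run(self, name: str, document: str) -> str:
        if name not in self._plugins:
            raise LookupError(f"plug-in {name!r} is not installed")
        return self._plugins[name](document)


core = Core()
core.plug("recognition", lambda doc: f"text extracted from {doc}")   # stand-in for ICR
core.plug("thumbnail", lambda doc: f"thumbnail of {doc}")
core.unplug("recognition")                    # removing one plug-in...
still_works = core.run("thumbnail", "doc-1")  # ...does not affect the others
```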
   The digital resources will be searchable through a set of information retrieval techniques designed for
targeted retrieval. A user with a specific information need can query the system,
which will search its internal indexes and return one or more digital items
representing the entities that contain the information. If the user is not satisfied with the results,
because they are too few (silence), too many, or contain incorrect and distracting elements
(noise), they can issue a new query or refine the previous one by adding filters. The filter function
will be based on Boolean operators.
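   As an illustration of Boolean filtering over an internal index, the following sketch builds a toy inverted index and combines term lookups with AND, OR, and NOT. The sample items and the function names (build_index, lookup, and_, or_, and_not) are assumptions made for the example, not part of the SDL design.

```python
from typing import Dict, Set

# Toy collection: item identifier -> text (illustrative data only).
ITEMS: Dict[str, str] = {
    "item1": "illuminated manuscript from Bari",
    "item2": "printed map of Bari",
    "item3": "illuminated printed psalter",
}


def build_index(items: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercase term to the set of items that contain it."""
    index: Dict[str, Set[str]] = {}
    for item_id, text in items.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(item_id)
    return index


INDEX = build_index(ITEMS)


def lookup(term: str) -> Set[str]:
    """Return a copy of the posting set for a term (empty if unknown)."""
    return set(INDEX.get(term.lower(), set()))


def and_(a: Set[str], b: Set[str]) -> Set[str]:
    return a & b  # narrows the result set (less noise)


def or_(a: Set[str], b: Set[str]) -> Set[str]:
    return a | b  # widens the result set (less silence)


def and_not(a: Set[str], b: Set[str]) -> Set[str]:
    return a - b  # excludes unwanted items


narrowed = and_(lookup("bari"), lookup("manuscript"))
widened = or_(lookup("manuscript"), lookup("map"))
excluded = and_not(lookup("illuminated"), lookup("printed"))
```

   Here AND reduces noise, OR reduces silence, and NOT removes items the user considers irrelevant.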
   When the digital resources lack the specific keyword entered in the query, but the text contains
terms related to it, advanced indexing techniques based on semantics have been provided to retrieve them,
which greatly enhances the system's information retrieval functions.
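   One simple way to retrieve resources that contain only related terms is query expansion, sketched below. This is a stand-in technique chosen for illustration and not necessarily the semantic indexing adopted in the project; the RELATED table, the sample documents, and the function names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical table of related terms; a real system would derive these
# from an ontology or a knowledge graph rather than a hard-coded dictionary.
RELATED: Dict[str, Set[str]] = {
    "codex": {"manuscript", "volume"},
}

# Illustrative documents only.
DOCS: Dict[str, str] = {
    "doc1": "a fifteenth century manuscript on parchment",
    "doc2": "a printed map of the coast",
}


def expand(keyword: str) -> Set[str]:
    """Return the keyword together with its related terms."""
    keyword = keyword.lower()
    return {keyword} | RELATED.get(keyword, set())


def search(keyword: str) -> List[str]:
    """Return identifiers of documents containing the keyword or a related term."""
    terms = expand(keyword)
    return sorted(
        doc_id
        for doc_id, text in DOCS.items()
        if terms & set(text.lower().split())
    )
```

   With this table, a query for "codex" returns doc1 even though the word itself never appears in its text.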
   To improve user interaction, the DL will support a multi-layered query interface:
        • simple search: it will allow searching by keywords; the combined use of Boolean
            operators will make it possible to widen or narrow the retrieval according to the needs of the
            user;
        • advanced search: it will allow the search to be optimized by selecting characteristics of
            interest; the functions will be organized according to the metadata, but the user can
            combine them to refine searches, e.g., as follows:
                 a. search by descriptive elements (author, title, and other descriptive elements
                      included in the metadata standards that will be defined and used);
                 b. search by classes of digital objects (documents, images, audio, video, photographs,
                      etc.);
                 c. search by metadata elements: if a common high-level schema is adopted, expressed in
                      semantic web languages capable of associating a formal meaning with the metadata,
                      the search will also allow the retrieval of information connected with the same
                      descriptive metadata;
                 d. search by content: it can be carried out on a single digital object or on sets of simple
                      or complex digital objects, both of the same class and of different classes; for this
                      function the search engine will use the output of the ICRPad application;
                 e. document layout analysis aimed at the selective extraction of content: this tool
                      will allow the system to automatically select the parts of the layout that identify the
                      relevant information of the digital document, so as to limit subsequent
                      processing (in whole or in part) to this content, reducing processing times and
                      increasing the quality of the result.
   To make the SDL collections and services as easy to use as possible, it will be possible to aggregate
the data in collections consistent with the areas of interest of each institution that uses the model. The
collections can also be built on the basis of information gathered from user interaction.
   Some areas identifiable for the designed SDL could include, among others:
   - Environment and Landscape;
   - Archeology;
   - Art;
   - Creativity;
   - Music;
   - Cartography;
   - Architecture;
   - Literature;
   - Show;
   - Economy and society;
   - Places of culture;
   - Events;
   - Food and wine;
   - Handicraft;
   - History and traditions;
   - Sports.
3.2.6. Georeferencing
   This module will allow all the contents stored in the SDL to be georeferenced. In particular, for each
stored content it will be possible:
   1. to associate one or more geographical positions with the information dealt with in the digital object;
   2. to search for groups of objects using queries based on geographical proximity;
   3. to record the georeferencing data in the metadata and store it.
   A graphic component will be implemented, accessible via computer and mobile devices, which will
also allow users to carry out the aforementioned activities. The graphic interface will be created using
web frameworks, in order to facilitate access to the implemented functions from mobile devices. The
generated positions will be displayed using open-source mapping services, such as OpenStreetMap.
   Compared to the previous version, tools for on-the-fly georeferencing via mobile devices have been
designed, with a view to allowing users to georeference the original artifacts related to the digital objects
displayed in the SDL while they are in the physical place where the artifacts are located, by selecting a specific
function on their device. In addition to greater precision, this function would improve the
mapping of cultural objects located in positions not clearly identified by the conventional address
system (e.g., a stele in the open countryside).
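   A query based on geographical proximity can be sketched with the haversine great-circle distance. The object identifiers, the coordinates, and the function names below are illustrative assumptions; the sketch only shows how objects with one or more stored positions could be filtered by distance from a point.

```python
import math
from typing import Dict, List, Tuple

Position = Tuple[float, float]  # (latitude, longitude) in decimal degrees

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Position, b: Position) -> float:
    """Great-circle distance between two positions, in kilometres."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Hypothetical objects, each associated with one or more positions.
POSITIONS: Dict[str, List[Position]] = {
    "obj-bari": [(41.1171, 16.8719)],
    "obj-lecce": [(40.3515, 18.1750)],
    "obj-multi": [(41.1171, 16.8719), (45.4642, 9.1900)],
}


def near(center: Position, radius_km: float) -> List[str]:
    """Return objects having at least one position within radius_km of center."""
    return sorted(
        obj_id
        for obj_id, points in POSITIONS.items()
        if any(haversine_km(center, p) <= radius_km for p in points)
    )
```

   An object matches as soon as any one of its positions falls within the radius, which reflects the possibility of associating several positions with the same digital object.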


3.2.7. License management
    Each digital object, collection, or service of the SDL will be associated with a type of release
license, depending on whether the user is a person or a legal entity. During data entry it will be
possible to choose or customize an existing license or to add a new one. Since data entry and the
activation of services pass through an approval flow, the association of one or more license
agreements/forms with each content will be mandatory. Content manager tools will check whether
content sharing is fully or partially consistent with the relevant license agreement/form. The text of each
license will be available on the web and mobile interfaces. While interacting with any digital object
or using a service, users can view the licenses through a clearly identifiable link (or button) in
the layout of the digital object.
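    The two checks mentioned above, the mandatory license association in the approval flow and the full or partial consistency of sharing, can be sketched as follows. The data model is a deliberate simplification: the classes License and Content, the single allows_sharing flag, and the function names are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class License:
    name: str
    allows_sharing: bool  # hypothetical single-flag policy


@dataclass
class Content:
    identifier: str
    licenses: List[License] = field(default_factory=list)


def can_approve(content: Content) -> bool:
    """Approval requires at least one associated license."""
    return len(content.licenses) > 0


def sharing_consistency(content: Content) -> str:
    """Classify sharing as 'full', 'partial', or 'none' across the licenses."""
    if not content.licenses:
        raise ValueError("content has no license and cannot be shared")
    allowed = [lic.allows_sharing for lic in content.licenses]
    if all(allowed):
        return "full"
    if any(allowed):
        return "partial"
    return "none"


# Generic license names used only as placeholders.
open_license = License("open", allows_sharing=True)
restricted_license = License("restricted", allows_sharing=False)

draft = Content("obj-1")
mixed = Content("obj-2", [open_license, restricted_license])
```

    A real license model would need finer-grained permissions than a single flag, but the approval gate would work in the same way.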


3.2.8. Repository
   The preservation and persistence of data on relational databases will be provided by a storage layer
(SL, e.g., Storage Resource Broker), which takes care of the integrity of data and metadata when saving and
searching for stored contents.
   The contents will be stored within the SL. The SL infrastructure will be used to create a
distributed logical file system, where all the digital contents of the SDL will be stored. This solution
makes it possible to physically distribute and replicate content and, at the same time, to keep
backup copies of the contents without purchasing a separate backup service.

4.       References
     [1] A. Salarelli, A. M. Tammaro, La biblioteca digitale, Milano: Editrice Bibliografica, 2006
     [2] A. M. Tammaro, User perceptions of digital libraries: a case study in Italy, Performance
         Measurement and Metrics, Vol. 9, 2 (2008), 130-137. doi: 10.1108/14678040810906835.
     [3] I. Xie, K. Matusiak, Discover Digital Libraries: Theory and Practice, 1st ed., Elsevier,
         2016.
     [4] M. T. Biagetti, Le biblioteche digitali. Tecnologie, funzionalità e modelli di sviluppo, Milano:
         Franco Angeli, 2019.
     [5] J. Bloomberg, Digitization, digitalization, and digital transformation: confuse them at your
         peril, Forbes, 29th April 2018. URL:
         https://www.forbes.com/sites/jasonbloomberg/2018/04/29/digitization-digitalization-anddigital-transformation-confuse-them-at-your-peril
[6] N. Barbuti, M. De Bari, La digitalizzazione che non c’è, Biblioteche Oggi Trends, Vol. 7, n. 1
     (2021), 71-80. doi: 10.3302/2421-3810-202101-071-1.
[7] N. Barbuti, La digitalizzazione documentale. Metodi, tecniche, buone prassi, Milano: Editrice
     Bibliografica, 2022 (in press).
[8] L. Duranti, E. Shaffer (eds.), The memory of the world in the digital age: digitization and
     preservation, in: An International Conference on Permanent Access to Digital Documentary
     Heritage, UNESCO Conference Proceedings, Vancouver, 26–28 September 2012. URL:
     http://ciscra.org/docs/UNESCO_MOW2012_Proceedings_FINAL_ENG_Compressed.pdf
[9] L. Bailey, Digital Orphans: The Massive Cultural Black Hole on Our Horizon, Techdirt, 13th
     October 2015. URL: https://www.techdirt.com/articles/20151009/17031332490/digitalor-
     phans-massive-cultural-blackhole-our-horizon.shtml
[10] N. Barbuti, T. Caldarola, D. Re David and S. Ferilli, An Integrated Management System for
     Multimedia Digital Library, Procedia Computer Science, Vol. 38 (2014), 128-132. doi:
     10.1016/j.procs.2014.10.021
[11] N. Barbuti, T. Caldarola and S. Ferilli, A Graphic Matching Process for Searching and
     Retrieving Information in Digital Libraries of Manuscripts, in: G. Serra, C. Tasso (eds.), Digital
     Libraries and Multimedia Archives. Proceedings of the 14th IRCDL 2018, Communications in
     Computer and Information Science, Vol. 806 (2018), 139-150. doi: 10.1007/978-3-319-73165-0_14.
[12] N. Barbuti, T. Caldarola, An Innovative Multifunction System for Text Recognition of Digital
     Resources Reproducing Ancient Handwritten and Hand-Printed Artifacts, Proceedings of the
     1st DTUC ’18, ACM, 2018. doi: 10.1145/3240117.3240141.
[13] F. Tomasi, La preservazione del contenuto degli oggetti digitali: formalizzare la provenance,
     Bibliothecae.it, 6 (2017), 17–40. URL:
     https://cris.unibo.it/retrieve/handle/11585/611249/303579/paper-2017.pdf
[14] N. Barbuti, Creating Digital Cultural Heritage with Open Data: From FAIR to FAIR5
     Principles, in: M. Ceci, S. Ferilli, A. Poggi (eds.), Digital Libraries: The Era of Big Data and
     Data Science. Proceedings of 16th IRCDL, Communications in Computer and Information
     Science 1177 (2020), 1-9.
[15] N. Barbuti, Ripensare i dati come risorse digitali: un processo difficile?, in: Atti del IX
     Convegno Annuale AIUCD. La svolta inevitabile: sfide e prospettive per l'Informatica
     Umanistica. Milano: Università Cattolica del Sacro Cuore (2020), pp. 19-23.
[16] S. Ferilli & D. Redavid. The GraphBRAIN System for Knowledge Graph Management and
     Advanced Fruition, in: Foundations of Intelligent Systems, Lecture Notes in Artificial
     Intelligence 12117, 2020, 308-317.
[17] I. Robinson, J. Webber, and E. Eifrem, Graph Databases, 2nd ed.; O’Reilly Media: Sebastopol,
     CA, USA, 2015.
[18] S. Ferilli. Integration Strategy and Tool between Formal Ontology and Graph Database
     Technology, Electronics, 27 pp., MDPI, 2021.