DALIA: An Open Data Repository for the Open Digital Products
of the Bologna CNR Research Area Library
Gabriela Carrara 1,2, Silvana Mangiaracina 1, Debora Mazza 1, and Alberto Candiani 1
1 CNR Biblioteca Area territoriale di Ricerca di Bologna, Via Gobetti 101, 40129 Bologna, Italy
2 CNR IMM-BO, Via Gobetti 101, 40129 Bologna, Italy


                 Abstract
                 The Bologna Research Area Library of the National Research Council is active in many
                 fields, including educational projects, library services and support for scientific research.
                 NILDE (Network for Inter-Library Document Exchange), a software developed by the
                 Bologna Library and adopted by a vast community of Italian and foreign libraries, is one of the
                 main projects carried out by the Library; it started in 2001 and is still active. In more than 20
                 years of activity, the NILDE project has produced a large amount of digital data (multimedia
                 materials from conferences, video interviews, multimedia teaching resources, etc.) that need to be
                 preserved and managed. To meet the demand of the NILDE community (librarians, researchers
                 and students) to consult and reuse these materials in their daily
                 work, the Bologna Research Area Library devised a simple way to organize the digital data
                 produced by the projects in which it has been involved, following Open Science principles.
                 The open data repository DALIA, accessible via the web, was built for this purpose using
                 the well-proven open-source tool CKAN.
                 The archive was first populated with the data belonging to the NILDE project. The
                 tests carried out with DALIA suggest its potential as a new service to be offered to the
                 Bologna Research Area researchers, in order to introduce them to the procedures for archiving,
                 describing and preserving data, as recommended by the EU.

                 Keywords
                 NILDE, resource sharing, inter-library loan, document delivery, Open Science, open data


1. Introduction
    The Bologna Research Area Library (known as BdA) of the National Research Council (CNR) is a
multidisciplinary science library which was created in 1995 to collect, integrate, enhance and
disseminate the scientific bibliographic heritage of the institutes that merged into the CNR Research
Area in Bologna¹. The BdA supports access to scientific documentation and its free circulation,
develops the related technology, and has promoted the emergence of innovative services
for users of academic and research libraries in Italy. The BdA has been, and still is, involved in several
national and international projects, the main one being NILDE (Network for Inter-Library
Document Exchange) [1]. NILDE is a web-based software for libraries and end-users, developed since
2001 by the BdA [2,3], which is currently used by more than 900 university, public research and health
libraries in Italy, Spain and Greece, totaling more than 85,000 end-users (mainly researchers and
students) registered through their libraries. The libraries using the NILDE software have formed a
collaborative network to share their bibliographic resources freely with one another: the NILDE community,
consisting of about 2,000 professional librarians who use the system daily, and their end-users. Since
2003, under the aegis of the BdA, the NILDE community has met periodically in national conferences
focused on document delivery and interlibrary cooperation, and has organized training courses for
librarians that focus on the needs of end-users. After more than 20 years of
activity, the NILDE project and its community have produced a rich variety of publications [4] and a
large amount of digital data of various types: multimedia materials from conferences, video interviews,
multimedia teaching resources, photographs, graphs, etc.

IRCDL 2022: 18th Italian Research Conference on Digital Libraries, February 24–25, 2022, Padova, Italy
EMAIL: gcarrara@area.bo.cnr.it (G. Carrara); mangiaracina@area.bo.cnr.it (S. Mangiaracina)
ORCID: 0000-0002-5172-0820 (G. Carrara); 0000-0003-0717-1227 (S. Mangiaracina)
© 2022 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)
¹ The CNR Bologna Research Area is a research campus that hosts seven CNR research institutes, two INAF (Italian National
Institute for Astrophysics) research institutes, one Tecnopolo Laboratory supporting innovation in Emilia Romagna companies, and common
facilities that support the research activities, such as the central library and the congress center.

   In 2019, the BdA decided to collect this large amount of material, dispersed throughout Italy, and to
use an open data repository, accessible via the Web, to allow the NILDE community to learn about the
historical evolution of the NILDE project and to find useful material for their work.

   We created the DALIA (Dati Aperti della bibLioteca di Area) repository with the twofold aim of
preserving the digital data produced by the initiatives and projects in which the BdA has been
involved, and distributing them openly to the public. For this purpose, the whole corpus of digital
data produced by the NILDE project is an ideal starting point.
   This short paper describes the main steps followed in constructing our repository and our future
plans.

2. Repository building phases
DALIA is the result of the following building phases, briefly described hereafter:

   1.   Identifying the repository software/hardware platform
   2.   Repository setup
   3.   Mapping of available/existing data
   4.   Data retrieval
   5.   Data harmonization
   6.   Data input and metadata definition
   7.   Graphic interface

2.1.    Identifying the repository software/hardware platform
    The digital data held by the BdA are of various types and formats (multimedia materials from
conferences, video interviews, multimedia learning resources, photographs, graphs, etc.) and document
the intense activity carried out since 1995; the data of the NILDE project are no exception. In particular, the
NILDE community, and especially its professional working groups dedicated to
training, communication and internationalization, needs a common reference repository
in which to find these digital materials quickly, efficiently and securely, in up-to-date form, either to
create new digital objects or simply to reuse them in new initiatives. Hence the need to store,
catalog and publish these data, grouped into homogeneous sets, using an open-source archiving
system that is simple to manage internally, can store data in many different formats and
allows the BdA to distribute such data openly on the web under a CC BY-SA license.
    Talking about open data also requires reflecting on how far these data can be made FAIR
(Findable, Accessible, Interoperable and Reusable), as strongly recommended by the European Community.
    Following the EU recommendations, in order to transform data into FAIR Digital Objects “it is necessary
to assign them a Persistent Identifier (PID) and create metadata rich enough to allow them to be
reliably found, used and cited. The data should also be represented in commonly accepted formats, and
be richly documented using the metadata standards and vocabularies adopted by the relevant research
community. Finally, the data should contain provenance information to enable interoperability and
reuse. The latter includes reporting on how the data were created (e.g., survey protocols, experimental
processes, calibration information, and sensor locations) and information on data reduction or
transformation processes to make the data more usable, understandable, or "science-ready"” [6].
    The EU also suggests that “FAIR is a scale, and various degrees of FAIRness can be applied to
different data sets. It may not make sense, or even be feasible, to apply all FAIR principles to all
outcomes. But a minimum level of FAIRness should be applied to the data being retained (e.g. discovery
metadata, persistent identifiers, and access to the data or metadata)” [6].
    This last point applies to the data in our hands: after a careful analysis of the available open
data catalogs (Invenio, Dataverse, CKAN), we avoided solutions that were oversized for our needs.
    We therefore chose the CKAN (Comprehensive Knowledge Archive Network)
platform [7] for three reasons: most of the data in our possession already have a PID; CKAN can store
the link to a resource (possibly already deposited in a recognized repository); and CKAN allows
metadata fields to be customized manually and preserved even when the original dataset has been removed or is
absent.
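
    As a rough illustration of the "minimum level of FAIRness" quoted above, the following Python sketch checks whether a dataset record carries discovery metadata, a persistent identifier and an access link. The field names (`title`, `description`, `pid`, `url`) are assumptions made for this example only; they are not the DALIA metadata schema.

```python
# Hypothetical field names: they stand for the three minimum FAIR elements
# quoted above (discovery metadata, persistent identifier, access to the data).
REQUIRED_FIELDS = {
    "discovery metadata": ("title", "description"),
    "persistent identifier": ("pid",),
    "access": ("url",),
}


def missing_fair_elements(record: dict) -> list[str]:
    """Return the minimum FAIR elements that the record does not provide."""
    missing = []
    for element, fields in REQUIRED_FIELDS.items():
        # An element counts as present only if all of its fields are non-empty.
        if not all(str(record.get(field) or "").strip() for field in fields):
            missing.append(element)
    return missing
```

    A record with a title, a description and a URL but an empty `pid` would be reported as missing only the "persistent identifier" element.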
    In addition, the platform is simple both for users and for the operators who input data, and it has a
plugin system for adding functionality; altogether it seemed the right choice for building a system
adaptable to the future needs of the BdA. CKAN allows us to i) manage, publish and search datasets and
documents in open format; ii) visualize data in tables, graphs and maps; iii) keep the history of
operations performed on datasets by one or more operators; iv) use APIs to manage and query datasets;
and v) integrate with WordPress portals (which the BdA uses). In addition, CKAN's resources are held
in trust by the Open Knowledge Foundation, a non-profit organization with best-practice policies on
open government and brand usage. Currently, our CKAN repository runs on a Docker platform
on our own server.
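
    To illustrate point iv), the sketch below builds a request to CKAN's `package_search` action and reads dataset titles from the JSON reply. The base URL is a placeholder, since the address of the DALIA portal is not given here, and the helper names are hypothetical; only the `/api/3/action/package_search` path and the `success`/`result` layout of the reply come from the CKAN Action API.

```python
import json
from urllib.parse import urlencode

# Placeholder base URL (assumption): replace with the address of a CKAN portal.
BASE_URL = "https://ckan.example.org"


def package_search_url(query: str, rows: int = 10, base_url: str = BASE_URL) -> str:
    """Build a URL for CKAN's `package_search` action (Action API, version 3)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{params}"


def dataset_titles(response_text: str) -> list[str]:
    """Extract dataset titles from the JSON body returned by `package_search`."""
    body = json.loads(response_text)
    if not body.get("success"):
        raise ValueError("CKAN reported a failed request")
    return [dataset["title"] for dataset in body["result"]["results"]]
```

    The reply could be fetched, for example, with `urllib.request.urlopen(package_search_url("NILDE")).read().decode()` and then passed to `dataset_titles`.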

2.2.    Repository setup

    In a CKAN system, data are published in units called “datasets”. Each dataset contains two items:
1) information or “metadata” about the data (e.g. title and publisher, description, date, available formats,
license, etc.); 2) its “resources”, which hold the data themselves, with no restriction on the
format. CKAN can store a resource internally or simply as a link, the resource itself being
elsewhere on the web. A dataset can contain any number of resources and must belong to an
organization. In addition, groups can be defined to collect datasets by theme or project, even when
they belong to different organizations. We organized our data on this basis.

   For our purposes, we renamed "organization" to “ARCHIVES (ARCHIVI)” and "groups" to
“TOPICS (TEMATICHE)” in order to make navigation easier for users. NILDE is one of
our ARCHIVES.
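
   The dataset model described in this section can be sketched as the dictionary that CKAN's `package_create` action accepts. This is a minimal illustration, not the DALIA configuration: the archive, topic, dataset names and URL are invented for the example, and the `cc-by-sa` identifier assumes the portal uses CKAN's default license list.

```python
def build_dataset(name, title, description, archive, topics, resources,
                  license_id="cc-by-sa"):
    """Assemble a dict shaped like the input of CKAN's `package_create` action."""
    return {
        "name": name,                  # URL-safe identifier of the dataset
        "title": title,
        "notes": description,          # CKAN stores the description in `notes`
        "license_id": license_id,      # assumption: default CKAN license list
        "owner_org": archive,          # the organization, i.e. an ARCHIVE
        "groups": [{"name": topic} for topic in topics],  # groups, i.e. TOPICS
        "resources": [
            {"name": r_name, "url": r_url, "format": r_format}
            for r_name, r_url, r_format in resources
        ],
    }


# Hypothetical example values, for illustration only.
example = build_dataset(
    name="nilde-conference-example",
    title="NILDE conference (example)",
    description="Program, slides and photos of a conference.",
    archive="nilde",
    topics=["conferences"],
    resources=[("Program", "https://example.org/program.pdf", "PDF")],
)
```

   Each resource here is stored as a link, which corresponds to the case in which the file itself lives elsewhere on the web.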

2.3.    Mapping of available and existing data

   The first batch of datasets we decided to store in the repository belongs to the NILDE project. As
shown in Figure 1, the NILDE project is wide-ranging and has produced materials of various kinds,
such as training courses, conference proceedings, manuals and technical documentation, scientific
publications and newspaper articles, graphics, and administrative documents.
   In this first phase, we focused on materials related to the conferences. To date, 12 conferences have
been held all around Italy. To organize the data mapping and data retrieval work, specific guidelines
were drawn up to define the material accompanying each conference: website, program, conference
proceedings, book of abstracts, collection of presentation slides, posters, photos, and videos.
Figure 1: NILDE data mapping schema.


2.4.    Data retrieval
    A great deal of effort was required to recover the original data. As mentioned before, the 12
conferences were organized under the aegis of the BdA, but were hosted by various Italian institutions,
so the original data were not always held in their entirety by the BdA.
    To retrieve as much data as possible, we went through the conferences chronologically, listing all the
materials mentioned above in a working sheet and checking every possible hosting website.
Most of the difficulties we encountered concerned the early conferences, for which links were very
often broken or outdated. In these cases we directly contacted the
contact person(s) at the individual hosting institutions, who are collaborators of the BdA. The aim of this
inquiry was both to obtain the widest possible view of the existing materials and to reach the primary source of
the data.
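
    The link checking described above was done by hand on a working sheet; the sketch below shows how such a check could be automated. It is an assumption-laden illustration, not the procedure we followed: the three labels and the function names are hypothetical, and a HEAD request is used although some servers only answer GET.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for `url`, or None if it cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code   # e.g. 404 for a page that no longer exists
    except (OSError, ValueError):
        return None         # DNS failure, refused connection, timeout, bad URL


def classify_links(urls, fetch: Callable[[str], Optional[int]] = http_status):
    """Map each URL to 'ok', 'broken' or 'unreachable' for the working sheet."""
    report = {}
    for url in urls:
        status = fetch(url)
        if status is None:
            report[url] = "unreachable"
        elif 200 <= status < 400:
            report[url] = "ok"
        else:
            report[url] = "broken"
    return report
```

    Passing the status lookup as the `fetch` argument keeps the classification testable without network access.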
    For unstable or “very-likely-to-break” links and files, we decided to collect a copy
of the data and store it locally. In one case, we even decided to make a copy of the conference
website using the Wayback Machine [8], a tool hosted by the Internet Archive [9].
  This work has allowed us to reconstruct and document the entire evolutionary path of the
NILDE project in order to guarantee its preservation.
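
    Before archiving a page, it can be useful to know whether a snapshot already exists. The following sketch queries the public availability endpoint of the Wayback Machine and reads the closest snapshot from the reply. It is an illustration only and does not describe how our copy was made; the helper names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Public availability endpoint of the Wayback Machine.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url: str) -> str:
    """Build a query to the Wayback Machine availability API for `page_url`."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': page_url})}"


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the URL of the closest archived snapshot, or None if there is none."""
    snapshots = json.loads(response_text).get("archived_snapshots", {})
    closest = snapshots.get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None
```

    When `closest_snapshot` returns None, the page has no usable snapshot and is a candidate for a new capture or a local copy.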


2.5.    Data harmonization
    As explained before, the data were retrieved from different sources, so we had to
standardize them to give users a consistent view of the data. Following our guidelines, we defined
some internal rules; an example of the mapping between document types and the formats we intend
to use is shown in Table 1.

Table 1
Document types and the data formats used

                          Document Type                         Data Format
                              Video                                mp4
                       Presentation Abstract                        pdf
                               Slide                                pdf
                           Photographs                              jpg

2.6.    Data input and metadata definition
   Data input is still in progress, but more than 200 resources belonging to 12 datasets
(one per conference) are already present in DALIA. Basic metadata have already been defined for each dataset,
and each dataset is distributed under a Creative Commons License Attribution-ShareAlike 4.0 International (CC
BY-SA 4.0). We are still refining the vocabularies and ontologies we want to apply to our data to
guide potential users in discovery.

2.7.    Graphic interface
    Our goal is to facilitate data discovery by our users (researchers or librarians involved in the NILDE
project), so we devoted particular effort to building a graphic interface that guides users in
consulting and finding data (see Figure 2).
    The base color of the graphic interface is a web-safe red with a pink-red nuance (#993333), a
brighter shade that echoes the red of the BdA Library logo. It is intended to convey positivity, energy, and
involvement. Compared with the tranquility of the blue palette (the other color used in
the BdA logo), red makes the contents of the website stand out and is easier for users to remember.
The icons are monochrome white, because colored icons would
not have stood out against the red background.
    DALIA has a responsive interface that adapts to all devices, including small screens such as mobile
phones. Such devices are increasingly used as everyday working tools and allow users to
retrieve data at any time and in any place.
Figure 2: DALIA web-interface (prototype).


3. Future work
    DALIA is currently operational but has not yet taken its final form, and a lot of data still
needs to be added. The next steps will be to find the right combination of
metadata, keywords and themes for grouping the various datasets. The use of specific vocabularies
and appropriate ontologies will also increase interoperability between search systems, thus
promoting the visibility of the NILDE project's open data on the web.
    In the meantime, some researchers of the Research Area who are about to retire have expressed the need
to organize their data and make them available to the community.
    The tests carried out with DALIA suggest its potential as a new service to be offered to these
researchers in order to "educate" them on the correct procedures for archiving, describing and
preserving data. We are evaluating the scalability of the system and defining some basic rules
to facilitate the management of data, for example within a single research project or broader
theme, or to allow individual researchers to reorganize data scattered around the world with a
view to future preservation. In this case, the repository can be a useful tool for
organizing research work: both to allow almost immediate retrieval of data, and to keep in
one place a local copy of data already deposited in other international and certified repositories,
such as Zenodo [10] or others listed in ROAR [12].

    The advantages for a single researcher are the extreme simplicity of the storage system and the
possibility of making the data openly accessible at any time. The ability to group datasets
belonging to different organizations (here, researchers) under the umbrella of a project, for example,
will allow data to be shared relatively easily between researchers of different disciplines.
    Likewise, within the same institution, it might be possible to bring together all the work done by individual
researchers working on the same topic. Last but not least, this system can be used as the basis for writing
the Data Management Plan (DMP) required by the new European research program Horizon Europe. One
of the future activities supporting the proposal of this new service will be the organization of training
meetings on Open Data, FAIR principles and the creation of Data Management Plans. These meetings will be an
opportunity not only to provide a service to those who need to deposit or organize their material, but
also to introduce these key issues and raise awareness among researchers of the Bologna CNR Research
Area.


4. Acknowledgements
    We thank all the librarians working on NILDE projects who helped us in various ways to
retrieve the data. We thank R. Magno for retrieving the original HTML files of the website of the conference
at which NILDE originated. Special thanks go to M. Greco and M. Rossi, who gave us a huge
photographic archive documenting NILDE since its beginning. We thank A. Tugnoli and G. Resci, who
helped us build the physical infrastructure with their IT expertise. We thank A. Frezzini for the
graphic solutions.

5. References
    [1] NILDE (Network for Inter-Library Document Exchange). URL: https://nildeworld.bo.cnr.it/
    [2] Mangiaracina S., Giannuzzi M., Pistoia B., Guazzerotti M. (2005), Il sistema NILDE per il
        Document Delivery: dalla sperimentazione alla cooperazione, dal progetto al servizio, in
        “Biblioteche oggi”, 23 (1), pp. 29-39.
    [3] Mangiaracina S. et al. (2008), NILDE: Developing a New Generation Tool for Document
        Delivery in Italy, in “Interlending & Document Supply”, 36 (3), pp. 167-177.
    [4] NILDE project publications link. URL: https://nildeworld.bo.cnr.it/it/pub
    [5] Open Science. URL: https://ec.europa.eu/info/research-and-innovation/strategy/strategy-2020-2024/our-digital-future/open-science
    [6] Turning FAIR into Reality: Report and Action Plan. URL: https://doi.org/10.2777/1524
    [7] CKAN, The world’s leading open-source data management system. URL: https://ckan.org/
    [8] Wayback Machine. URL: https://web.archive.org/web/
    [9] Internet Archive. URL: https://archive.org/
   [10] Zenodo. URL: https://zenodo.org/
   [11] OpenAIRE. URL: https://www.openaire.eu/
   [12] Registry of Open Access Repositories. URL: http://roar.eprints.org/