Methods and Tools for Building Open Systems of Scientific Research Support

Oleksandr Novytskyi1, Oleg Spirin2

1 Institute of Software Systems of the National Academy of Sciences of Ukraine, Akademika Glushkova Avenue, 40, Kyiv, 03187, Ukraine
2 Institute for Digitalisation of Education of NAES of Ukraine, M. Berlyns'koho St., 9, Kyiv, 04060, Ukraine

Abstract
The article is devoted to the problems of building an integrated environment within the implementation of a project for the development of open science in Ukraine. To ensure the aggregation of metadata, it was necessary to conduct a study of modern methods of metadata harvesting. Since the solution to this problem involves using ready-made tools, a review of the software that allows the project's goals to be achieved was carried out. A detailed comparative review is therefore presented, and methods for improving the semantic integration of information exchanged over the OAI-PMH protocol are proposed.

Keywords
OAI-PMH, VuFind, Data integration, Metadata harvesting, ETL

1. Introduction

When designing an effective system for sharing scientific results, it is essential to consider the specificity of scientific data. Each field of study has its unique characteristics: in some cases, researchers handle large volumes of experimental data in the form of images, while in others, they work with complex data structures such as chemical formulas or information about astronomical objects. Additionally, each data repository employs its own metadata standard. Thus, the issue of integrating heterogeneous resources and providing unified access to them arises. In Ukraine, various initiatives have been undertaken to establish an open science system aimed at enhancing the visibility of the research outcomes of NASU scientists in the open science information environment using modern technical and informational tools.
The effectiveness of these initiatives is to be evaluated using scientometric indicators such as the citation index, Hirsch index, i-index, and g-index, which will promote the development of science in Ukraine and international scientific collaboration, and broaden access for the scientific community, organizations, and enterprises, both in Ukraine and internationally, to the research results of NASU. As part of this project, a series of programmatic and technical measures are planned to integrate into the scientific and educational space [1]. For the publication and preservation of scientific results, digital libraries (DL) and journals are used, providing access to a vast array of resources in the form of digital objects and a wide variety of tools for searching, viewing, and utilizing digital content. Complex and flexible metadata schemas, such as Dublin Core, MODS, and METS, have been developed and are used to describe digital objects in collections [2]. The semantic layer allows for more effective extraction of necessary information than the metadata level.

14th International Scientific and Practical Conference from Programming UkrPROG'2024, May 14-15, 2024, Kyiv, Ukraine
* Corresponding author.
† These authors contributed equally.
o.novytskyi@iss.nas.gov.ua (O. Novytskyi), oleg.spirin@gmail.com (O. Spirin)
0000-0002-9955-7882 (O. Novytskyi), 0000-0002-9594-6602 (O. Spirin)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings, ceur-ws.org, ISSN 1613-0073

2. Methodology

The study presents a review of current approaches to data integration and identifies the methods most suitable for the integration of scientific information. An analysis of software designed for the creation of integrated environments has been conducted. Publications in Google Scholar on the topic of scientific data integration from 2020 to 2024 were analyzed.
Software with new releases in 2024 was identified. Each of these software packages was installed to evaluate its capabilities for addressing the problem of scientific data dissemination and integration. The results were tested in the open science project implemented by the National Academy of Sciences of Ukraine.

3. Approaches to the integration of structured data

In information integration, several challenges can be identified, such as schema integration, data warehouse integration, data integration (also known as enterprise information integration, EII), catalog integration, and the construction and storage of big data. The most complex step is identifying correspondences between semantically related entities in local and global ontologies. Interoperability is the ability of separate systems to exchange information and to use it. The term interoperability is widely applicable, particularly concerning the effective coexistence of information resources. This issue can be defined in various aspects, including the semantic aspect. Semantic interoperability is the ability of information systems to exchange and automatically interpret content. Achieving semantic interoperability involves resolving the heterogeneity of the information being exchanged. Semantic heterogeneity is more complex than syntactic and structural heterogeneity, as it deals with varying contexts [3], [4], [5], [6], [7]. Syntactic heterogeneity results from differing requirements for metadata formats. Standardizing data formats is a common approach to solving syntactic heterogeneity issues; for example, XML is used as a standard format for all web-accessible data. In open science, the volume of such data approaches the volumes defined as Big Data (BD). The primary feature of these data is their exponential growth. Many efforts are directed towards solving Big Data issues, requiring the development of new methods and algorithms for BD processing to address integration challenges.
The definition of Big Data primarily relates to the difficulty of quantitatively defining a set of information objects. The most accepted definition is found in a report [8] where Big Data management issues are based on the three Vs: Volume, Velocity, and Variety. These represent, respectively, the growth in data volume, the speed at which data arrive, and the heterogeneity of data formats and metadata, all of which complicate rapid data management. Later [9], veracity was added as a criterion to the Big Data definition, refining and supplementing the criteria that affect data complexity and unstructuredness [10], [11]. For semantic Big Data, however, semantics and structure are provided through external ontologies and fixed through metadata. This does not solve the operational issues of such data and creates additional problems related to extracting information from such BD sets. Our semantic data model should meet the requirements of Findable, Accessible, Interoperable, and Reusable data, known as the FAIR principles [12]. The semantic issues for Big Data are based on the understanding that data semantics means the meaningful and efficient use of data objects to represent concepts or objects in the real world [13]. This broad concept encompasses various application areas [14]. Semantic knowledge of Big Data pertains to numerous aspects of rules, expert knowledge, and domain information [15]. A specific property of Big Data in the semantic environment is the complexity of inference, even when the data were not large initially. Thus, a characteristic feature of large semantic data is inference complexity, related to the speed of such operations. Internet web applications are highly sensitive to response delays, and web technologies require high-speed performance for Big Data. The volumes of metadata aggregated in an open science project have the potential to approach Big Data, so the ability of the system to operate on such data should be taken into account during design [16].
Metadata for repositories are crucial for organizing and providing access to digital resources. Metadata work involves creating, managing, and applying descriptive information about digital objects such as books, articles, images, audio files, and other digital content. The issue of metadata representation and exchange between libraries remains relevant, although significant progress has not been achieved over time. A distinctive class of integration systems is those based on the Open Archives Initiative (OAI) technology. In most known systems of this category, the information resources are collections of textual documents, primarily scientific publications, autonomously formed at global network nodes, maintained and administered by their owners. Metadata aggregation for repositories is performed according to the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), providing global access and search services [17], [18]. The essence of the open archives approach is to allow web access to information resources located in interoperable repositories by organizing the shared use, publication, and archiving of metadata for such resources. The OAI-PMH protocol gives data providers a simple way to present their metadata, making it accessible to service providers. HTTP is used as the transport protocol and XML as the data exchange format to address syntactic heterogeneity issues. In addition, the OAI-PMH protocol specifies the Dublin Core format to ensure a basic level of interoperability. Thus, metadata from various heterogeneous sources are combined in a single database to provide a range of services based on the aggregated metadata. The OAI-PMH protocol concept distinguishes two parties: the data provider and the service provider. A data provider is a service that supports the creation and maintenance of one or more repositories (document bases, archives, electronic libraries), publishes its resources, and provides access to its metadata for use in other systems.
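As a sketch of the request side of the protocol: a data provider exposes its metadata through a small set of HTTP verbs (Identify, ListRecords, and others), each invoked as a URL with query parameters. The endpoint below is hypothetical; the verbs and parameter names are those defined by the OAI-PMH 2.0 specification.

```python
# Sketch of how a service provider forms OAI-PMH requests to a data
# provider. BASE_URL is a hypothetical endpoint; the verbs and
# parameters are those defined by OAI-PMH 2.0.
from urllib.parse import urlencode

BASE_URL = "https://example-repository.org/oai"  # hypothetical endpoint

def oai_request(verb, **params):
    """Build an OAI-PMH request URL for the given verb."""
    query = {"verb": verb, **params}
    return f"{BASE_URL}?{urlencode(query)}"

# Harvest Dublin Core records added since a given date.
url = oai_request("ListRecords",
                  metadataPrefix="oai_dc",
                  **{"from": "2024-01-01"})  # 'from' is a Python keyword
print(url)
```

The same helper covers the remaining verbs (Identify, ListSets, GetRecord), since OAI-PMH is nothing more than HTTP GET requests returning XML.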
A service provider collects and stores metadata provided by data providers to offer various services to end users. For a long time, a service provider for electronic libraries in Ukraine operated based on the PKP Harvester. The PKP Open Archives Harvester (PKP OAI Harvester) is a capable tool for collecting metadata from various archives via the OAI-PMH protocol. This system allows metadata collection from DL in Ukraine, indexing 76 repositories with over 630 thousand records. The page listing electronic libraries is shown in Figure 1.

Figure 1: Page listing electronic libraries

PKP Harvester uses PHP 5.6, whose support has ended, which complicates the development of new services. In developing software tools to support scientific research, it is necessary to provide not only metadata search but also extended capabilities for metadata processing and integration with other systems. At the same time, the goal is to create a system oriented towards working with Ukrainian data providers. This prompts the investigation of modern methods and solutions for creating environments that integrate DL. While many projects use the OAI-PMH protocol for data integration, such as BASE, OAIster, and CORE, Ukrainian electronic libraries are not fully represented in these aggregators. This is partly because the metadata language is predominantly Ukrainian, and not all electronic libraries properly provide multilingual metadata. For example, one of Ukraine's largest electronic libraries, the Scientific Electronic Library of Periodicals of the NAS of Ukraine, duplicates data such as resource descriptions without indicating the language, necessitating additional data processing, as shown in Figure 2. Such problems are typical of multilingual repositories, which are widespread in multilingual countries.

Figure 2: Duplication of descriptive metadata without a language identifier

The article discusses typical approaches to the integration of an electronic library.
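The duplication problem illustrated in Figure 2 can be detected mechanically before records enter an aggregator. A minimal sketch, using a fabricated record with a repeated Ukrainian description ("Опис ресурсу", "resource description") that carries no xml:lang attribute:

```python
# Sketch of detecting the Figure 2 problem: dc:description elements
# duplicated and lacking an xml:lang attribute. The sample record is
# illustrative, not taken from a real repository.
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

record = ET.fromstring(
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:description>Опис ресурсу</dc:description>'
    '<dc:description>Опис ресурсу</dc:description>'
    '</metadata>'
)

descriptions = record.findall(f"{{{DC}}}description")
# Elements with no xml:lang attribute cannot be filtered by language.
untagged = [d for d in descriptions if XML_LANG not in d.attrib]
values = [d.text for d in descriptions]
has_duplicates = len(values) != len(set(values))

print(len(untagged), has_duplicates)  # -> 2 True
```

Flagged records can then be deduplicated and language-tagged (for instance with a language-detection heuristic) during the transformation stage discussed below.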
To achieve integration, it is necessary to determine which metadata will be integrated and which protocols should be used. Similar results were published in articles [19], [20], [21], [22], describing typical components of DL and the main issues related to standard approaches to architecture. In our work, we focused on the development of new methodologies for data integration. The purpose of this research is to study methods and tools for developing an open science harvester for the NAS of Ukraine as a software tool for automatically collecting metadata of scientific periodicals of the NAS of Ukraine and information resources of NAS institutions. Integration can be based on metadata or a formalized content model. Typically, the main protocols for scientific systems are based on metadata exchange, but these metadata can vary. Here are the main types of metadata used in DL:
• Descriptive metadata: provide basic information about a resource, such as its title, author, subject, keywords, abstract, and publication date. These metadata play a central role in data exchange architectures.
• Structural metadata: describe the structure of the resource and the relationships between its components, for example when the resource consists of a set of other resources, as in the linked data model.
• Administrative metadata: include information about the management and administration of digital resources, such as details about rights, permissions, access restrictions, file formats, file sizes, technical specifications, and preservation information.
• Archival metadata: are crucial for the preservation and archiving of digital resources. They include information such as the resource's origin, file format, checksums to ensure integrity, migration history, and other technical metadata. These metadata are necessary to ensure that digital objects remain authentic and accessible over time.
• Rights metadata: define intellectual property rights and usage permissions related to digital resources. They contain information about copyrights, licensing terms, usage restrictions, and citation requirements.
• Technical metadata: provide information about the technical characteristics of digital resources. They include details about file formats, resolution, compression methods, color spaces, and other technical specifications necessary for rendering, reproducing, or processing digital content.
• Usage metadata: track user interactions with digital items. They include information such as the number of downloads, views, ratings, comments, and user-generated content related to a specific resource.
All these types of metadata can be involved in DL integration. Various protocols and approaches are used to provide access to metadata through a single access point. These metadata work together to provide a comprehensive description of digital resources in a digital library, ensuring effective resource search, discovery, access, and management.

4. Data exchange protocols for digital libraries

Data exchange in a DL involves the transmission of information or resources between various systems, platforms, or repositories within the library ecosystem [23]. This process includes the sharing, importing, exporting, or synchronization of data to ensure that the DL collection remains current, accessible, and consistent across different platforms. In scientific data repositories, the integration process can be ensured at various levels:
• Metadata exchange: methods enable the exchange of metadata between DL, allowing them to discover, access, and retrieve resources from diverse sources. Standardization of metadata schemas such as Dublin Core, MARC (Machine-Readable Cataloging), or MODS (Metadata Object Description Schema) promotes interoperability and data exchange. This is facilitated by harvesting metadata and importing it into a data aggregator.
An example is the OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting), which facilitates the collection and exchange of metadata, ensuring efficient data synchronization. Another example is the Journal Article Tag Suite (JATS) format [24], a standardized markup format for scientific articles in web publications. JATS is used for structuring and presenting metadata, text, references, and other elements related to scientific articles. It provides a unified format that facilitates the exchange, integration, and analysis of scientific data across different platforms and systems. JATS is based on the XML standard, which allows for easy processing and adaptation of data for various needs in processing and visualization.
• Distributed search: enables users to simultaneously search multiple DL or repositories and retrieve relevant results from each source. Data schema mapping is critical for distributed search, since a request to each remote repository must be translated to the local data schema. Typical representatives of this approach are Z39.50 and SRW (Search/Retrieve Web Service) [25].
• Data storage: research data and scientific results require long-term storage, which is ensured by the exchange between aggregation systems and data transfer to centralized data storage systems in various formats. A set of standards, such as PDF/A, ensures the implementation of long-term storage.
Let us consider in more detail the OAI-PMH protocol, the most widely used protocol for metadata exchange in open access DL. The most common metadata schema supported in OAI-PMH is Dublin Core, which provides a basic set of elements for resource description [26]. The OAI-PMH protocol supports any metadata schema and is effectively a transport protocol, but for basic compatibility the Dublin Core (DC) metadata set is included. Typical scientific repositories running DSpace, EPrints, or OJS support MODS or METS, while MARC is widely used for library catalogs.
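To illustrate what a harvester actually consumes, the following sketch parses Dublin Core fields out of a trimmed, hypothetical ListRecords response (real responses carry additional envelope elements and a resumptionToken for flow control):

```python
# Sketch of extracting Dublin Core fields from an OAI-PMH ListRecords
# response. The XML below is a trimmed, hypothetical example.
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

response = ET.fromstring("""
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample article</dc:title>
          <dc:creator>A. Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""")

# Walk the OAI envelope and pull the DC elements of each record.
records = []
for rec in response.findall(".//oai:record", NS):
    records.append({
        "id": rec.findtext(".//oai:identifier", namespaces=NS),
        "title": rec.findtext(".//dc:title", namespaces=NS),
        "creator": rec.findtext(".//dc:creator", namespaces=NS),
    })

print(records)
```

A production harvester would loop over resumptionTokens and push each batch of dictionaries into the aggregator's index.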
Adjacent to the OAI-PMH protocol is the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) protocol. Both protocols, developed by the OAI, aim to enhance interoperability and exchange of digital resources, albeit through different implementations. The OAI-PMH facilitates interoperability, which is not specific to any application, with a foundation for metadata harvesting. Its data model consists of three primary components:
• the resource, defined by one or multiple metadata records;
• the metadata record, each element of which is encoded as an XML document controlled by an XML schema; in this way, syntactic uniformity is achieved;
• the item, a container for metadata records referring to a single resource, which must include at least one Dublin Core metadata record.
In contrast, OAI-ORE focuses on establishing a data model rather than defining an exchange protocol, suggesting potential exchange formats such as XML/Resource Description Framework (RDF). The OAI-ORE model differentiates three types of resources:
• the aggregation, designed to group other resources, referred to as aggregated resources;
• the aggregated resource, representing an information object within a compound object according to ORE standards; and
• the resource map, a serialized depiction of an aggregation that enumerates the aggregated resources and properties regarding the aggregation and its aggregated resources, including relationships with external resources [27], [28].
Since these protocols are very similar and developed under the same initiative, it is appropriate to provide examples of the differences between them, as shown in Table 1.

Table 1
Fundamental differences between the OAI-PMH and OAI-ORE protocols

Intention
OAI-PMH: designed to facilitate metadata harvesting and retrieval. It establishes a standardized protocol for repositories to provide access to metadata about their digital resources, thereby enabling other systems (harvesters) to retrieve and aggregate this metadata.
OAI-ORE: focuses on the aggregation and exchange of complex digital objects. It aims to address the challenges associated with representing and disseminating digital objects composed of multiple interconnected components, such as web pages, multimedia files, annotations, and others.

Structure of data
OAI-PMH: primarily deals with metadata. It defines a protocol for exchanging metadata records, typically in standardized schemas such as DC or MODS, between repositories and harvesters.
OAI-ORE: is concerned with the structure and representation of complex digital objects. It provides a framework for describing the relationships and aggregations of individual resources that compose an object, along with associated metadata.

Interoperability
OAI-PMH: promotes interoperability by standardizing the exchange of metadata. It allows repositories to expose metadata in a consistent format, facilitating the aggregation and harvesting of metadata by external systems.
OAI-ORE: enhances interoperability by addressing the challenge of exchanging and reusing complex digital objects. It enables the representation of aggregations, annotations, and relationships among resources, rendering it easier to share and reuse complex digital objects across different systems and platforms.

Scope of use
OAI-PMH: commonly used in DL, repositories, and archives for exposing and disseminating metadata about their collections. It enables metadata aggregation, search, and discovery across distributed repositories.
OAI-ORE: advantageous for managing complex digital objects that span multiple files, formats, or representations. It is useful in scenarios such as scholarly publishing, digital exhibitions, collaborative research, or multimedia collections, where understanding the relationships and structure of digital objects is essential.
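The ORE resource-map idea can be made concrete as RDF triples. A minimal sketch, with hypothetical URIs, serializing an aggregation and its aggregated resources as N-Triples:

```python
# Sketch of an OAI-ORE resource map expressed as N-Triples. The URIs
# are hypothetical; the ore: terms (Aggregation, aggregates) come from
# the OAI-ORE vocabulary.
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

aggregation = "http://example.org/article-1/aggregation"
parts = [
    "http://example.org/article-1/article.pdf",
    "http://example.org/article-1/dataset.csv",
]

# One triple typing the aggregation, one ore:aggregates triple per part.
triples = [f"<{aggregation}> <{RDF_TYPE}> <{ORE}Aggregation> ."]
triples += [f"<{aggregation}> <{ORE}aggregates> <{p}> ." for p in parts]

print("\n".join(triples))
```

Because the map is plain RDF, it can be merged with any other linked-data description of the same resources, which is precisely what OAI-PMH alone cannot express.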
Despite the simplicity of the protocol, this very simplicity presents challenges for the semantic integration of data. The issue is exemplified by the OAI-PMH data model. The structure of OAI-PMH XML documents adheres to a specific schema defined by the protocol. Validating the XML of a document that describes metadata with the help of XSLT is difficult because this mechanism does not provide an opportunity to semantically check the structure of the document. As noted above, it is necessary to move metadata to a new level using semantic technologies such as RDF; however, the current OAI-PMH implementation does not provide this. While facilitating the exchange of metadata, OAI-PMH does not impose constraints on the semantics of the data being exchanged. This allows for the encapsulation of any type of data within a semantically independent structure. The protocol does not mandate the use of a specific vocabulary for DC elements, thereby allowing resources to be described with varying keywords. Furthermore, the XML schema does not enforce restrictions against duplicating elements or mandate the definition of language attributes, leading to potential issues when data providers submit duplicated data or data lacking language specifications. Semantic data integration aims to harmonize the meanings and interpretations of data elements across different repositories. This process extends beyond the mere exchange of metadata, striving to forge a unified data viewpoint by rectifying semantic discrepancies and establishing connections between disparate datasets. In the context of OAI-ORE, the OAI-PMH protocol is characterized as a transport mechanism that facilitates data delivery.

5. Data processing improvements for the OAI-PMH protocol

In view of the OAI-PMH architecture concept described above, this paper proposes a semantic extension for data integration in DL [29].
As mentioned, OAI-PMH is a transport-layer protocol that can be used only for data exchange, so semantic integration should be achieved by improving the received data and expanding the connections between data. To achieve the semantic integration of metadata from heterogeneous sources, it is suggested to use linked-data ontologies and data mapping:
1. Metadata alignment: usually works at the level of controlled vocabularies; in some cases metadata must be split or combined, which requires transformation to achieve semantic integrity.
2. Ontology mapping: involves the creation of links between two or more source ontologies; since the application domain is usually common to the metadata model, such mapping is quite simple.
3. Linked data: using linked-data principles to connect and link data sets between repositories. Linked data allows explicit relationships to be established between resources using standards such as RDF.
Software processes that facilitate metadata integration are commonly known as extract-transform-load (ETL) processes. ETL is a process, and a set of tools, that provides extraction of data from several sources, their transformation and normalization, and insertion into a data store. The ETL process model is as follows:
Extraction (E): a comprehensive retrieval of all the data contained in the source system, leaving no data unaccounted for or missed during the extraction process.
Transformation (T): changing the data structure to a new one. This includes data cleaning (removing or correcting errors, inconsistencies, duplicates, and missing values); checking data for integrity, quality, and consistency with predefined rules or business logic; data enrichment (adding additional information, derived attributes, or calculated values based on rules or external data sources); and data filtering (selecting or excluding certain data based on predefined criteria).
Loading (L): loading the transformed data directly into the target system or database using native loading mechanisms or APIs.
It is appropriate to apply these processes when solving the problem of semantic integration within the framework of the OAI-PMH protocol.

6. Overview of software for data integration according to the OAI-PMH protocol

Several software solutions are available for integrating OAI-PMH, allowing organizations to harvest and display metadata from various repositories. We conducted a brief analysis of popular software for creating an OAI-PMH harvester and identified key features for creating an effective scholarly environment. The following requirements were set for forming the list: open-source license, system support and updates, historical system stability, and high-quality system architecture utilizing modern design methods. Below is a comparative table of characteristics of popular software for integrating OAI-PMH DL.

Table 2
List of modern software for integration of scientific repositories

Feature | VuFind | DSpace | Omeka S
Description | Supports the collection and integration of repositories through OAI-PMH, allowing metadata to be collected from multiple sources with a unified search | Digital storage platform that supports the OAI-PMH protocol | Software for creating digital collections; includes OAI-PMH support, allowing metadata exchange with other OAI-PMH-compliant systems
Stack | PHP, MySQL, Solr | Java, MySQL | PHP, MySQL
ETL | Yes | No | No
Support of library catalog systems | Yes | No | No
Faceted navigation | Yes | Yes | Yes
Filtering of received records / filtering of searches | Yes | Yes | Yes
Recommendation system in the user interface | Yes | No | Yes
Connection of a full-text extractor | Yes | No | No
Full-text search | Yes | Yes | Yes
Fuzzy search | Yes (Solr) | Yes (Solr) | Yes (Elasticsearch)
User roles | Yes | Yes | Yes
LDAP authentication | Yes | Yes | Yes
Creating a cover page | Yes | No | No
DOI | Yes | Yes | Yes
EZproxy | Yes | Yes | Yes
Spelling suggestions in search | Yes | Yes | No
Export to OAI-PMH | Yes | Yes | Yes
Configuration-based interface | Yes | Restricted | No
Multilingual support for metadata | Restricted | Yes | No
Matomo analytics | Yes | Yes | Yes
API | REST API | REST API | REST API
Metadata schemas (import and view) | Dublin Core, METS, Dublin Core Terms, MARC, XML, CSV | Dublin Core, Dublin Core Terms | Dublin Core, METS
Editing metadata | No | Yes | Yes
Web interface for resource management | No | Yes | Yes
Autocompletion when searching | Yes | No | No

When evaluating these characteristics, it is important to understand that objectively comparing a number of parameters is very challenging. For example, in VuFind [30], [31] the system architecture is designed so that metadata display is controlled entirely through theme modifications. In other systems, managing the presentation requires simple changes in the system settings through a graphical interface, as implemented in DSpace. Adding new metadata fields in Omeka S requires creating a plugin. Therefore, while manipulating the data model is simplest in DSpace, considering flexibility and speed, VuFind has the advantage. Overall, after analyzing the capabilities of each system, VuFind is the more suitable solution for achieving the research objectives.
VuFind is software for creating a library resource portal, primarily aimed at improving user interaction by transforming the traditional online public access catalog (OPAC) [32]. This platform is an open-source library search engine developed by Villanova University's library, first released to the public in a stable version in 2010. The software architecture of this product is very well implemented, achieved through a developer-oriented toolkit, the Laminas framework, and a large number of system settings. This allows the structure of the metadata displayed to the user to be changed without modifying the system's code. The formatting rules of an object are controlled from the theme's code, establishing rules for which metadata to use and which system methods will handle data retrieval. A request is sent to the backend, and after processing, the result is returned to the interface for display to the user. Thus, in the system's architecture, data and the formatting rules for these data are separated, which is very convenient for customization. Let us examine in more detail how the ETL method is implemented in VuFind. In this software product, the data transformation process is divided into two stages. Upon receiving data, an initial transformation of the metadata occurs to change the record identifier from the archive. This is due, on the one hand, to the need for unique identifiers and, on the other hand, to the requirement that the identifier structure contain no slashes, since each resource has a URL corresponding to its identifier in the primary electronic library, which is the source of the metadata. Based on these advantages, this software product was chosen as the foundation for creating an integrated environment to support scientific research. The result of deploying and indexing the Scientific Electronic Library of Periodicals of the NAS of Ukraine is presented in Figure 3.
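The identifier transformation described above can be sketched as a small ETL step. The prefixing scheme and the sample OAI identifier below are assumptions for illustration, not VuFind's exact implementation:

```python
# Sketch of the first transformation stage: turning an OAI identifier
# (which may contain slashes and colons) into a flat, unique record id,
# while keeping the original for building back-links to the source DL.
# The prefixing scheme and sample identifier are illustrative only.
def transform_identifier(oai_id, source_prefix):
    """Flatten an OAI identifier into a slash-free unique record id."""
    flat = oai_id.replace("oai:", "").replace("/", "-").replace(":", "-")
    return f"{source_prefix}.{flat}"

record = {"oai_id": "oai:dspace.nbuv.gov.ua:123456789/99"}  # hypothetical
record["id"] = transform_identifier(record["oai_id"], "nasu")
print(record["id"])  # -> nasu.dspace.nbuv.gov.ua-123456789-99
```

Keeping the source prefix in the flattened id guarantees uniqueness across repositories even when two archives issue the same local identifier.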
Figure 3: Resource list interface with a faceted filter

Semantic data integration is not provided in VuFind by default, but it can be achieved through user functions that can map semantic data; the ability to use such calls is one of VuFind's advantages. The process of integration and organization of access to information in VuFind consists of the following stages:
1) Metadata collection using the OAI-PMH protocol.
2) Data transformation according to the ETL model. At the extraction stage, VuFind allows for partial transformation operations. This process enables VuFind to create a unified and comprehensive index of resources from many sources, providing users with centralized search capabilities.
3) User search on the aggregated data using a convenient interface with deep configurability.
4) Resource access. Each resource is directly accessible through provided links, including the necessary identifiers (e.g., URLs or DOIs) to the full content hosted by the original data providers.
5) Metadata display. The collected metadata is presented, after the standardized ETL process, in a user-friendly format. The metadata can be enriched with additional information or facets to improve search and assist in resource discovery.

7. Conclusions

To enhance the visibility of research results from NASU (National Academy of Sciences of Ukraine) scientists in the open science information environment, modern software tools are required. This will ultimately enable the evaluation of effectiveness through scientometric indicators, promoting the development of science in Ukraine and international scientific cooperation. Building an integrated environment for the aggregation of scientific resources requires addressing a number of challenges. The article discusses approaches to the integration of electronic archives and describes the practical experience of integrating Ukrainian electronic archives using the OAI-PMH protocol.
The construction of an integrated environment for the aggregation of scientific resources requires solving several problems. The article examines approaches to the integration of electronic archives and describes the practical experience of integrating Ukrainian electronic archives using the OAI-PMH protocol. The main protocols for the integration of electronic libraries are considered. As the analysis has shown, since 2015 no significant alternative to the OAI-PMH exchange protocol has emerged. Approaches to the structural integration of electronic libraries have been analyzed, and a comparative analysis of the functional capabilities of each software product has been carried out. It has been shown that VuFind is the most effective tool for the integration of DL. Future research will focus on the integration of full-text search and the improvement of metadata quality [33] and display through semantic technologies such as ontologies and linked data.

References

[1] V. O. Kopanieva, L. I. Kostenko, O. V. Novytskyi and V. A. Reznichenko, "The task of digital transformation of the scientific information environment," Problems in Programming, vol. 1, pp. 3-10, 2023.
[2] W. M. Beyene, "Metadata and universal access in digital library environments," Library Hi Tech, vol. 35, no. 2, pp. 210-221, 2017.
[3] D. Florescu, I. Manolescu and D. Kossmann, "Answering XML queries over heterogeneous data sources," in 27th International Conference on Very Large Data Bases (VLDB 2001), 2001.
[4] L. Galanis, Y. Wang, S. R. Jeffery and D. J. DeWitt, "Locating data sources in large distributed systems," in 29th International Conference on Very Large Data Bases (VLDB 2003), Morgan Kaufmann, 2003.
[5] M. Lenzerini, "Data integration: a theoretical perspective," in 21st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2002), New York, 2002.
[6] A. Y.
Levy, "Combining artificial intelligence and database for data integration," in Artificial Intelligence Today: Recent Trends and Developments, Berlin/Heidelberg, 1999.
[7] A. Y. Levy, A. Rajaraman and J. J. Ordille, "Querying heterogeneous information sources using source descriptions," in 22nd International Conference on Very Large Data Bases (VLDB 1996), Bombay, India, 1996.
[8] D. Laney, "3D data management: Controlling data volume, velocity and variety," META Group, 2001.
[9] M. Schroeck, R. Shockley, J. Smart, D. Romero-Morales and P. Tufano, "Analytics: The Real-World Use of Big Data," IBM, 2012.
[10] Intel IT Center, "Big Data Analytics: Intel's IT Manager Survey on How Organizations Are Using Big Data," Santa Clara, 2012.
[11] S. Suthaharan, "Big data classification: Problems and challenges in network intrusion prediction with machine learning," ACM SIGMETRICS Performance Evaluation Review, vol. 41, no. 4, pp. 70-73, 2014.
[12] M. Wilkinson, M. Dumontier, I. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, J. Boiten, L. da Silva Santos, P. Bourne and J. Bouwman, "The FAIR Guiding Principles for scientific data management and stewardship," Scientific Data, pp. 1-9, 2016.
[13] P. Ceravolo, A. Azzini, M. Angelini, T. Catarci, P. Cudré-Mauroux, E. Damiani, A. Mazak, M. Van Keulen, M. Jarrar, G. Santucci and K. Sattler, "Big data semantics," Journal on Data Semantics, vol. 7, no. 2, pp. 65-85, 2018.
[14] R. Amsler, "Application of Citation-based Automatic Classification," Austin, 1972.
[15] W. A. Woods, "What's in a link: Foundations for semantic networks," in Representation and Understanding, pp. 35-82, 1975.
[16] S. Roy, B. Sutradhar and P. Das, "Large-scale Metadata Harvesting—Tools, Techniques and Challenges: A Case Study of National Digital Library (NDL)," World Digital Libraries: An International Journal, vol. 10, 2017.
[17] H. Van de Sompel, M. Nelson, C. Lagoze and S.
Warner, "Resource harvesting within the OAI-PMH framework," D-Lib Magazine, no. 10, 2004.
[18] "The Open Archives Initiative Protocol for Metadata Harvesting, Protocol Version 2.0 of 2002-06-14," [Online]. Available: http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm.
[19] R. Gartner, Metadata for Digital Libraries: State of the Art and Future Directions, JISC, 2008.
[20] A. Getaneh, B. Stevens and P. Ross, "Towards a conceptual framework for user-driven semantic metadata interoperability in digital libraries: A social constructivist approach," New Library World, vol. 113, pp. 38-54, 2012.
[21] K. Lobuzina, "Modern approaches to the integration of electronic information resources of libraries," Visnyk Knyzhkovoi palaty, vol. 12, pp. 24-28, 2012 (in Ukrainian).
[22] O. M. Spirin, S. M. Ivanova, O. V. Novytskyi, Z. Savchenko, V. A. Reznichenko, A. V. Yatsyshyn, N. M. Andriichuk and V. Tkachenko, Electronic Library Information Systems of Scientific and Educational Institutions, Pedahohichna presa, 2012 (in Ukrainian).
[23] M. Agosti, N. Ferro and G. Silvello, "Digital library interoperability at high level of abstraction," Future Generation Computer Systems, vol. 55, pp. 129-146, 2016.
[24] National Center for Biotechnology Information, U.S. National Library of Medicine, "Journal Article Tag Suite," 2024. [Online]. Available: https://jats.nlm.nih.gov/. [Accessed 10 2024].
[25] A. S. Lingam, "Federated search and discovery solutions," IP Indian J. Libr. Sci. Inf. Technol., vol. 5, no. 1, pp. 39-42, 2020.
[26] C. Lagoze and H. Van de Sompel, "The Open Archives Initiative Protocol for Metadata Harvesting," 2015. [Online]. Available: http://www.openarchives.org/OAI/openarchivesprotocol.html.
[27] C. Lagoze and H. Van de Sompel, "ORE User Guide - HTTP Implementation," [Online]. Available: https://www.openarchives.org/ore/1.0/http. [Accessed 2023].
[28] C. Lagoze and H. Van de Sompel, "ORE User Guide - Resource Map Implementation in RDF/XML," [Online].
Available: https://www.openarchives.org/ore/1.0/rdfxml. [Accessed 2023].
[29] V. A. Reznichenko, O. V. Novytskyi and G. Yu. Proskudina, "Integration of scientific electronic libraries based on the OAI-PMH protocol," Problems in Programming, no. 2, pp. 97-112, 2007 (in Ukrainian).
[30] Villanova University's Falvey Library, "VuFind® - Search. Discover. Share.," [Online]. Available: https://vufind.org/. [Accessed 2023].
[31] D. Katz, R. LeVan and Y. Ziso, "Using authority data in VuFind," Code4Lib Journal, vol. 14, 2011.
[32] H. Yu and M. Young, "The impact of web search engines on subject searching in OPAC," Information Technology and Libraries, vol. 4, no. 23, pp. 168-180, 2004.
[33] O. Novytskyi, G. Y. Proskudina, V. Reznichenko and O. Ovdiy, "Evaluation of the quality of digital libraries in the web environment," Software Engineering, vol. 20, no. 4, 2014.