<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>International Journal of Modern Education and Computer Science (IJMECS)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5815/ijmecs.2023.03.06</article-id>
      <title-group>
        <article-title>Formal Data Integration Models Development for Intelligent Electronic Commerce Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Victoria Vysotska</string-name>
          <email>victoria.a.vysotska@lpnu.ua</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrii Berko</string-name>
          <email>andrii.y.berko@lpnu.ua</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lyubomyr Chyrun</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sofia Chyrun</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olena Havrylyshyn</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oksana Smirnova</string-name>
          <email>oksana.y.smirnova@lpnu.ua</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nataliia Sokulska</string-name>
          <email>natalya.sokulska@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olena Sokhatska</string-name>
          <email>o.sokhatska@wunu.edu.ua</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iryna Shakleina</string-name>
          <email>iryna.o.shakleina@lpnu.ua</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hetman Petro Sahaidachnyi National Army Academy</institution>
          ,
          <addr-line>Heroes of Maidan street, 32, Lviv, 79026</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ivan Franko National University of Lviv</institution>
          ,
          <addr-line>University Street, 1, Lviv, 79000</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>Stepan Bandera Street, 12, Lviv, 79013</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Ukrainian Academy of Printing</institution>
          ,
          <addr-line>Pidholosko St., 19, Lviv, 79020</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>West Ukrainian National University</institution>
          ,
          <addr-line>Lvivska Street, 11, Ternopil, 46004</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>15</volume>
      <issue>3</issue>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
<p>The problem of creating and applying methods and means of electronic commerce information technologies for various subject areas and applications has been studied: the problems of developing mathematical models, solution methods and tools for the integration of information resources and for the functioning of intelligent electronic commerce systems using effective intelligent models have been solved. The processes of modelling and designing business analytics tools for processing heterogeneous information resources based on ontologies are described. To solve the problem, several scientific tasks were performed: a classification of intelligent electronic commerce systems and of means for processing heterogeneous distributed information resources of business analytics was proposed; and a formal model of intelligent electronic commerce systems and its components was developed using ontologies, together with a structural model of information resources and methods and algorithms for designing intelligent electronic commerce systems based on the apparatus of ontologies and the integration of information resources.</p>
      </abstract>
      <kwd-group>
<kwd>Intelligent system</kwd>
        <kwd>electronic commerce</kwd>
        <kwd>Data Integration</kwd>
        <kwd>system model</kwd>
<kwd>process model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
The processes of data integration have a fairly wide scope of practical application, in particular in areas such as the construction of DS of various types and purposes, the development of corporate
management systems, information Web systems, electronic business systems, computer
monitoring, etc. The information resources of such systems involve the simultaneous use of a
significant number of different forms, structures, content, and methods of presenting and applying
data [
        <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
        ]. The purpose of developing the method of multi-level data integration is to build and
justify a single generalized approach to solving the given task and determining ways of
implementation that will ensure its interoperability and invariance to the nature, content,
specificity, and order of application of the integrated data. This is especially important in
operational integration processes, in which these data properties are often not predetermined
and may change during the integration procedures themselves. The basis for solving the
problems of this section is the formal presentation of data as a system, the syntax, structure and
semantics of which elements are described with the help of special tools suitable for software
perception and processing. The main task of data integration is the formation of a complete and
consistent output set based on a set of disparate input data obtained from various sources. To
achieve the final goal of integration, it is necessary to ensure a coordinated combination in the
single formation of their syntax, structure and semantics [
        <xref ref-type="bibr" rid="ref3 ref4">3-4</xref>
        ]. In the course of solving this kind of problem, several issues arise, manifesting themselves as
various kinds of conflicts and contradictions caused by inconsistencies of the input local data [
        <xref ref-type="bibr" rid="ref5 ref6">5-6</xref>
        ]. At the level of data
syntax integration, the following contradictions arise [
        <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
        ]: ambiguity or contradiction of
alphabets, mismatch of data types and formats, and mismatch of syntactic constraints. At the level
of integration of data structures, the following are typical contradictions: inconsistency in
methods of defining data units, contradictions in the types and methods of building connections,
and a variety of ways to organize data [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The semantic component of the integration process is
one of the most important and complex, since the problems of syntax and structure, in general,
are solved at the technical and technological levels. Formation of an agreed interpretation of
integrated data is impossible without human participation, as well as the application of methods
and means of intelligent data processing. At the level of integration of semantics, conflict
situations [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] arise as a result of the following factors:
• contradictions in the definition of concepts,
• ambiguity or different readings of names,
• use of incompatible metrics when forming data values,
• contradictions in defining relationships between data,
• contradictions of limitations and axioms of data interpretation,
• ambiguous interpretation of data values.
      </p>
      <p>Eliminating the listed contradictions and conflicts between input data is one of the tasks of the
data integration method. The multi-level data integration method is based on the multi-level data
model developed in the previous section and involves the decomposition of the overall process
into sub-processes of value integration, data syntax, structure and semantics. A key element of
this approach to integration processes is the possibility of implementing them at the level of
data meta-schemas, which makes it possible to reduce the number of references to the data
themselves, whose volumes can be significant. Data integration procedures are thereby
transferred to the meta-level, operating on formalized descriptions of the data instead of the data itself. Similar
principles of replacing operations on information resources with operations on metadata that
specify them are used in the concept of the "semantic web", which is part of the general concept
of Web 2.0, data spaces [8] and DS of the second generation [9].</p>
      <p>
        The purpose of the method of multi-level data integration is to determine the principles,
composition and content of actions for the formation of the information resource of open
information systems and the order of their implementation. Since the object of application of the
method is information resources, it is advisable to organize the process of its development
according to a set of requirements [9-10], which are applied to the design processes of
information systems. The most acceptable is the application of the popular FURPS+ requirements
model [11] defined according to RUP (Rational Unified Process) specifications and IEEE Std
1233a-1998 [12], IEEE Std 610.12-1990 [13] standards, which today are typical in the field of
creating information systems and their components. Such a model provides for the formulation
of basic and additional requirements for the final result of the development process. The main
requirements that must be met by the method of multi-level data integration, according to the
chosen approach, form a set that will be formulated as follows [
        <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
        ].
      </p>
      <p>1. Functionality is compliance of the functionality of the multi-level data integration method
with the requirements and needs of users of the final result.
2. Usability is the possibility of applying the method for implementation using an open
information system.
3. Reliability is the ability of the method to provide the appropriate level of quality indicators
of the results under the specified conditions during the time of its application.
4. Performance is the ratio between the level of costs for the implementation of the actions
provided for within the method and the weight of the results obtained.
5. Supportability is the ability of the method to be applied in all situations and conditions in
which the means of the corresponding information system function.</p>
      <p>
        In addition to the set of basic requirements, the FURPS+ model provides for the formulation
of additional requirements, which, unlike the basic set of FURPS requirements, are not unified
and are formulated to reflect the specifics of the area and subject of the application. For the
method of multi-level data integration, focused on open information systems, additional
requirements are as follows [
        <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
        ].
      </p>
      <p>6. Portability is the ability to move the tools that implement this method from one
application environment to another without rebuilding them.
7. Interoperability is the ability to jointly apply the method and means that implement it with
other methods and means of forming IRP (information resource processing) of open systems.
8. Unification is the use of typical concepts, objects and tools and the formation of results
in accordance with the uniform requirements of the IS (intelligent system).</p>
      <p>The fulfilment of such a set of requirements aims to ensure the appropriate level of quality of
the method being developed and advantages over other known methods of data integration.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>
        The problem of data syntax integration (syntactic integration) is fundamental to the
integration of other components of their general description. Solving the problems of building a
generalized structure and semantics of data is possible only based on a single agreed system of
notation. The concept of data syntax itself is complex and takes into account various aspects of its
representation in documents, DB, DS data repositories, etc. [14]. Taking this into account, the data
syntax is presented as a combination of three components G=&lt;A, T, R&gt;, where A is an alphabet, T
is a set of data types, and R is a set of syntactic restrictions [
        <xref ref-type="bibr" rid="ref1 ref2">1-2, 15-18</xref>
        ]. An alphabet defines a set
of symbols that are used to represent data values in a defined environment. As a rule, the alphabet
consists of letters, numbers, and special and service symbols. However, the definition of the
alphabet is influenced, in particular, by such factors as the localization of the data processing
environment to the language of the users, the nature of the tasks for which the data are used, the
peculiarities of the processes of their storage, transmission and processing, the specifics of
interpretation and the application of various data values. Along with traditional means of marking
data, modern systems widely use graphics, sound, multimedia and other elements for their
display and processing, as well as data of complex and composite types, streaming and active data,
which creates additional difficulties in producing a single, consistent presentation of data [
        <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
        ].
The concept of data type is defined as the result of the classification of values according to the
methods of representation and processing [15-18]. Today, along with such classic types as
numerical, symbolic, logical, date-time, etc., specific types of data are widely used, which reflect
the peculiarities of their content, processing and application. These are, in particular, such scalar
types as "hyperlink", "currency", "object", "locator" and other, complex (aggregate) types
"array", "record", "set", "XML document " etc., object types, and user-defined data types. Such a
variety of data types, on the one hand, creates additional opportunities for the image and
processing of information resources, on the other hand, it complicates the means of supporting
the data storage environment, the procedures for their joint application, transformation and
unification. Constraints, as an element of data syntax, are used to unify forms of data presentation
and create values adequate to the concepts and values they represent. Syntax restrictions are set
in the form of quantitative indicators, dimensions, formats, templates, rules for forming values,
defining a subset of permissible characters, etc. Such restrictions can be defined both at the IS/IT
(information technology) level of data support and at the user level. Therefore, it is advisable to
decompose the data syntax integration problem into the problems of alphabet integration, type
integration, and constraint integration. The relationship between these tasks and the results of their execution
is presented in Fig. 1.
      </p>
      <p>Figure 1: Decomposition of data syntax integration into the integration of the alphabet, types, constraints and values, producing an integrated alphabet, an integrated set of types, an integrated set of constraints and an integrated set of values</p>
      <p>
        According to this scheme, the syntax of the representation of the values of the integrated data set GI is
presented as a combination of three components GI =&lt;AI, TI, RI&gt;, where AI =IA(A1, A2, …, AN) is the
alphabet of the integrated data set, formed by integrating the alphabets of the input data sets A1, A2, …,
AN ; TI =IT(T1,T2,…,TN) is the set of data types used in the integrated set, obtained as a result of the
integration of the data types defined for the input data; RI =IR(R1,R2,…,RN) is the set of constraints
of the integrated data set formed by the integration of the constraints applied to the input data;
IA, IT, IR are the integration operators of alphabets, data types, and constraints, respectively. Each of
these operators describes a mapping: IA maps the input alphabets into the output
global alphabet of the integrated data set, IT maps the local input sets of data types into the output
global set of data types of the integrated set, and IR maps the local input sets of syntactic constraints
into the output global set of syntactic restrictions of the integrated data set [
        <xref ref-type="bibr" rid="ref1 ref2">1-2, 15-18</xref>
        ].
      </p>
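      <p>As a rough illustration (not an implementation from the cited works), the integration operators IA, IT and IR can be modelled in Python as naive set unions over toy sources; all names here are assumptions made for the sketch:</p>
      <preformat>
from functools import reduce

def integrate(components):
    """A union-based stand-in for the integration operators IA, IT, IR."""
    return reduce(set.union, components, set())

# G = (A, T, R) for two toy sources: alphabet, types, constraints
G1 = ({"a", "b", "1"}, {"int", "text"}, {"text_max_255"})
G2 = ({"a", "c", "2"}, {"int", "date"}, {"int_positive"})

AI = integrate([G1[0], G2[0]])   # integrated alphabet AI = IA(A1, A2)
TI = integrate([G1[1], G2[1]])   # integrated type set TI = IT(T1, T2)
RI = integrate([G1[2], G2[2]])   # integrated constraint set RI = IR(R1, R2)
GI = (AI, TI, RI)                # GI = (AI, TI, RI)
      </preformat>
      <p>A plain union resolves none of the syntactic conflicts discussed above; it only fixes the shape of the three mappings, which the conflict-resolution steps described later refine.</p>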
    </sec>
    <sec id="sec-3">
      <title>3. Models and methods</title>
      <p>3.1. Basic principles of the extended data integration model</p>
      <p>
        Further development of the concept of modelling data integration processes is possible due to
the transition in the formal model from the concept of a scheme as an object of integration to the
concept of a data set. Each data set is a combination of a scheme, as some formalized description
of the composition and structure of data and a set of values (constants) formed according to the
requirements of the scheme. In this way, the formal objects of the model are a set of input (local)
data sets, an output (global) set of integrated data and a mapping that establishes correspondence
between the elements of the input and output sets (Fig. 2a). Formally, such a model is presented
as a triple of the form [
        <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
        ]: &lt;DSL, Map(DSL, DSI), DSI&gt;, where DSL={&lt;Di, Σi&gt; | i=1,…,N} is the set of
local input data sets; Σi is the data scheme of the i-th input set, expressed in terms of the input scheme
description language LL; Di is a set of values (constants) formed from the characters of
the input alphabet AL; DSI=&lt;DI, ΣI&gt; is the global output set of integrated data; ΣI is the scheme of the
global set of integrated data, expressed in terms of the description language of the output schemes
LI; DI is the set of values of the output data set given by the symbols of the output alphabet AI;
Map(DSL, DSI) is the mapping of the local input data into the global output set of integrated data [
        <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
        ]. The
fundamental difference between this model and the formal model of M. Lenzerini is the concept
of a global set of integrated data as a result of the integration process. At the same time, this set
can be formed both by moving the values of the input data into the global environment and by
mapping through virtual structures and data elements. In general, the proposed model
corresponds to the real processes of integration to a greater extent than the formal model. Using
such a model, it is possible to formulate a sufficiently accurate and detailed formal description of
the main typical methods of data integration, such as consolidation, federalization, replication,
hybrid integration and collage [
        <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
        ].
      </p>
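      <p>A minimal Python sketch of the triple &lt;DSL, Map(DSL, DSI), DSI&gt; may help fix the idea; the class and function names are illustrative assumptions, not part of the model itself:</p>
      <preformat>
from dataclasses import dataclass

@dataclass
class DataSet:
    """A data set as a pair (D, Sigma): a value set plus its scheme."""
    values: set
    scheme: dict   # simplified scheme: attribute name mapped to type name

def map_local_to_global(local_sets, mapping):
    """Map(DSL, DSI): apply a per-source mapping and merge the results
    into the global integrated set DSI."""
    dsi = DataSet(values=set(), scheme={})
    for ds in local_sets:
        mapped = mapping(ds)                          # source-specific mapping
        dsi.values = dsi.values.union(mapped.values)  # merge value sets
        dsi.scheme.update(mapped.scheme)              # merge schemes
    return dsi
      </preformat>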
      <p>Figure 2: According to the improved model and data consolidation (data sources 1…N and their value sets D1…DN are mapped into the global scheme of integrated data DI)</p>
      <sec id="sec-3-1">
        <title>3.2. Modelling the data consolidation process</title>
        <p>
          A feature of the data consolidation method is the application of data extraction, transformation
and loading procedures as the basis of the data integration process. The result of consolidation is
a global set of integrated data, which has its scheme, which summarizes the composition and
content of the schemes of the input sets. A description of the process of data consolidation
according to the proposed generalized model is given in Fig. 2b [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]. The formal model of the
data consolidation process has the form of a tuple &lt;{&lt;Di, Σi&gt; | i=1,…,N}, ETL(&lt;Di, Σi&gt;) | i=1,…,N,
&lt;DI, ΣI&gt;&gt;, where {&lt;Di, Σi&gt; | i=1,…,N} is a set of input data sets, each of which is given by a scheme
Σi and a set of values Di; &lt;DI, ΣI&gt; is the global set of integrated data, with the scheme ΣI and a set of
values DI; ETL(&lt;Di, Σi&gt;) is the mapping of the input data sets into the output set by applying
extraction, transformation and loading procedures [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]. The key element of such a model is the mapping, which
transforms each i-th input set of the form &lt;Di, Σi&gt; into an intermediate data set of the form &lt;D*i,
Σ*i&gt;. The data set formed as a result of such a transformation differs from the initial one primarily
in that its composition, scheme and format are built according to the requirements of the global
integrated data storage environment. The next step is to move the intermediate data set to the
global environment and merge its set of values with the value set of the integrated data set.
        </p>
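        <p>The following sketch (illustrative only, with toy transformation logic standing in for real ETL rules) shows the consolidation model: every source passes through an ETL mapping, and the intermediate sets are physically merged into the global set:</p>
        <preformat>
def etl(values, target_scheme):
    """ETL mapping: extract the source values and transform each one to the
    format required by the global scheme, yielding an intermediate set."""
    return {str(v).strip().lower() for v in values}   # toy transform step

def consolidate(sources, target_scheme):
    """Consolidation: load every transformed intermediate set into the
    physically stored global set (DI, SigmaI)."""
    di = set()
    for values in sources:
        di = di.union(etl(values, target_scheme))     # load and merge
    return di, target_scheme

DI, SI = consolidate([{" Alpha", "Beta"}, {"beta", "Gamma"}], "global_scheme")
# DI == {"alpha", "beta", "gamma"}: duplicates collapse during the merge
        </preformat>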
      </sec>
      <sec id="sec-3-2">
        <title>3.3. Modelling the data federalization process</title>
        <p>
          The method of data federalization differs in the way of forming a set of integrated values (Fig.
3a) [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]. Unlike consolidation, this method involves the formation of an integrated data set as
some virtual image based on a set of local data sets. When accessing the integrated data, the
corresponding image elements are implemented by substituting real values obtained from local
sources. In this way, the integration process is implemented only at the scheme level, using as
values the data placed at the local level. A formal model of data federation can be represented as
an expression of the type
        </p>
        <p>
          &lt;{&lt;Di,Σi&gt; | i=1,…,N}, View(&lt;Di,Σi&gt;) | i=1,…,N, &lt;DI,ΣI&gt;&gt;, (1)
where {&lt;Di, Σi&gt; | i=1,…,N} is a set of input data sets, each of which is given by a scheme Σi and
a set of values Di; &lt;DI, ΣI&gt; is the global set of integrated data, with the scheme ΣI and a virtual set of
values DI; View(&lt;Di, Σi&gt;) is the mapping of the scheme of the input data set to the global scheme of
integrated data. The key principle of such mapping is the formation of a description of a subset of
data of the local input set in terms and composition that meets the requirements of the global
scheme, while the set of values described by the new scheme Σ*i is a subset of the input local set
of values &lt;Di, Σi&gt;. The result of the mapping is a global scheme of integrated data, formed as a
union of mappings of local schemes [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]: ΣI = Σ*1 ∪ Σ*2 ∪ … ∪ Σ*N, where N is the number of input
local data sets. The set of values of the global output set of integrated data is formed as a union of
the projections of the local data sets {D*i | i=1,…,N}, built according to the set of schemes {Σ*i |
i=1,…,N}, each of which is formed by the mapping View(Σi, Σ*i): DI = D*1 ∪ D*2 ∪ … ∪ D*N.
        </p>
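        <p>A small Python sketch of the federalization principle (hypothetical names; the sources and projections are toys) makes the virtual character of the global set explicit: nothing is copied, and values are pulled from the local sources only when the view is accessed:</p>
        <preformat>
def make_view(fetch, project):
    """View mapping: a virtual image of one local source."""
    def view():
        return {project(v) for v in fetch()}   # evaluated only on access
    return view

def federate(views):
    """DI is materialized on access as the union of the local views."""
    def global_view():
        di = set()
        for v in views:
            di = di.union(v())
        return di
    return global_view

src1 = lambda: {"Kyiv", "Lviv"}            # toy local sources
src2 = lambda: {"LVIV", "Ternopil"}
DI = federate([make_view(src1, str.lower), make_view(src2, str.lower)])
print(DI())   # {'kyiv', 'lviv', 'ternopil'}, fetched at query time
        </preformat>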
        <p>*
RM M</p>
        <p>DI
I</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.4. Modelling the data replication process</title>
        <p>
          Data integration using the replication method involves the formation of a certain mapping
(projection) of the local input data set according to a given mechanism, similar to the
federalization method. The fundamental difference is that the result of displaying input data is
not a virtual set of values, but some intermediate set of data that has its physical image formed
according to some scheme, as in the case of data consolidation. At the same time, the data set
created in this way – a replica – can be moved to a specially defined storage environment. The
global set of integrated data is then formed as a union of the set of replicas. The general
scheme of data integration by the replication method is shown in Fig. 3b [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]. The formal model
of the data integration process using the replication method can be described as follows [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]:
&lt;{&lt;Di,Σi&gt; | i=1,…,N}, Replicate(&lt;Di,Σi&gt;) | i=1,…,N, &lt;DI,ΣI&gt;&gt;, where {&lt;Di,Σi&gt; | i=1,…,N} is a set of
input data sets, each given by a scheme Σi and a set of values Di; N is the number of
incoming local data sets; Replicate(&lt;Di,Σi&gt;) is the mapping of the input data set, which forms a new set
of values – a replica – that is a subset of the set of values of this set, formed according to the
replica scheme; the replica scheme is a subset of the global integrated data scheme; j=1,…,M,
where M is the number of replicas, which may differ from the number of data
sources, since one or more replicas can be formed on the basis of one input local set; the result of
the mapping Replicate(&lt;Di,Σi&gt;) is a data set of the form &lt;Rj,Σ*j&gt;, where Σ*j is a replica scheme and Rj is a
set of values; &lt;DI,ΣI&gt; is the global set of integrated data, with scheme ΣI and value set DI, where
the scheme ΣI is the union of the schemes of all replicas, ΣI = Σ*1 ∪ Σ*2 ∪ … ∪ Σ*M, and the value set DI is formed
by combining the sets of replica values: DI = R1 ∪ R2 ∪ … ∪ RM.
        </p>
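        <p>A toy sketch of the Replicate mapping (the selector predicates stand in for replica schemes) shows how M replicas, possibly several per source, are physically materialized and then united into DI:</p>
        <preformat>
def replicate(values, selector):
    """Replicate mapping: build a physically stored subset of the source
    values (a replica R_j) according to a replica scheme, modelled here
    as a selector predicate."""
    return {v for v in values if selector(v)}

# Two replicas from one source: M may exceed the number of sources N
source = {"order:1", "order:2", "client:7"}
r1 = replicate(source, lambda v: v.startswith("order"))
r2 = replicate(source, lambda v: v.startswith("client"))
DI = r1.union(r2)   # DI = R1 ∪ R2 ∪ … ∪ RM
        </preformat>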
      </sec>
      <sec id="sec-3-4">
        <title>3.5. Modeling the hybrid data integration process</title>
        <p>
          A feature of data integration using the hybrid method (Fig. 4) is the combination of the
possibilities of the three methods described above – consolidation, federalization and replication
– in one process. In this case, the global initial set of integrated data is formed as a heterogeneous
entity that combines several segments, each of which is formed based on different methods and
technologies [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]. In general, the hybrid integration model can be described by a tuple of the
form &lt;{&lt;Di,Σi &gt; | i=1,…N}, Mapi(&lt;Di,Σi &gt;) | i=1,…, N), &lt;DI,ΣI&gt;&gt;, where &lt;{&lt;Di ,Σi&gt; | i=1,…,N} is a set of
input data sets, each of which is given by the scheme Σi and a set of values Di , N is the total number
of input local data sets, the set of input local data sets is divided into three subsets, according to
the integration methods applied to them; &lt;DС,ΣС&gt; is input local data sets to which the data
consolidation method is applied; &lt;DR,ΣR&gt; is input local data sets to which the data replication
method is applied; &lt;DF,ΣF&gt; is input local data sets to which the data federalization method is
applied; Mapi(Di, DI) is mapping of the input local data set to the global set of integrated data, the
type of mapping is different for different data sets, depending on the integration methods applied
to it – consolidation, federalization or replication; &lt;DI,ΣI&gt; is a global set of integrated data, with a
ΣI scheme and a set of DI values, while the ΣI scheme is a union of schemes formed by different
integration methods ΣI =Σ*С Σ*F  Σ*R, where Σ*С is a data scheme formed as a result of the
consolidation of input local data, Σ*F is a data scheme formed by federalization, Σ*R is replication,
the set of values of the global initial set of integrated data is formed as a union of three segments
[
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]: DI = D*С ∪ D*F ∪ R*, where D*С is the set of values formed as a result of consolidation of the input
local data, D*F is the set of values formed by federalization, and R* by replication.
        </p>
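        <p>A compact sketch (an assumed structure, not the authors' implementation) shows the hybrid model as the union of the three segments produced by the different methods:</p>
        <preformat>
def hybrid_integrate(consolidated, federated_views, replicas):
    """Hybrid integration: DI is the union of three segments, each formed
    by a different method (consolidation, federalization, replication)."""
    d_c = set(consolidated)            # D*_C: physically merged by ETL
    d_f = set()
    for view in federated_views:       # D*_F: virtual, pulled on access
        d_f = d_f.union(view())
    r = set()
    for replica in replicas:           # R*: stored replicas
        r = r.union(replica)
    return d_c.union(d_f).union(r)     # DI = D*_C ∪ D*_F ∪ R*

DI = hybrid_integrate({"a"}, [lambda: {"b"}], [{"c"}, {"d"}])
# DI == {"a", "b", "c", "d"}
        </preformat>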
        <p>Figure 4: The hybrid data integration scheme, in which consolidation (ETL), federalization and replication each form their own segment of the global integrated data set</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.6. Modelling the data collage process</title>
        <p>
          Collage (mashup), as a method of integration, is most often used in Web systems to combine,
in a single presentation, data received from different sources that differ in form, structure and
methods of representation but are united by common content or application. The peculiarity of
collage is the absence of a permanent scheme of integrated data and the dynamic formation of a
set of values with each access to resources of this type. At the same time, the initial data are
combined in various ways, forming, as a result, arbitrarily structured hybrid data. The general
scheme of the data collage process is shown in Fig. 5a [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ].
        </p>
        <p>D1 1
D2 2
.
.</p>
        <p>.</p>
        <p>DN N
r
e
v
r
e
s
p
u
h
s
a
M</p>
        <p>
          &lt;{&lt;Di,Σi&gt; | i=1,…,N}, Mashupi(&lt;Di,Σi&gt;), &lt;DI,ΣI&gt;&gt;, (2)
where {&lt;Di, Σi&gt; | i=1,…,N} is the set of input local data sets, with schemes Σi and value sets Di; &lt;DI, ΣI&gt;
is the output global set of integrated data; Mashupi(&lt;Di, Σi&gt;) is a mapping that forms a data collage
element for further combining parts of the input local data sets into a single view. In the collage
process, some subset of D*i values is selected from each input local set, which is described by the
scheme Σ*i. From these parts, by combining and superimposing different types of data and
forming a global scheme as a combination of schemes, a single integrated data set &lt;DI, ΣI&gt; is
formed for presentation to the user. The difference between integration by collage and other
methods is the absence of physical storage of integration results and the dynamic formation of a
global scheme of integrated data upon user request [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ].
        </p>
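        <p>The dynamic, schema-less character of a collage can be sketched as follows (toy sources; a real mashup server would fetch Web resources): the result is rebuilt on every call and keeps the source-specific shape of each part instead of a stored global scheme:</p>
        <preformat>
def mashup(sources):
    """Collage: assemble a fresh combined view on every call; no integrated
    scheme or value set is stored between requests."""
    collage = []
    for name, fetch, select in sources:
        part = [v for v in fetch() if select(v)]   # subset D*_i of D_i
        collage.append((name, part))               # keep source-specific shape
    return collage                                 # arbitrarily structured result

view = mashup([
    ("weather", lambda: ["+21C", "-3C"], lambda v: v.startswith("+")),
    ("news", lambda: ["headline A", "headline B"], lambda v: True),
])
        </preformat>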
        <p>3.7. Formal modelling of data integration processes and results</p>
        <p>
          The analysis of the results of modelling data integration processes using various methods and
methods using an extended formal model allows the following conclusions to be drawn [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]:
• the extended formal model of data integration can be applied to model resource-centric
and schema-centric data integration by the methods of consolidation, federalization, replication,
hybrid integration and collage; the proposed model is therefore invariant to the methods and
paradigms of data integration, which allows us to conclude that it is universal;
• both the data itself, in the form of a set of values (constants), and their formalized
description – a scheme – appear in the integration processes. Integration involves performing
a series of isomorphic transformations over the input schemas to form a global output schema
of integrated data, and transformations of the sets of values of the input data to form the set of values of
the output set of integrated data;
• in the process of integration, operations of moving, reformatting, selecting, projecting,
combining, superimposing, etc. are performed on the input data; as a result, new sets of data
are created, which differ from the input ones in composition, content, structure, presentation
and methods of application;
• the listed features of integration processes are common to the various integration methods
and paradigms, which allows us to conclude that a single generalized apparatus for describing
data integration processes can be created, independent of integration technologies,
subject area, content, purpose and order of application of the integrated data.
        </p>
        <p>
          The general conclusion regarding the modelling of data integration processes is that as a result
of integration, new data values, new forms and presentation formats, new data structures, new
content and new purpose of data are created [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]. Thus, data integration has technical, syntactic,
structural, semantic and pragmatic aspects. Accordingly, each of these aspects involves the use of
its own methods and means of data description in integration processes, which allows the
overall integration process to be divided into several sub-processes, each implementing one of the
abovementioned aspects. This is reflected in the generalized model of data integration processes, which
is proposed to be called a multi-level formal model of integration.
        </p>
      </sec>
      <sec id="sec-3-6">
        <title>3.8. Multilevel data integration model</title>
        <p>
          The results of the analysis of formal models of data integration using various methods show
that in the process of integration, significant transformations of the composition, content and
form of data occur. This means generating, based on input sets, a set of new final data that have
fundamentally different properties. This creates the basis for further development and
improvement of the formal model of the data integration process by introducing into its
composition elements that describe the main properties of the data and the order of their change.
According to the concept of presenting data as a formal system, the data form some formal
language that is used to denote a set of values and concepts from a certain subject area (SA) in the environment
of the information system [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]. The basis of the construction of language structures is a certain
set of symbols - the alphabet. Mandatory and integral properties of data in such a data
representation are their syntax, semantics, and structure. At the same time, syntax is used to
determine the order of presentation of lexical constructions (constants), for the presentation of
real values, and the order of formation of new lexical units based on given ones. Semantics
provides an ordered and unified description of the ways of interpreting data, that is, it connects
them with the actual values that take place in the subject area, forming, due to this, the content of
the data and their pragmatics. With the help of the structure, the order of formation of data units,
their combination and arrangement is described. The structure, in turn, determines not only the
order of presentation and storage of data but also the methods of its processing and application
[
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]. In the general case, the definition of an arbitrary data set DS forms a system of the form
DS=&lt;D, G, S, H &gt;, where D is a set of values that represent a set of concepts of some subject area, G
is a formalized representation of the data syntax, S is a formalized description of the structure
data, H is a formalized presentation of data semantics. In this way, the formal presentation of a
data set as a tuple of the form &lt;D, Σ&gt;, where D is a set of values and Σ is a data scheme, is changed to
a tuple of the form &lt;D, Ω&gt;, where Ω=&lt;G, S, H&gt; is the formal presentation of the syntax, structure and
semantics of the data in this set, which we will henceforth call its meta-schema. A meta-schema
is an extension of the concept of a scheme by supplementing the description of the structure and
constraints of data with a formalized description of their syntax and semantics. The introduction
of the concept of a meta-schema makes it possible to build a much broader and more detailed
description of data properties in integration processes, compared to a scheme. In general, the
process of data integration involves several actions related to their transformation and the
formation of new data based on the initial ones. It is considered a sequence of actions involving
matching, transformation, merging and filtering of data, and aims to form a final data set DS
based on a set of initial sets; formally it is represented by an expression of the form [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]:
        </p>
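        <p>A minimal Python sketch of the meta-schema representation (the symbol Ω stands in for the meta-schema notation here, and the field contents are assumptions of this illustration):</p>
        <preformat>
from dataclasses import dataclass, field

@dataclass
class MetaSchema:
    """Omega = (G, S, H): formalized syntax, structure and semantics."""
    syntax: dict = field(default_factory=dict)     # G: alphabet, types, constraints
    structure: dict = field(default_factory=dict)  # S: data units and links
    semantics: dict = field(default_factory=dict)  # H: interpretation rules

@dataclass
class DataSet:
    """DS = (D, Omega): a set of values plus its meta-schema."""
    values: set
    meta: MetaSchema
        </preformat>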
        <p>
          DS=I(DS1, DS2, …, DSN), (3)
where I is the data integration operator, DS1, DS2, …, DSN is the set of input initial data sets, and
N is the number of data sets participating in the integration process. In general, such data sets
may contain repeated values, i.e. [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]:
        </p>
        <p>D1 ∩ D2 ∩ … ∩ DN ≠ ∅. (4)</p>
        <p>
          Given the data model, based on the specification of data syntax, semantics and
structure, DS=&lt;D, Ω&gt;=&lt;D, G, S, H&gt;, the formal definition of the integration process can be reduced
to actions on these components, replacing the value DSI with a detailed description of all
components of the data definition as follows [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]:
        </p>
        <p>&lt;DI,I&gt;=&lt;DI, GI, SI, HI&gt;=I(&lt;Di,i&gt; | i=1,…N)= (5)
=I(&lt;D1, G1, S1, H1&gt;,&lt;D2, G2, S2, H2&gt;, …, &lt;DN, GN, SN, HN&gt;),
where &lt;Di, Gi, Si, Hi&gt;, i=1,2, …, N is the detailed formal representation of the ith data set.</p>
        <p>
          In this way, the problem of data integration can be decomposed into separate problems of data
value integration, syntax integration, structure integration, and semantic integration. The general
data integration operator I is presented as a combination I=&lt;ID, IG, IS, IH&gt;, where ID is the value
integration operator, IG is the syntax integration operator, IS is the data structure integration
operator, and IH is the semantics integration operator. At the same time, the integration process
will be decomposed into corresponding sub-processes, which can be described by a formal
scheme of the form [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]:
        </p>
        <p>
          &lt;D, G, S, H&gt; = &lt;ID(D1, D2, …, DN), IG(G1, G2, …, GN), IS(S1, S2, …, SN), IH(H1, H2, …, HN)&gt;. (6)
The mutual relationship of these processes and their classification by levels are shown in Fig.
5b. According to such a scheme, each subsequent level of integration is based on the results of the
previous one. Thus, the semantic integration of data is possible only after the integration of their
structure, which, in turn, requires the construction of an integrated syntax that defines the
methods of data representation in the integrated set [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]. The presentation of data in
integration processes as a formal system makes it possible to develop and improve the theoretical and
conceptual foundations of data integration due to a higher level of abstraction and the possibility
of creating integration models that do not depend on the nature, content, subject area, methods
and technologies. As a result of the study of formal models of data integration using the methods
of consolidation, federalization, replication, collage, and the hybrid method, it was found that the
basic principles and concepts are common to all methods, which makes it possible to build a
unified approach and method to data integration that will generalize the methods known today.
In the process of integration, not just a mechanical combination of data is performed, but the
formation of new data, which has fundamentally new properties, differs from the input data in
syntax, structure, semantics and the order of application. This makes it possible to distinguish the
processes of integration of data values, their syntax, structure and semantics [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]. The model
developed in the way described above defines and substantiates the possibility of creating a
universal method of data integration, which summarizes the capabilities of currently known
approaches, and also creates an opportunity to move the integration processes from the
procedures for processing the actual data and their schemes to the procedures for manipulating
metadata that describe the properties and specifics of the set data, which is the object of
integration.
        </p>
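        <p>The level-by-level decomposition I = &lt;ID, IG, IS, IH&gt; can be sketched as a Python function (the operator names and the dict representation are assumptions of this sketch), with each level consuming the result of the previous one, as the text requires:</p>
        <preformat>
def integrate_multilevel(datasets, i_d, i_g, i_s, i_h):
    """I = (ID, IG, IS, IH): syntax first, then structure over the
    integrated syntax, then semantics, and finally the merged values."""
    g = i_g([ds["G"] for ds in datasets])           # integrated syntax GI
    s = i_s([ds["S"] for ds in datasets], g)        # structure SI over GI
    h = i_h([ds["H"] for ds in datasets], s)        # semantics HI over SI
    d = i_d([ds["D"] for ds in datasets], g, s, h)  # value set DI
    return {"D": d, "G": g, "S": s, "H": h}

union = lambda parts, *ctx: set().union(*parts)     # trivial toy operators
DSI = integrate_multilevel(
    [{"D": {1}, "G": {"g1"}, "S": {"s1"}, "H": {"h1"}},
     {"D": {2}, "G": {"g2"}, "S": {"s2"}, "H": {"h2"}}],
    union, union, union, union)
        </preformat>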
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments, results and discussion</title>
      <sec id="sec-4-1">
        <title>4.1. Syntactic integration of data</title>
      </sec>
      <sec id="sec-4-2">
        <title>4.1.1. Integration of alphabets</title>
        <p>
          The integration of alphabets at the stage of designing a unified integrated data processing
environment consists in creating a consistent set of symbols for representing values from the
resulting data set - the integrated AI alphabet, such that for each symbol of the input alphabet Ai,
which is used to represent the value of the input data set Di (i=1,2, …, N), there is a unique mapping
αi: Ai→AI, which matches each symbol of the input alphabet of the ith data set σi(Ai) with a symbol
of the integrated alphabet – σ(AI). The following ratios of input and integrated data alphabets are
possible (Fig. 6) [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]:
• the input alphabet is a subset of the integrated alphabet and has no intersections with
other input alphabets (A1);
• the input alphabet is a subset of the integrated alphabet and has a non-empty intersection
with another alphabet that is a subset of the integrated alphabet (A5);
• the input alphabet is a subset of the integrated alphabet and has a non-empty intersection
with another alphabet that has a partial intersection with the integrated alphabet (A2);
• the input alphabet has a non-empty intersection with the integrated alphabet and with an
alphabet that is a subset of the integrated alphabet (A3);
• the input alphabet is not a subset of the integrated alphabet and has a non-empty
intersection with another input alphabet, which, in turn, has a partial intersection with the
integrated alphabet (A4);
• the input alphabet is not a subset of the integrated alphabet and has no non-empty
intersections with other input alphabets (A6);
• the input alphabet is not a subset of the integrated alphabet, but has a
non-empty intersection with other input alphabets (A7, A8).
        </p>
        <p>Figure 6: Possible relations between the input alphabets A1–A8 and the integrated alphabet AI</p>
        <p>
          We present the process of building an integrated alphabet as a sequence of solving
interconnected problems according to the following scheme [
          <xref ref-type="bibr" rid="ref1 ref2">1-2, 15-18</xref>
          ].
        </p>
        <p>
          1. Let A0 be some initial set of symbols of the integrated alphabet.
2. For each of the input alphabets Ai, i=1,2,…,N, the relation Ai ⊆ A0 is checked. If it is fulfilled,
then all characters of the alphabet Ai used to represent the values of the data set Di are also
acceptable for representing the corresponding values in the integrated data set D. Therefore,
it can be assumed that the input data can be included in the integrated set without changing
the form of their presentation.
3. In this case, the phenomenon of polysemy of symbols is possible. We will call polysemous
those symbols that have the same shape but reflect different meanings: for example, the Ukrainian
letter "І", the Latin letter "I" and the Roman numeral "I" (1), or the Latin letters A-F, which are used
to represent both letters and digits in the hexadecimal number system. Such a phenomenon
may later cause ambiguous interpretation of data values and their content. The
problem of polysemic characters has the following solutions:
• banning the use of the same symbols to denote different concepts – this method involves
defining a single image for all symbols that have the same shape; this option is possible when
such symbols are used for formal values that have no additional (phonetic, lexical or
substantive) interpretation (for example, the interchangeable use of Latin and Cyrillic letters
with matching spelling in car registration numbers); in this case, problems are possible when
interpreting, reading or phonetizing data values;
• the use of polysemic symbols without restrictions – each of the symbols that have the
same form retains its own method of application; in this case, the problem of polysemy of the
symbols of the integrated alphabet is not solved in the process of integration but is transferred
to the level of data application;
• replacement of identically shaped symbols with an alternative image – transliteration; this
transformation eliminates the polysemy of characters without narrowing the
possibilities of data presentation.
4. If the set of characters of the input alphabet is not a subset of the integrated alphabet –
Ai ⊄ A0 – then it is divided into two subsets: Ai1 = Ai ∩ A0 and Ai2 = Ai \ A0. The first includes symbols
that are elements of the integrated alphabet; the process of their integration is described
above. The set Ai2 includes characters of the input alphabet that are not elements of the current
state of the integrated alphabet A0. In such a situation, character polymorphism is possible. We
will consider polymorphic those symbols that differ in image form but are used to
denote the same concepts. For example, upper- and lower-case letters in words represent the same
sounds, and numerical values can be represented in different number systems, using Arabic or Roman
numerals or letters. The appearance of polymorphic symbols in alphabets is a possible
cause of ambiguous perception and interpretation of data values during their processing. As
for solving the problem of processing polymorphic symbols, the following solutions are
possible [
          <xref ref-type="bibr" rid="ref1 ref2">1-2, 15-18</xref>
          ]:
• replacing polymorphic symbols with homomorphic images, i.e. bringing symbols of
different shapes to a single form, for example using only uppercase or lowercase letters, or only
Arabic numerals in place of the corresponding Roman ones;
• parallel application of polymorphic images of synonymous symbols without restrictions,
in which case their interpretation will depend on the context and application of the data values;
• creating one's own interpretation and rules for using polymorphic symbols – this path requires
a detailed analysis of their properties, but makes it possible to significantly expand the capabilities
of the integrated alphabet in terms of displaying data; for example, proper names begin with
capital letters, operators and operations are denoted by special symbols, Arabic numerals are
used to represent quantitative values and Roman numerals ordinal ones, etc.
        </p>
        <p>Regarding the set of symbols Ai2 = Ai \ A0, the following options are possible:
• prohibiting the use of symbols from the set Ai2 to represent integrated data;
• expanding the integrated alphabet by including the set of symbols Ai2 in its
structure, forming the next version A1 = A0 ∪ Ai2;
• transliteration – replacing symbols that are not elements of the integrated alphabet with
symbols from the alphabet A0.
5. As a result of iterative repetition of the described sequence of actions, an integrated
alphabet AI=IA(A1, A2, …, AN) is formed, which defines the set of symbols for presenting data
values in the integrated set.</p>
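        <p>Steps 1–5 above can be condensed into the following sketch (the transliteration table and symbols are toy assumptions): symbols already present in the integrated alphabet are reused, and for Ai2 = Ai \ A0 a transliteration rule is applied when one exists, otherwise the alphabet is extended:</p>
        <preformat>
def integrate_alphabet(a0, input_alphabets, translit):
    """Iteratively build AI = IA(A1, ..., AN) from an initial alphabet A0."""
    ai = set(a0)
    for a_i in input_alphabets:
        for symbol in a_i.difference(ai):   # Ai2 = Ai \ A0
            if symbol not in translit:      # no rule: extend the alphabet
                ai.add(symbol)              # A1 = A0 ∪ {symbol}
    return ai

# Toy run: Cyrillic "і" is transliterated to the Latin "i" already present
AI = integrate_alphabet({"a", "b", "i"}, [{"a", "c"}, {"і"}], {"і": "i"})
# AI == {"a", "b", "c", "i"}
        </preformat>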
      </sec>
      <sec id="sec-4-3">
        <title>4.1.2. Integration of data types</title>
        <p>
          The integration of data types in the construction of the output integrated set consists in the
formation of a set of data types TI, such that for each of the types applied in the input data sets
there is a mapping τ: Ti → TI, which establishes a one-to-one correspondence between the data
types t(Ti) applied in the input set Di (i=1,2,…,N) and the data types t(TI) used in the
integrated data set DI. The mutual relationship of the different sets of data types in the process of
integration is shown in Fig. 7. As can be seen from the diagram, similar to the process of
integration of alphabets, input sets of data types may or may not have full or partial intersections
with the integrated, and may or may not have mutual intersections with each other. The process
of forming an integrated set of types involves the sequential execution of such actions [
          <xref ref-type="bibr" rid="ref1 ref2">1-2, 15-18</xref>
          ].
        </p>
        <p>
          1. Let T0={t1(T0), t2(T0), …, tn(T0)} be some initial set of types defined in the integrated
data set, forming the initial state of the type set TI.
2. For each input data set Di (i=1,2,…,N), check the relation Ti ⊆ T0, which determines the
agreement of the types of the input set with the types of the integrated data set DI.
3. Fulfilment of this condition does not guarantee that only types allowed for use in the
integrated set are used to represent the data of the input set Di since there is a possibility
of type polysemy. We will call polysemous those types that have the same designation
but differ in implementation methods. For example, a value of the "date/time" type can be
represented by both numeric and character values; the "text" type in some applications
represents character strings and in others notes; values of the logical type are represented
as numeric or bit, etc. Discrepancies of this nature are a potential source of errors
in data processing, ambiguous interpretation and incorrect results. This kind of
inconsistency of data types has, in particular, the following solutions [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]:
        </p>
        <p>Figure 7: Possible relations between the input sets of types T1–T8 and the integrated set of data types TI</p>
        <p>
• replacement of homonymous data types with new ones, which by definition do not
coincide with others;
• reformatting the values of the input data set to the format of the corresponding types
defined in the integrated data set.
4. If the set of data types Ti applied in the input set Di exceeds the set of data types
of the integrated set D, i.e. Ti ⊄ T0, this indicates the presence in the input set of data
belonging to types that are not valid data types of the original integrated set.</p>
        <p>Therefore, the data types of the input set are divided into two subsets:
• a subset of types Ti1 = Ti ∩ T0, which are included in the set of types of the integrated data
set;
• a subset of types Ti2 = Ti \ T0, which are not included in the set of types of the integrated
data set.</p>
        <p>
          For the subset of types Ti1, the type-matching procedure is as described above. In the case
when the data of some input set Di belong to types that are not supported in the original
integrated data set DI, the following variants of further transformation are possible [
          <xref ref-type="bibr" rid="ref1 ref2">1-2, 15-18</xref>
          ]:
• expansion of the set of data types of the integrated set by supplementing it with the subset
Ti2 of the data types of the set Di, which is implemented by constructing the next version of the
set of permissible data types of the integrated set, T1 = T0 ∪ Ti2;
• conversion of data from the format of the types of the set Ti2 to the corresponding types
from the set T0, that is, replacing each data value of a type t(Ti2) ∈ Ti2 with a similar value
represented according to the requirements of a type t(T0) ∈ T0.
5. Another contradiction in the processes of integration of data types is the occurrence of
polymorphism of data types in the integrated set and input sets. We will call polymorphic
data types that differ in form of representation but are identical in interpretation (for
example, REAL and FLOAT, BOOLEAN and LOGICAL types, etc.). In this case, situations may
arise in which data of the same actual type will be incompatible with each other when
performing actions on them, which, in turn, is a potential cause of errors and
contradictions in the data. There are two possible ways to resolve this contradiction [
          <xref ref-type="bibr" rid="ref1 ref2">1-2,
15-18</xref>
          ]:
• bringing polymorphic types to a single method of definition by removing those
types that duplicate others;
• compatible application of all possible options for defining data types, by creating
additional means of maintaining the polymorphism of data types and their coordination.
        </p>
        <p>The first way is easier to implement, while the second expands the possibilities for describing
and manipulating data in the integrated set.</p>
        <p>
          6. The result of the steps described above is a generalized and agreed list of data types that
are used in determining the units of the integrated set TI =IT(T1, T2, …, TN) [
          <xref ref-type="bibr" rid="ref1 ref2">1-2, 15-18</xref>
          ].
        </p>
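        <p>A condensed sketch of steps 1–6 (toy type names; the conversion table is a hypothetical stand-in for real reformatting rules): types already in the integrated set are kept, and for Ti2 = Ti \ T0 either a conversion to an existing type is registered or the type set is extended:</p>
        <preformat>
def integrate_types(t0, input_type_sets, conversions):
    """Build TI = IT(T1, ..., TN) together with a conversion plan."""
    ti = set(t0)
    plan = {}                              # source type mapped to target type
    for t_i in input_type_sets:
        for t in t_i.difference(ti):       # Ti2 = Ti \ T0
            if t in conversions:
                plan[t] = conversions[t]   # convert t(Ti2) to a type in T0
            else:
                ti.add(t)                  # extend: T1 = T0 ∪ {t}
    return ti, plan

TI, plan = integrate_types(
    {"int", "text"}, [{"int", "float"}, {"bool"}],
    {"float": "int"})                      # hypothetical conversion rule
# TI == {"int", "text", "bool"}; plan == {"float": "int"}
        </preformat>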
      </sec>
      <sec id="sec-4-4">
        <title>4.1.3. Integration of syntactic data constraints</title>
        <p>
          The integration of restrictions, which are used when forming data values of some input sets,
involves the formation of such a set of restrictions RI=(r1(RI), r2(RI), …, rm(RI)) that for each
restriction r(Ri)  Ri, applied to some input data set Di (i=1,2, …, N) there is a one-to-one
correspondence given by the mapping ρ:r(Ri) → r(RI). However, unlike alphabets and data types,
the restrictions are not free elements, they are formulated and applied only to specific data types,
categories or values. Therefore, each restriction that is applied to a certain set of data Di is defined
as a condition of the form r(Ri, t(Ti), Dji), which is determined by such factors as belonging to a set
of restrictions Ri, binding to a certain type of data – t(Ti), and the scope is some subset of the data
set Dji  Di. Therefore, the problem of integrating the set of constraints of the input data sets into
a single set of constraints of the integrated set can be solved only after performing the integration
of the alphabet and data types. The ratio of the sets of input data constraints and the integrated
data set is shown in Fig. 8. The input sets of constraints can have partial intersections with each
other, be subsets of each other, be completely independent, be fully or partially part of the
integrated set formed by their integration, or not have an intersection and not be part of it. The
general sequence of the process of integration of input syntactic constraints, the purpose of which
is to create a single, consistent and complete set of constraints applied to values from the
integrated data set, is presented in the form of a scheme of actions [
          <xref ref-type="bibr" rid="ref1 ref2">1-2, 15-18</xref>
          ].
1. Let R0 = (r1(R0), r2(R0), …, rk(R0)) be the initial set of constraints of some integrated data set D.
2. For each of the sets of restrictions Ri of the input data sets Di (i = 1, 2, …, N), we check the
condition Ri ⊆ R0.
        </p>
        <p>Fig. 8. The relation of the input constraint sets (R1, R2, …, R8) to the integrated set of constraints RI.</p>
        <p>
3. The fulfilment of this condition means that each of the constraints of the input data set
is present in the integrated set. But for the final determination of the possibility of
applying restrictions to integrated data, the following factors are additionally checked
[12, 15-18]:
• the presence, among the data types of the integrated set, of the types for which the
restrictions are defined, i.e. for each of the restrictions rj(Ri) ∈ Ri there is tj(Ti) ∈ TI, where
tj(Ti) is the data type to which the restriction is applied and TI is the set of types of the
integrated data set;
• the presence, among the values of the integrated data set, of the values for which the
restrictions are defined, that is, for each of the restrictions r(Ri) ∈ Ri the condition Dji ⊆ DI
holds, where Dji is the subset of values of the data set Di to which the restriction is applied
and DI is the integrated data set;
• the fulfilment of these requirements ensures the possibility of applying the restriction to
the data of the integrated set, while the lack of appropriate data types and/or values makes it
impossible to apply this restriction to the integrated data set.
4. If among the set of constraints Ri of some input set Di some are not constraints of the
integrated data set DI, i.e. Ri ⊄ R0, the set of constraints is divided into subsets Ri1 = Ri ∩ R0
and Ri2 = Ri \ R0.
5. The set of restrictions Ri1 is consistent with the set R0 and the process of its integration is
performed as described above, but the set of restrictions Ri2 has the following integration
options [
          <xref ref-type="bibr" rid="ref1 ref2">1-2, 15-18</xref>
          ]:
• restrictions from the set Ri2 are applied to values and/or data types that are not part of
the integrated set;
• restrictions from the set Ri2 are applied to values and data types included in the
integrated set.
        </p>
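        <p>Steps 4 and 5 admit a direct set-theoretic sketch. The Python fragment below is purely illustrative (the Constraint class, its fields and the sample values are assumptions): it partitions an input constraint set Ri into Ri1 = Ri ∩ R0 and Ri2 = Ri \ R0 and checks whether a constraint from Ri2 still has an object of application in the integrated set.</p>
        <preformat>
from dataclasses import dataclass

@dataclass(frozen=True)
class Constraint:
    name: str        # identifier of the restriction
    data_type: str   # t(Ti): the data type the restriction is bound to

def partition(Ri, R0):
    """Split Ri into Ri1 = Ri.intersection(R0) and Ri2 = Ri - R0 (steps 4-5)."""
    return Ri.intersection(R0), Ri.difference(R0)

def applicable(c, TI):
    """A restriction can be applied only if its bound type is in TI."""
    return c.data_type in TI

R0 = {Constraint("not_null", "INTEGER")}
Ri = {Constraint("not_null", "INTEGER"), Constraint("max_len", "STRING")}
Ri1, Ri2 = partition(Ri, R0)

TI = {"INTEGER", "FLOAT"}
# Restrictions without an object of application are removed without loss;
# the rest are matched with R0 or added to form R1 = R0 ∪ Ri2.
droppable = {c for c in Ri2 if not applicable(c, TI)}
        </preformat>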
        <p>
          In the first case, each of the restrictions that does not have an object of application can be
removed without loss from the set of restrictions of the integrated set. In the second, the
procedure for matching additional constraints Ri2 and a set of constraints R0 of the integrated data
set is applied. The reconciliation of these sets of restrictions is achieved due to [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]:
• exclusion of the Ri2 constraints from further application in the set of constraints of the
integrated data set DI;
• transformation of the restrictions included in the set Ri2 by replacing them with
restrictions from the set R0 that are equivalent in content and application, according to the
principle that each of the restrictions r(Ri2) ∈ Ri2 is matched with a restriction r1(R0) ∈ R0
defined for the types and values of the integrated data set;
• expansion of the set of constraints R0 of the integrated data set by supplementing it with
the elements of the set of constraints Ri2 to form a new version R1 = R0 ∪ Ri2.
6. The result of performing a sequence of actions on the integration of syntactic restrictions
of input data sets is the formation of such a list of data presentation requirements that can
be applied to determine additional properties of data values from the integrated set.
6. The result of performing a sequence of actions on the integration of syntactic restrictions
of input data sets is the formation of such a list of data presentation requirements that can
be applied to determine additional properties of data values from the integrated set.
        </p>
      </sec>
      <sec id="sec-4-4a">
        <title>4.1.4. Procedure and requirements for syntactic data integration</title>
        <p>
          By performing a sequence of actions on the integration of alphabets, data types and syntactic
constraints according to the scheme described above, a complete and consistent set of elements
of the integrated syntax of GI data is formed, which is used as a method and means of displaying
integrated data obtained as a result of the processes of data extraction, transformation and
loading in DS, as well as in their dynamic integration in operational systems. At the same time,
the problem of detecting and eliminating contradictions between the local syntaxes of the
input data sets to be integrated is solved. The general order of syntactic integration is
described by the following sequence of steps [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ].
        </p>
        <p>Step 1. Constructing an integrated alphabet as a complete and consistent set of characters to
represent data values in the original integrated set. Performing this step involves the
implementation of the following procedures.</p>
        <p>1. Detection and elimination of contradictions of input alphabets in the process of syntax
integration. This is done according to the following rules.
• Elimination of character polymorphism – the alphabet of the resulting integrated data
set cannot contain different characters with the same interpretation. Such a rule for
matching the representation of the symbols of the alphabets Ai and Aj is described by
an expression of the form</p>
<p>Alph1(Ai, Aj): ∄ α ∈ Ai, β ∈ Aj, α ≠ β | inti(α) = intj(β), (7)
where Alph1 is the rule identifier; α, β are symbols of the input alphabets Ai and Aj; inti(α),
intj(β) are the symbol interpretation functions.</p>
        <p> Elimination of character polysemy - the output alphabet of the resulting integrated
data set cannot contain the same characters with different interpretations. The rule for
matching the interpretation of the symbols of the alphabets Ai and Aj is described by
an expression of the form</p>
<p>Alph2(Ai, Aj): ∄ α ∈ Ai ∩ Aj | inti(α) ≠ intj(α), (8)
where Alph2 is the rule identifier; α is a symbol included simultaneously in the input alphabets
Ai and Aj; inti(α), intj(α) are the functions for interpreting the symbol in the alphabets Ai and Aj.</p>
2. Construction of the integrated alphabet AI by combining the agreed local input alphabets
according to the rules defined in clause 1:</p>
        <p>AI = IA(A1, A2, …, AN) = A1 ∪ A2 ∪ … ∪ AN | (9)</p>
        <p>Alph1(A1, A2, …, AN) = true ∧ Alph2(A1, A2, …, AN) = true,
where IA is the alphabet integration operator; A1, A2, …, AN are the input local alphabets;
Alph1(A1, A2, …, AN), Alph2(A1, A2, …, AN) are the alphabet matching rules defined in paragraphs
1.a) and 1.b).</p>
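        <p>To make the matching rules concrete, here is a small illustrative Python sketch; modeling each alphabet as a dictionary from symbol to interpretation is an assumption of this example, not part of the formal notation. It checks rules (7) and (8) for a pair of alphabets and, when both hold, builds the integrated alphabet of eq. (9); the same pattern carries over to the type rules (10)-(11) and the constraint rules (12)-(14).</p>
        <preformat>
# Each alphabet is modeled as {symbol: interpretation}.
A_i = {"0": "digit zero", ",": "decimal separator"}
A_j = {"0": "digit zero", ".": "decimal separator"}

def alph1(a, b):
    """Rule (7): no two different symbols may share one interpretation."""
    return not any(s != t and a[s] == b[t] for s in a for t in b)

def alph2(a, b):
    """Rule (8): a shared symbol may not have different interpretations."""
    return all(a[s] == b[s] for s in a if s in b)

if alph1(A_i, A_j) and alph2(A_i, A_j):
    A_I = {**A_i, **A_j}   # eq. (9): the union of the agreed alphabets
else:
    # Here "," and "." both mean "decimal separator", so rule (7) fails
    # and the polymorphic symbols must be reconciled first.
    conflicts = {(s, t) for s in A_i for t in A_j
                 if s != t and A_i[s] == A_j[t]}
        </preformat>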
        <p>Step 2. Construction of a single, consistent list of data types that are used in the original
integrated set. The process of integrating data types involves the following actions.
3. Identification and elimination of inconsistencies in data typing methods from input sets.</p>
        <p>The following rules apply to this.
• Elimination of polymorphism of types – the data types used in the original integrated
data set cannot contain different types that have the same interpretation. Such a
matching rule for the sets of data types Ti and Tj is described by an expression of the form</p>
<p>Type1(Ti, Tj): ∄ t1 ∈ Ti, t2 ∈ Tj, t1 ≠ t2 | inti(t1) = intj(t2), (10)
where Type1 is the rule identifier; t1 and t2 are data types of the input resources, included in
the sets of types Ti and Tj, respectively; inti(t1), intj(t2) are the interpretations of the types t1
and t2, respectively, in the sets Ti and Tj.</p>
        <p> Elimination of polysemy of types - in the composition of types used for data typing of
the resulting integrated data set, there cannot be identically defined types with
different interpretations. The rule for matching the interpretation of sets of types Ti
and Tj is described by an expression of the form Type2(Ti, Tj): tTi, Tj | inti(t)  intj(t),
where Type2 is the rule identifier; t is a type included simultaneously in the input local
sets of types Ti and Tj; inti(t), intj(t) are type interpretation functions, respectively, in
the sets Ti and Tj.
4. Construction of a set of types of the original integrated resource TI by harmonizing and
combining local input sets of types according to the rules defined in clause 1:</p>
<p>TI = IT(T1, T2, …, TN) = T1 ∪ T2 ∪ … ∪ TN | (11)</p>
        <p>Type1(T1, T2, …, TN) = true ∧ Type2(T1, T2, …, TN) = true,
where IT is the integration operator of the input sets of data types; T1, T2, …, TN are the sets of
input local data types; Type1(T1, T2, …, TN), Type2(T1, T2, …, TN) are the data type matching rules
defined in clauses 1.a) and 1.b).</p>
        <p>Step 3. Formation of a single consistent set of syntactic constraints by merging and matching
local constraint sets of input datasets.</p>
        <p> Detection and elimination of contradictions in the syntactic constraint sets of input
datasets. The following rules apply to this.</p>
        <p>a. Elimination of polymorphism of constraints - in the composition of the set of
syntactic constraints, which are applied in the resulting integrated data set,
there cannot be different constraints that have the same interpretation. Such a
rule for matching the sets of constraints Ri and Rj is described by the expression</p>
<p>Restrict1(Ri, Rj): ∄ r1 ∈ Ri, r2 ∈ Rj, r1 ≠ r2 | inti(r1) = intj(r2), (12)
where Restrict1 is the rule identifier; r1, r2 are syntactic restrictions applied to the input
resources, included in the sets of restrictions Ri and Rj, respectively; inti(r1), intj(r2) are the
interpretations of the syntactic constraints r1 and r2 in the sets Ri and Rj.</p>
        <p>b. Elimination of polysemy of constraints – in the set of syntactic constraints,
which are applied to the data of the resulting integrated set, there cannot be
identically defined constraints that have different interpretations. The rule for
matching the interpretation of the sets of constraints Ri and Rj is described</p>
<p>Restrict2(Ri, Rj): ∄ r ∈ Ri ∩ Rj | inti(r) ≠ intj(r), (13)
where Restrict2 is the rule identifier; r is a syntactic restriction that is simultaneously included
in the input local sets of restrictions Ri and Rj; inti(r), intj(r) are the functions for interpreting
the constraint in the sets Ri and Rj.</p>
        <p> Construction of a set of syntactic restrictions of the original integrated RI resource by
harmonizing and combining local input sets of application types defined in clause 1,
rules:</p>
        <p>RI =IR(R1, R2, …, RN) = R1 R2 … RN | (14)</p>
        <p>Restrict1(R1, R2, …, RN) = true ^ Restrict2(R1, R2, …, RN)=true,
where IR is the syntactic constraint integration operator; R1, R2, …, RN are sets of input local
syntactic constraints; Restrict2(A1, A2, …, AN), Restrict2(A1, A2, …, AN) are syntactic restriction
matching rules defined, respectively, in clauses 1. a) and 1. b).</p>
        <p>Step 4. Constructing an output syntax for representing data in an integrated set. The output
integrated data syntax GI is formed based on the integrated consistent alphabet AI, the integrated
consistent set of data types TI and the integrated output set of syntactic constraints RI</p>
        <p>GI =&lt;AI, TI, RI&gt;. (15)</p>
        <p>The syntax formed in this way provides a correct, consistent and unambiguous representation
of the data values in the data set, which is created as a result of their integration.</p>
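        <p>As a compact summary of Steps 1-4, the sketch below assembles the integrated syntax triple of eq. (15) from components that are assumed to be already reconciled; the class and field names are illustrative assumptions.</p>
        <preformat>
from dataclasses import dataclass

@dataclass(frozen=True)
class IntegratedSyntax:
    """GI = &lt;AI, TI, RI&gt;, eq. (15)."""
    alphabet: frozenset     # AI: the agreed set of symbols
    types: frozenset        # TI: the agreed set of data types
    constraints: frozenset  # RI: the agreed set of syntactic constraints

def integrate_syntax(alphabets, type_sets, constraint_sets):
    """Union the already reconciled local components into GI."""
    return IntegratedSyntax(
        alphabet=frozenset().union(*alphabets),
        types=frozenset().union(*type_sets),
        constraints=frozenset().union(*constraint_sets),
    )

GI = integrate_syntax([{"a", "b"}, {"b", "c"}],
                      [{"INTEGER", "FLOAT"}],
                      [{"not_null"}])
        </preformat>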
      </sec>
      <sec id="sec-4-5">
        <title>4.2. Structural data integration 4.2.1. General principles of integration of data structures</title>
        <p>
          The problems of creating and maintaining heterogeneous structures of integrated information
resources, in general, go beyond the functional capabilities of traditional data storage
environments implemented by DB servers and DBMSs [15-19]. Today, many other abstractions
and management methods are known, which either prove their suitability or are withdrawn from
the environment of managing integrated heterogeneous content [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]. A comprehensive
solution to the problems of management and application of integrated content, which includes
both structured (relational) and loosely structured data, is provided by tools that implement
technologies for the joint processing of such resources as structured data, text, spatial, temporal,
visual, multimedia data, procedural data, triggers, streams and data queues, and imprecise and
fuzzy data [20-23]. The heterogeneity of the input data to be integrated extends to the diversity
of their structures (Fig. 9). Modern ISs use data of various levels and forms of structuring. Along with the
structured data stored in the DB, the information resources of open IS contain so-called
nonrelational data, in particular, weakly structured (semi-structured) data, data without a prior
description of the structure (self-structured), stream data, procedural data, etc. [23]. Building a
single agreed description of the structure of disparate data is one of the tasks performed in the
process of their integration. A general description of the structure of the integrated data set is
given as [
          <xref ref-type="bibr" rid="ref1 ref2">1-2, 15-18</xref>
          ]:
        </p>
<p>CI = &lt;R, NR1, NR2, …, NRk, JR, JN, JRN&gt;, (16)
where CI is the description of the structure of the integrated information resource; R is the
description of the structure of the relational component, formed by structured data presented
in the form of database tables; NR1, NR2, …, NRk are the descriptions of non-relational
components of various types; JR is the set of connections between relational elements; JN is the
set of connections between non-relational elements; JRN is the set of relations between
relational and non-relational elements.</p>
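        <p>Structure (16) can be read as a typed record. A minimal illustrative Python sketch (field and sample names are assumptions) is given below.</p>
        <preformat>
from dataclasses import dataclass, field

@dataclass
class IntegratedStructure:
    """CI = &lt;R, NR1..NRk, JR, JN, JRN&gt;, eq. (16)."""
    relational: list                    # R: database tables/views
    non_relational: list                # NR1..NRk: XML, text, media, ...
    joins_relational: set = field(default_factory=set)      # JR
    joins_non_relational: set = field(default_factory=set)  # JN
    joins_mixed: set = field(default_factory=set)           # JRN

CI = IntegratedStructure(
    relational=["orders", "customers"],
    non_relational=["invoices.xml"],
    joins_mixed={("orders", "invoices.xml")},
)
        </preformat>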
        <p>Fig. 9. The structure of heterogeneous input data: the relational component (R1, R2, …, Rn) and non-relational components (NR1, NR2, …, NRm).</p>
        <p>
The integration of data structures (structural integration) is defined as the process of
a coordinated combination of structured (relational) data stored in databases and non-relational
data stored in formats other than the DB. The relational component in this sense is the central
element of integration, since modern DBs and DBMSs [15-18] provide a sufficiently wide range of
opportunities for the joint and coordinated processing not only of structured data, but also of
information resources specified by other data representation methods [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-6">
        <title>4.2.2. Models of structural integration</title>
        <p>
          The main problems of DB integration with other types of data and directions and principles of
their solution are defined in [21-23]. Typical approaches to the integration of structured
relational, weakly structured/self-structured data are described by models [
          <xref ref-type="bibr" rid="ref1 ref2">1-2, 15-18</xref>
          ].
1. Integration of structured data with loosely structured (documentary, textual, spatial,
temporal, visual and multimedia) data. Modern database management systems largely
provide solutions to such tasks through the use of special data types (temporal types of
symbolic and binary objects, generated types, etc.) and the "XML document" data type (Fig.
10). Values of these types are integrated into tables and supplement the list of elementary
values in descriptions of entities and facts. The structure of relational database tables, in
which loosely structured data is stored together with relational data, is described as [
          <xref ref-type="bibr" rid="ref1 ref2">1-2,
15-18</xref>
          ]: R(A1, A2, …, Ak, X1, X2, …, Xm), where A1, A2, …, Ak are table columns that represent
scalar values of traditional and special types; X1, X2, …, Xm are columns that depict weakly
structured values [15-18].
2. Integration of DB and procedural data. Such a model involves the integration of actual data
stored in databases and a set of object data types together with methods that encapsulate
them. In this case, each column of the table (Fig. 11a) can be represented by a pair of the
form (A, M), where A is a column of the table, and M is a set of methods associated with this
column. The structure of such a table is described by an expression of the form [
          <xref ref-type="bibr" rid="ref1 ref2">1-2, 15-18</xref>
          ]: R((A1, M1), (A2, M2), …, (Ak, Mk)).
        </p>
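        <p>The two table models just described can be sketched briefly. In the illustrative Python fragment below (table, column and method names are assumptions), a row carries both scalar columns and a weakly structured XML value, as in R(A1, …, Ak, X1, …, Xm), while a column may also be paired with a set of methods, as in R((A1, M1), …, (Ak, Mk)).</p>
        <preformat>
import xml.etree.ElementTree as ET

# Model 1: R(A1, ..., Ak, X1, ..., Xm) - scalar columns plus an XML column.
details = ET.Element("order")
ET.SubElement(details, "qty").text = "3"
row = {
    "order_id": 42,       # A1: scalar value of a traditional type
    "customer": "ACME",   # A2: scalar value
    "details": details,   # X1: weakly structured (XML) value
}

# Model 2: R((A1, M1), ..., (Ak, Mk)) - a column paired with its methods.
def mask(value):
    return value[0] + "***"

methods = {"customer": {"mask": mask}}   # M2 encapsulated with column A2
print(methods["customer"]["mask"](row["customer"]))   # prints "A***"
        </preformat>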
        <p>Fig. 10. A relational table that combines scalar columns (a1, a2, …, an) with weakly structured XML columns (XML1, …, XMLm).</p>
        <p>
3. Integration into databases of triggers and data processing procedures. The use of such
elements ensures the implementation of the concept of an active database. Such a DB,
together with the values, stores a description of certain rules and actions that are
performed when the state of the database changes. Tables, which include traditional data
and active elements, are described by a structure model of the following type R(A1, A2, …,
Ak, T1, T2, …, Tm), where A1, A2, …, Ak are table columns that represent ordinary typed values,
T1, T2, …, Tm are a set of triggers that describe the actions associated with changing the
state of the table (Fig. 11b).</p>
        <p>Fig. 11. Tables with procedural components: (a) columns A1-An paired with sets of methods M1-Mn; (b) columns A1-An with associated triggers T1-Tn.</p>
        <p>
4. Integration of static data with streams and data queues. Stream data is a set of values that
is not stored on the system media, but exists only at the time of application of this data. An
example of streaming data is monitoring, stock exchange information, broadcast news in
a standardized format, etc. The data flow is formed as a result of the execution of requests,
forwarding or selection of data. A queue is a special type of data stream in which each unit
is assigned an ordinal number. The structure of the data stream S at a certain time t is
described by an expression of the form [
          <xref ref-type="bibr" rid="ref1 ref2">1-2, 15-18</xref>
          ]:
        </p>
        <p>
St = &lt;S(Rt), S(Xt), S(St′), S(Wt)&gt;, (17)
where S(Rt) is the structure of the set of values obtained as a result of selection operations from
relational database tables; S(Xt) is the structure of the set of data obtained as a result of selection
from sets of weakly structured data; S(St′) is the structure of the set of data obtained as a result of
selection from other data streams; S(Wt) is the structure of a set of data obtained from web
resources. The result of the integration of static and flow components is a semi-dynamic structure
(Fig. 12), which combines data stored in databases and data formed in the form of a time-varying
flow [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ].
        </p>
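        <p>Read this way, a stream at time t is a tuple of component structures that exists only while it is consumed. The Python sketch below is purely illustrative (the generator and the sample rows are assumptions): a semi-dynamic set joins stored relational rows with a time-varying flow.</p>
        <preformat>
import time

def sensor_stream():
    """A time-varying flow: values exist only at the moment of use."""
    for i in range(3):
        yield {"ts": time.time(), "value": i * 1.5}

stored_rows = [{"id": 1, "name": "pump"}]   # S(Rt): relational snapshot

# Semi-dynamic structure: static rows joined with the stream component.
for event in sensor_stream():
    for row in stored_rows:
        print(row["name"], event["value"])
        </preformat>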
        <p>The structure of the integrated set, which combines relational and stream data CI, is described
as CI=&lt;Rt, St, JRt, JSt, JRSt&gt;, where Rt is the structure of the relational component determined at time
t; St is the structure of the flow component determined at time t; JRt is the scheme of connections
between the elements of the relational component at the moment of time t; JSt is a diagram of
connections between elements of the flow component at time t; JRSt is the scheme of connections
between the elements of the relational and flow component at the moment of time t [15-18].
        </p>
        <p>Fig. 12. A semi-dynamic structure combining the relational component (R1, R2, …, Rn) and the stream component (S1, S2, …, Sm).</p>
        <p>
1. Integration of information resources in DS structures and data spaces (Fig. 13) [15-18].</p>
<p>The significant volumes and variety of data used in the activities of large businesses and
other formations require special approaches to their organization and processing, which
ensure availability and efficiency. Today, DS and data space technologies are such
approaches. These information resources make it possible to perceive and apply a large
number of values obtained from tens or hundreds of operational DBs, due to their
aggregation and fusion. Modern DSs support two methods of integration: asynchronous
"extraction-transformation-loading" (ETL) and operational (on-line) integration of sets,
streams and other units of data received from different sources. The first provides for the
formation of a stable, time-invariant DS for long-term, repeated use, while the second forms
an integrated resource that reflects the state of the data at a certain point in time and
provides one-time access to the IR of the corporate system. DS, as a means of integrated
presentation and processing of data,
provides for the formation of some global data structure, which reflects the structures of
local input resources. Depending on the method of integration, such a scheme can be static
(in the case of using "extraction-transformation-loading" procedures) or created
dynamically (for DS based on On-Line integration) [15-18].</p>
        <p>Fig. 13. Integration of information resources in DS structures and data spaces.</p>
        <p>
          In both cases, the global CG structure is presented as a union of the mappings of n local
structures onto the global one, built according to a predefined procedure [
          <xref ref-type="bibr" rid="ref1 ref2">1-2, 15-18</xref>
          ]:
        </p>
<p>CG = Str1(CG) ∪ Str2(CG) ∪ … ∪ Strn(CG), (18)
where CG is the global structure of DS; Stri(CG) is the mapping of the local structure of the ith
input resource to the global structure (i=1,2, ..., n). The structure formed in this way is the result
of the integration of a set of input resources and provides the possibility of their joint application,
management and access.</p>
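        <p>Eq. (18) suggests a straightforward construction: each local structure is mapped into the global schema and the images are united. The short Python sketch below is illustrative only; the per-source mapping functions and name prefixes are assumptions.</p>
        <preformat>
# Str_i(CG): hypothetical mappings of local element names into the
# vocabulary of the global DS structure.
def str_1(local):
    return {"sales." + name for name in local}

def str_2(local):
    return {"crm." + name for name in local}

sources = [({"orders", "items"}, str_1), ({"clients"}, str_2)]

# CG = Str1(CG) ∪ Str2(CG) ∪ ... ∪ Strn(CG), eq. (18)
CG = set().union(*(mapper(structure) for structure, mapper in sources))
# {"sales.orders", "sales.items", "crm.clients"}
        </preformat>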
        <p>
          2. Integration of sensor data and sensor networks. This relatively new direction in data
integration makes it possible to realize the possibilities and advantages of cloud (network)
computing [15-18]. A sensor network is a network of distributed devices, each of which is
a source of certain data [23]. Examples of such networks are a network of monitoring
sensors, a network of points of sale of goods or services, customer service terminals,
meteorological or geo-informational distributed measuring complexes, etc. From the point
of view of integration, the sensor network is a distributed database that can function both
independently and jointly with other databases as part of integrated data
processing systems (Fig. 14) [
          <xref ref-type="bibr" rid="ref1 ref2">1-2, 15-18</xref>
          ].
        </p>
        <p>Fig. 14. A sensor network as a distributed database integrated with the relational components R1, R2, …, Rn.</p>
        <p>
          A network of sensor data can include both structured objects and loosely structured data sets,
or data sets without a predefined structure. In addition, sensor network data can be both static
and dynamic, that is, it can be presented both in the form of sets and streams of data. In this case,
the sensor network is divided into two parts - static and dynamic, between which connections
and the order of interaction are defined [15-18]. The general structure of the data obtained as a
result of the integration of relational and sensor network data: CI=&lt;R, SN, SNt, JR, JSN, JSNt, JRSNt&gt;,
where R is the structure of the relational component; SN is the structure of the static component
of the sensor network; SNt is the structure of the dynamic component of the sensor network at
time t; JR is the scheme of connections between elements of the relational component; JSN is a
scheme of static connections between sensor network elements; JSNt is the scheme of dynamic
connections between elements of the sensor network at time t; JRSNt is a scheme of connections
between the elements of the relational and static and dynamic components of the sensor network
at the moment of time t [15-18]. The integrated data structure formed in this way provides access
and management of an integrated information resource, which includes relational structured
data and sensor network data [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-7">
        <title>4.2.3. The general order of structural integration</title>
        <p>
          The models of structural integration, which are described above, reflect the peculiarities of the
formation, processing and application of heterogeneous content of open information systems of
various directions. This approach makes it possible to formalize and generalize the methods of
structuring and maintaining access to integrated information resources of open systems
regardless of their content, purpose and means of implementation. The general order of integration of
heterogeneous data structures is based on the following scheme (Fig. 15) [
          <xref ref-type="bibr" rid="ref1 ref2">1-2, 15-18</xref>
          ].
        </p>
        <p>Step 1. Formation of a single agreed description of structured (relational) data. The structure
of the relational component is part of the original structure and describes a set of tables/images
from the DB of the input local resources, which form a subset of the data participating in the
formation of the original integrated resource. It is described by an expression of the form
        <p>RI=(DB1.(R1, R2, …, Rk1) , DB2.(R1, R2, …, Rk2), …, DBN.(R1, R2, …, RkN ), JIR), (19)
where RI is the scheme of the relational component of the original integrated data set; DB1,
DB2, …, DBN are databases of local input resources; N is the number of local input resources;
DBi.(R1, R2, …, Rki) is relational data structure in the database of the ith input structured resource
(i=1,2, …, N), k is the number of tables or images of the input resource; JIR is a diagram of
relationships between relational components in an integrated source dataset.
        </p>
        <p>Fig. 15. The general scheme of integration of heterogeneous data structures: the relational component (R1-Rn), the weakly structured (XML) component, procedural data (M1-Mn) and streaming data (S1-Sm).</p>
<p>Step 2. Formation of an agreed description of non-relational resources. Performing this step
involves the following actions.</p>
        <p>1. Specification of composition, properties of units of weakly structured resources and
connections between them. As a result, they form a general description of resources in
XML, HTML, formatted documents, etc. formats.
2. Specification of the composition and properties of resource units without a predetermined
structure and the relationships between them. The result is a list and description of the
structural features of text, graphic, and multimedia data sets.
3. Formation of an agreed description of procedural and active data and their connections.
4. Building a general description of the stream data structure.
5. Determination of relationships between various components of the non-relational
component of the original integrated data set.</p>
        <p>Step 3. Formation of a general description of the elements of the relational and non-relational
components of the original integrated data set.</p>
        <p>Step 4. Determination of the scheme of mutual relations between the elements of the
relational/non-relational component of the original integrated data set.</p>
        <p>
          Step 5. Construction of a description of a single global CI structure as a description of the
composition and properties of the original integrated data set, formed by integrating the
structures of input local data sets of different natures [
          <xref ref-type="bibr" rid="ref1 ref2">1-2, 15-18</xref>
]: CI = &lt;RI, XI, NI, MI, SI, JI&gt;, where
CI is the description of the structure of the original integrated data set; RI is the description of
integrated relational resources; XI is the description of integrated weakly structured resources;
NI is the description of integrated resources without a prior description of the structure; MI is
the description of integrated active resources; SI is the description of integrated streaming
resources; JI is the description of the relationships between the elements of the integrated
source data set. In this way, a single global output structure is formed,
which combines the description of structured (relational) data stored in databases with the
description of non-relational (weakly structured, self-structured, procedural, streaming and
other) data. A formalized description of the global structure is the basis for access to,
management of, and processing of the common integrated information resource [
          <xref ref-type="bibr" rid="ref1 ref2">1-2, 15-18</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-7a">
        <title>4.3. Development of formal means of specification of integrated data</title>
        <p>
          The purpose of this section is to develop the general architecture, principles and order of
functioning of data integration tools in open information systems based on a service-oriented
approach. The main tasks arising from this goal are the following [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]:
• specification of the general provisions and principles of the service-oriented architecture
of data integration tools: determination of the principles of data integration as a service of open
information systems, formulation of the basic concepts and definitions of the data integration
service, development of the general architecture of the data integration service;
• development of the draft specification of the data integration service protocol:
a. description of the protocol status of the data integration service,
b. development of the protocol structure of the data integration service,
c. development of language tools for describing the syntax of integrated data,
d. development of linguistic means of describing the structure of integrated data,
e. development of language tools for describing the semantics of integrated data,
f. development of language means of describing additional properties of
integrated data,
g. development and testing of data integration service tools.
        </p>
        <p>
          The result of performing the assigned tasks is the specification of protocol and language tools
for creating metadata that describe the syntax, semantics, and structure of data in integration
processes, as an interoperable service of an intermediate level [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]. The fundamental difference
between data integration tools and others is that they are implemented at the protocol level, not
at the tool level, as is customary in many integration platforms. The chapter is written based on
scientific publications and studies [
          <xref ref-type="bibr" rid="ref1 ref2">1-2, 24-31</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-8">
        <title>4.4. Service-oriented data integration architecture 4.4.1. Data integration as a service of open intelligent systems</title>
        <p>
          Some of today's popular approaches, principles and concepts [
          <xref ref-type="bibr" rid="ref1 ref2">1-2, 32-45</xref>
          ] are the basis for the
construction of tools that implement the method of multi-level data integration.
1. The concept of Information-on-Demand. This concept is based on understanding data
as a certain asset of the business structure, which is a product of its activity and a subject
of consumption by the client. The essence of this approach boils down to the following. In
the age of advanced information and communication technologies, the consumer of an
information product does not need to take part in its formation, storage and maintenance.
As the need arises, the user turns to the relevant business structures, which form a set of
information products or services by the user's request. In this way, the information asset
acquires consumer qualities similar to the consumer qualities of material resources
produced by the manufacturer. The implementation of such a concept involves the
application of appropriate business, architectural and technological solutions. One of the
constituent parts of the "information on demand" dissemination process is the process of
obtaining, matching, transforming and combining data that forms the result of executing
user requests. The general task of such processes can be reduced to the task of integrating
data obtained from various sources and presenting it according to user requirements.
2. Data-as-a-service (DaaS). The principle of presenting data as some service that the
information system provides to the user is the extension of the service-oriented approach
to various components of open systems. The essence of this principle lies in the possibility
for the user to receive a set of data of the appropriate composition and content upon a
request formulated according to predefined rules. This approach is justified in cases where
creating one's own data repository is impossible, inaccessible or unprofitable for the user. The
factors that stimulate the development of data services are, first of all, the additional costs
borne by the data consumer in the independent formation of IR, the developed network
infrastructure, and the significant selection of public resources available within
standardized technologies. This approach minimizes the costs of technical means of
data storage and transmission, of software and technological means of processing
data repositories, of replenishing and updating data, etc. Obtaining
data as a service provided by open resources for public use allows the user to significantly
reduce his participation in solving the problems of information support of his activity. An
important component of the complex problems that are solved within the framework of
the implementation of the data service is the problems associated with obtaining data from
disparate sources and combining them in the final product that the consumer receives.
3. Content Management Interoperable Service (CMIS). The CMIS concept provides for the
possibility of free user access to content located in various data repositories. For this, a
special service interface is used, which is independent of the platform and which is
configured according to the needs of a specific user. The data model and services of CMIS
are independent of data access protocols and allow implementing the concept of a
virtual information environment that reflects the user's views on the SA. CMIS uses separate
principles of data integration to form the final result of data search in different
repositories, their selection and the combination of data obtained from disparate sources.
However, the CMIS approach only provides the user with access to the data, received on
an "as is" basis, and does not provide procedures for transforming the selection results
to create a coherent, consistent data set. Therefore, CMIS services are, to a greater extent,
a means of data virtualization than of data integration.
4. Information resource description technology - Resource Description Framework (RDF),
developed by the W3 consortium. The RDF toolkit is one of the promising tools for solving
problems related to the description of semantics and pragmatics of data. This technology
is based on a special RDF data model, which represents a set of facts and semantic
relationships between them, presented in the RDF document format. The notation used to
build RDF documents is based on XML principles, which makes them interoperable
concerning platforms and environments and available for processing using various
technologies. The concept of RDF assumes that the data model is a summary of information
and knowledge about a certain information resource, and an RDF document, built
according to the defined notation, is a means of presenting this information. The basic block
of the RDF data model is a statement, which defines the subject, predicate, and object. A
subject is a specific information resource, a predicate is a named property of this resource,
and an object is the value of this property.
5. The XMI (XML Metadata Interchange) metadata exchange standard is a normative
document whose specification was approved by the Object Management Group (OMG) in 1999.
According to its concept, the XML format is used for the formation, movement and
exchange of metadata in IS environments. This makes it possible to apply a single, unified
way of presenting metadata for various environments and applications, such as DB, DS,
web resources, UML metamodels, data repositories, etc., which is important for
implementation in open systems environments.
        </p>
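        <p>The statement model of item 4 can be shown with a few lines of Python using the rdflib library; the namespace and property names below are illustrative assumptions.</p>
        <preformat>
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")   # hypothetical namespace
g = Graph()

# A statement = (subject, predicate, object): the resource EX.product42
# has the named property EX.price with the value 19.99.
g.add((EX.product42, EX.price, Literal(19.99)))
g.add((EX.product42, EX.name, Literal("USB cable")))

print(g.serialize(format="turtle"))
        </preformat>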
        <p>
          The main principles of XMI application are [
          <xref ref-type="bibr" rid="ref1 ref2">1-2, 46-51</xref>
          ]:
• metadata, regardless of nature, content and subject, are submitted in XML format, for
which the XMI specification defines the procedure for typing the corresponding XML documents;
• the standard defines methods of displaying metadata generated according to other
standards in XML format, which enables the generation of correct descriptions created based
on meta-models of various formats;
• the XMI standard provides for the possibility of exchanging metadata both in the form of
data streams and in the form of files of a standardized format (XML documents);
• the IT that forms the basis of the XMI standard is supported by leading manufacturers of
application and data management tools, including Oracle, IBM, Rational, and Sybase.
        </p>
        <p>All the approaches described above, to a certain extent, have an impact on data integration
processes. Therefore, in the integration service concept formulation, the features of each of them
were used [52-63].</p>
      </sec>
      <sec id="sec-4-9">
        <title>4.4.2. Architecture of the data integration service</title>
        <p>
          The data integration service implements a common unified data access model through a
special layer of standardized reusable data abstractions [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]. This, in turn, provides
opportunities to obtain complete, reliable and timely generalized information without the use of
complex mechanisms and additional resources or financial costs. The general architecture of
service-oriented data integration (Fig. 16) combines heterogeneous information resources
through data integration services with applications and different categories of users [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]. At the
same time, it is taken into account that the general information resource contains data sets
formed by various methods and technologies. To create the possibility of sharing the values
concentrated in such sets as part of the information system, an additional level is created - the
level of data integration services. At this level, a unified description of the syntax, structure and
semantics of each set in the form of metadata, as well as means of manipulating them, is
concentrated. Users and applications refer not directly to the data, but to the metadata that
describes it. The result of metadata processing and interpretation is a generalized query,
composed, in turn, of several detailed queries that formulate the order of access to the data sets
located in the relevant repositories, and the order of creation of the resulting set of values that
contain the results of the query. In this way, the use of data by the end-user of the information
system becomes independent of the form and order of their presentation in the storage
environment.
        </p>
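        <p>The query-decomposition behaviour described above can be sketched compactly. The Python fragment below is not the authors' implementation: the metadata layout, repository names and query format are all assumptions made only to show a generalized query being split into detailed per-repository queries.</p>
        <preformat>
# Metadata layer: a unified description of where each logical unit lives.
METADATA = {
    "customer": {"repository": "crm_db", "unit": "clients.name"},
    "total":    {"repository": "sales_db", "unit": "orders.total"},
}

def plan(fields):
    """Interpret metadata: group requested fields into detailed queries."""
    detailed = {}
    for f in fields:
        repo = METADATA[f]["repository"]
        detailed.setdefault(repo, []).append(METADATA[f]["unit"])
    return detailed

# A generalized user query refers to metadata, never to raw storage.
print(plan(["customer", "total"]))
# {'crm_db': ['clients.name'], 'sales_db': ['orders.total']}
        </preformat>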
        <p>... Managers</p>
        <p>Consumers ...</p>
        <p>USERS</p>
        <p>Analysts
external level
APPLICATION
Electronic
business
...
Businessanalytics</p>
        <p>Monitoring</p>
        <p>Corporative
management
applied level</p>
        <p>Social ...</p>
        <p>resources</p>
        <p>
          The functioning of data integration services as part of the information system is built
according to the following scheme (Fig. 17) [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]. The basis of the data integration service is the
data integration server, which is an intermediate link between application tools and data
repositories. Such a server is created according to the principles of intercomponent support
(middleware) as part of a set of active server units and means of their maintenance and a global
meta schema of the information resource. Requests for the required set of values are sent from
user applications to the data integration server in the environment-specific format. The request
received by the server goes through the stages of analysis and interpretation, the task of which is
to generate an integrated meta schema of the data set of the request results. An integrated meta
schema combines a unified description of all data units that should be part of the resulting set
and serves as input to the data repository server. The repository server, based on the
received meta schema, accesses the necessary components of the data storage
environment through the corresponding services, selects the specified values and data
units, and forms a single set of integrated data. The results of the actions of the
repository server are transferred to the integration server, combined with their meta schema and
transferred to the application from which the request came. The basis of such a service is a set of
specialized tools that make up the data integration service protocol. The advantages of using
protocol means of data integration over instrumental ones arise, first of all, due to such
characteristic properties [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ].
        </p>
        <p>Fig. 17. The scheme of functioning of the data integration service: a user application sends a request to the data integration server (integration services and the request metaschema), which addresses the repository server (access services, repositories 1-N) and returns the combined results to the application.</p>
        <p>
1. Unification - the use of protocol-level tools makes it possible to reduce the processes of
data integration to the manipulation of typical concepts, objects, properties and
procedures for their processing.
2. Interoperability - provides the possibility of joint use of means of maintaining the
application protocol of data integration with any other means of data processing.
3. Mobility - the implementation of protocol means of data integration at the application level
makes them independent of the specifics of the platforms and the implementation
environment, which ensures the possibility of their free movement.
4. Processing formats and procedures standardization - XML use as the basis of the language
means of describing objects and processing metadata makes it possible to process the
necessary resources by standard means and according to standardized procedures.
5. Compliance with the principles of SOA - provides easy access and use of data integration
protocol tools for a wide range of information system users.
6. Ease of implementation – the use of such integration means does not require restructuring
business processes, platforms or other means of maintaining the open IS environment, as
it involves the formation of an additional, relatively autonomous intermediate layer of data
abstraction.
7. Insignificant cost – since there is no need to rebuild information systems or to develop,
purchase and implement complex and expensive data integration tools and platforms,
projects based on protocol tools are relatively inexpensive.</p>
      </sec>
      <sec id="sec-4-10">
        <title>4.5. Data integration service protocol</title>
        <p>
          The Data Integration Service Protocol (DISP) is an application-level protocol that defines the
methods, order and means of implementing the data integration service at the request of the user
of the open information system. The main purpose of using the protocol is the organization of the
service for the user of the open information system, a service that forms the data set the
user needs by integrating the data located in a set of disparate local information resources. The protocol does
not depend on protocols of lower (presentation and session) levels and provides for user
interaction with open information system services using standardized interfaces. The goals
achieved by the protocol are as follows [
          <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
          ]:
• creation of a platform for building a data integration service in environments of open
information systems with heterogeneous distributed information resources;
• maintenance of interaction with the standard application tools of the user of the open
information system;
• perception and interpretation of the requests of the user of the open information system to
receive the appropriate information product or service;
• ensuring user access to a set of disparate local information resources through integration;
• formation of a global data set based on local information resources through integration;
• transfer of the results of the request to the user of the open information system.
The goals of the protocol do not include:
• determination of the ways, methods and means of forming local information resources;
• determination of the composition and characteristics of the technological means of data
processing in local repositories;
• administration of the repositories in which local information resources are concentrated;
• implementation of data protection and security measures in the IS environment.
        </p>
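        <p>The article fixes no wire format for DISP, so the fragment below is only a hypothetical illustration, built with Python's standard library, of what a request to such an application-level service could look like; every element and attribute name is an assumption.</p>
        <preformat>
import xml.etree.ElementTree as ET

# Hypothetical DISP request: the user names the logical data units needed;
# the integration service resolves them against local information resources.
req = ET.Element("disp-request", version="1.0")
ET.SubElement(req, "unit", name="customer")
ET.SubElement(req, "unit", name="total")

print(ET.tostring(req, encoding="unicode"))   # serialized XML request
        </preformat>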
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>Based on the performed analysis and generalizations, the main methods of building
information resources of business analytics systems are determined - homogenization,
distribution, and integration. The approach based on data integration is the most appropriate to
the features of business intelligence systems. Together with the other resources integration of
business intelligence systems, data integration forms a single coordinated set of actions for the
design, construction and support of business intelligence systems.</p>
<p>Based on the conducted analysis, it is possible to conclude that there are several
interrelated problems relevant to today's e-commerce field.</p>
      <p>1. The problem of forming a high-quality information resource of business analytics systems
that is relevant to the system goals, adequate to its tasks and appropriate to the needs and
requirements of users.
2. The problem of heterogeneous form, content and properties integration of input
information resources into a single agreed common resource of business analytics
systems.
3. The problem of developing and substantiating a single generalized theoretical concept of
data integration for the unified effective methods, tools and modern technologies creation
to solve the problem of forming information resources of business analytics systems in
various industries and fields of application.</p>
      <p>The development of methods of multi-level data integration in business analytics systems
allows us to draw the following conclusions:
• It is advisable to base the method on a multi-level data model that takes into account
various aspects of the data (syntax, semantics, meaning);
• The key point of the integration process is its implementation based on data
metaschemas, which made it possible to reduce access to the actual data and to reduce the time
required for integration;
• The general process of integration is carried out as a set of sub-processes of integration
of values, syntax and semantics of data;
• The use of ontologies in the processes of semantic data integration made it possible to
carry out integration at the level of data content and to achieve the same interpretation of data by
both people and machines.</p>
      <p>In the process of developing knowledge presentation methods for intelligent business
analytics systems, the algebraic theory of types was laid as its basis. An algebraic system has been
built in which there is an unambiguous mapping between ontology entities and abstract data
types. The development of presentation methods and the architecture of intelligent networks of
business processes allows us to draw the following conclusions:
• IS is based on the ontology of business processes.
• Given the changing nature of the SA and the incomplete knowledge of the ontology developer,
it is necessary to use ontologies that can adapt to changes.
• Intelligent networks of business processes should be presented as a set of interacting
executable ontological models.</p>
      <p>As a result of the work, a specification of language tools was developed for creating metadata
that describes the syntax, semantics, and structure of data in integration processes, as an
interoperable service of an intermediate level. The advantages of linguistic means of data
integration over instrumental ones are due to the following factors.</p>
      <p>1. Unification - the use of protocol-level language tools makes it possible to reduce the
processes of data integration to the manipulation of typical concepts, objects, properties
and procedures for their processing.
2. Interoperability – provides the possibility of joint use of means of maintaining the
application protocol of data integration with any other means of data processing.
3. Mobility – the implementation of protocol means of data integration at the application
level makes them independent of the specifics of the platforms and the implementation
environment, which ensures the possibility of their free movement.
4. Standardization of processing formats and procedures – the use of XML as the basis of the
language means of describing objects and processing metadata makes it possible to process
the necessary resources by standard means and according to standardized procedures.
5. Compliance with the principles of SOA – provides easy access and use of data integration
protocol tools for a wide range of information system users.
6. Ease of implementation – the use of such integration means does not require restructuring of
BP, platforms or other means of maintaining the open information system environment, as it
involves the formation of an additional, relatively autonomous intermediate layer of data
abstraction.
7. Insignificant cost – since there is no need to rebuild information systems or to develop,
purchase and implement complex and expensive data integration tools and platforms,
projects based on protocol tools are relatively inexpensive.
</p>
      <p>[8] F. K. Bruder, R. Hagen, T. Rölle, M. S. Weiser, T. Fäcke, From the surface to volume: concepts for the next generation of optical-holographic data-storage materials, Angewandte Chemie International Edition 50(20) (2011) 4552-4573.</p>
      <p>[9] M. L. Brodie, Data integration at scale: From relational data integration to information ecosystems, in: 24th IEEE International Conference on Advanced Information Networking and Applications, 2010, April, pp. 2-3.</p>
      <p>[10] M. Stonebraker, U. Çetintemel, S. Zdonik, The 8 requirements of real-time stream processing, ACM Sigmod Record 34(4) (2005) 42-47.</p>
      <p>[11] IBM Rational Unified Process, Rational Process Library, URL: https://www.ibm.com/support/pages/rational-unified-process-rup-plug-ins-rationalmethod-composer-751.</p>
      <p>[12] IEEE Guide for Developing System Requirements Specifications: IEEE Std 1233a-1998. Institute of Electrical and Electronics Engineers, Inc., 1998.</p>
      <p>[13] IEEE Standard Glossary of Software Engineering Terminology: IEEE STD 610.12-1990. Institute of Electrical and Electronics Engineers, Inc., 1990.</p>
      <p>[14] P. Zdebskyi, A. Berko, V. Vysotska, Investigation of Transitivity Relation in Natural Language Inference, CEUR Workshop Proceedings 3396 (2023) 334-345.</p>
      <p>[15] P. Zdebskyi, et al., Intelligent System for Semantically Similar Sentences Identification and Generation Based on Machine Learning Methods, CEUR Workshop Proceedings 2604 (2020) 317-346.</p>
      <p>[16] V. Lytvyn, S. Kubinska, A. Berko, T. Shestakevych, L. Demkiv, Y. Shcherbyna, Peculiarities of Generation of Semantics of Natural Language Speech by Helping Unlimited and Context-Dependent Grammar, CEUR Workshop Proceedings 2604 (2020) 536-551.</p>
      <p>[17] V. Lytvyn, et al., Methods and Models of Intellectual Processing of Texts for Building Ontologies of Software for Medical Terms Identification in Content Classification, CEUR Workshop Proceedings 2488 (2019) 354-368.</p>
      <p>[18] M. Fedorov, et al., Decision Support System for Formation and Implementing Orders Based on Cross Programming and Cloud Computing, CEUR Workshop Proceedings 2917 (2021) 714-748.</p>
      <p>[19] E. E. Schadt, M. D. Linderman, J. Sorenson, L. Lee, G. P. Nolan, Computational solutions to large-scale data management and analysis, Nature Reviews Genetics 11(9) (2010) 647-657.</p>
      <p>[20] J. Hamilton, A Conversation with Pat Selinger: Leading the way to manage the world's information, Queue 3(3) (2005) 18-28.</p>
      <p>[21] P. Bernstein, et al., The Asilomar report on database research, ACM Sigmod Record 27(4) (1998) 74-80.</p>
      <p>[22] R. Agrawal, et al., The Claremont report on database research, ACM Sigmod Record 37(3) (2008) 9-19.</p>
      <p>[23] S. Abiteboul, et al., The Lowell database research self-assessment, Communications of the ACM 48(5) (2005) 111-118.</p>
      <p>[24] Y. Burov, V. Vysotska, V. Lytvyn, L. Chyrun, Software Based on Ontological Tasks Models, in: International Scientific Conference on Intellectual Systems of Decision Making and Problem of Computational Intelligence, 2022, May, pp. 608-638. Cham: Springer Int. Publishing.</p>
      <p>[25] Y. Burov, Knowledge Based Situation Awareness Process Based on Ontologies, CEUR Workshop Proceedings 2870 (2021) 413-423.</p>
      <p>[26] I. Pelekh, A. Berko, V. Andrunyk, L. Chyrun, I. Dyyak, Design of a system for dynamic integration of weakly structured data based on mash-up technology, in: IEEE Third International Conference on Data Stream Mining &amp; Processing, 2020, August, pp. 420-425.</p>
      <p>[27] A. Berko, I. Pelekh, L. Chyrun, I. Dyyak, Information resources analysis system of dynamic integration semi-structured data in a web environment, in: IEEE Third International Conference on Data Stream Mining &amp; Processing (DSMP), 2020, August, pp. 414-419.</p>
      <p>[28] A. Berko, Y. Matseliukh, Y. Ivaniv, L. Chyrun, V. Schuchmann, The text classification based on Big Data analysis for keyword definition using stemming, in: IEEE 16th International Conference on Computer Sciences and Information Technologies (CSIT), Vol. 1, 2021, September, pp. 184-188.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Berko</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Pelekh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chyrun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bublyk</surname>
          </string-name>
          , I. Bobyk,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matseliukh</surname>
          </string-name>
          , L. Chyrun,
          <article-title>Application of ontologies and meta-models for dynamic integration of weakly structured data</article-title>
          ,
          <source>in: IEEE 3rd International Conference on Data Stream Mining &amp; Processing</source>
          ,
          <year>2020</year>
          , August, pp.
          <fpage>432</fpage>
          -
          <lpage>437</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>V.</given-names>
            <surname>Vysotska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Holoshchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Holoshchuk</surname>
          </string-name>
          ,
          <article-title>A Comparative Analysis for English and Ukrainian Texts Processing Based on Semantics and Syntax Approach</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          <volume>2870</volume>
          (
          <year>2021</year>
          )
          <fpage>311</fpage>
          -
          <lpage>356</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I.</given-names>
            <surname>Kushniretska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kushniretska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Berko</surname>
          </string-name>
          ,
          <article-title>Designing of structural ontological data systems model for mash-up integration process</article-title>
          ,
          <source>Applied Computer Science</source>
          <volume>11</volume>
          (
          <issue>1</issue>
          ) (
          <year>2015</year>
          )
          <fpage>39</fpage>
          -
          <lpage>50</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Berko</surname>
          </string-name>
          , et al.,
          <article-title>Models and Methods for E-Commerce Systems Designing in the Global Economy Development Conditions Based on Mealy and Moore Machines</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          <volume>2870</volume>
          (
          <year>2021</year>
          )
          <fpage>1574</fpage>
          -
          <lpage>1593</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>X. L.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Naumann</surname>
          </string-name>
          ,
          <article-title>Data fusion: resolving data conflicts for integration</article-title>
          ,
          <source>Proceedings of the VLDB Endowment</source>
          <volume>2</volume>
          (
          <issue>2</issue>
          ) (
          <year>2009</year>
          )
          <fpage>1654</fpage>
          -
          <lpage>1655</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>X. L.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Berti-Equille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <article-title>Integrating conflicting data: the role of source dependence</article-title>
          ,
          <source>Proceedings of the VLDB Endowment</source>
          <volume>2</volume>
          (
          <issue>1</issue>
          ) (
          <year>2009</year>
          )
          <fpage>550</fpage>
          -
          <lpage>561</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Dittrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A. V.</given-names>
            <surname>Salles</surname>
          </string-name>
          ,
          <article-title>iDM: A unified and versatile data model for personal dataspace management</article-title>
          ,
          <source>in: Proceedings of the 32nd International Conference on Very Large Data Bases</source>
          ,
          <year>2006</year>
          , September, pp.
          <fpage>367</fpage>
          -
          <lpage>378</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>