<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Optimization of the Corporate Information Hierarchy Process Database System Structure of a Distributed Node Using the Analytic</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mykhailo Dvoretskyi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Svitlana Dvoretska</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hlib Horban</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuriy Nezdoliy</string-name>
          <email>nezdoliy.yura@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Petro Mohyla Black Sea National University</institution>
          ,
          <addr-line>68-Desantnykiv St 10, Mykolaiv, 54003</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <fpage>193</fpage>
      <lpage>203</lpage>
      <abstract>
        <p>The relevance of the problem of optimizing the database structure of a node in corporate information systems (CIS) is due to the widespread use of information technologies of multilevel, geographically dispersed computer systems, including those with distributed databases. One of the research aims is to determine and build a mathematical model of the optimality criteria for the structure of a remote node of the distributed corporate information system database. The statistics of user SQL-queries activity is taken into account and presented in the form of a multidimensional database. Criteria of the model effectiveness are formulated, which are independence from the central node of the database, the size of the local database, and an indicator of the level of need for data synchronization. The problem of multicriteria optimization is solved by using of hierarchy analysis method. Among the using method's features can be mentioned: different sets of optimality criteria for the evolving individuals; quantifying of the data representation marker value into 5 alternatives and automatically presetting the matrices of pairwise comparisons on the last level of the hierarchy. Solving the problem of multicriteria analysis and choosing the best alternative makes possible to determine the optimal level of the data representation marker. It makes possible to classify the attributes and tuples of DB relations according to their representation on the node of distributed CIS.</p>
      </abstract>
      <kwd-group>
        <kwd>1 Corporate information system</kwd>
        <kwd>database management system</kwd>
        <kwd>distributed database</kwd>
        <kwd>SQLquery</kwd>
        <kwd>data replication</kwd>
        <kwd>multidimensional analysis</kwd>
        <kwd>multicriteria problem</kwd>
        <kwd>analytic hierarchy process</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In information systems development, there is a trend of transition from local to distributed
databases (DDB). There are many database management systems (DBMS) that allow you to host,
maintain and process data on various nodes of computer information systems (CIS). The main task of
distributed database management systems is to provide access control to the data of many users and
ensure the integrity and consistency of data [1]. Within one company there is a need to automate
different types of accounting [
        <xref ref-type="bibr" rid="ref1">2, 3</xref>
        ]. The attempt to automate all types of accounting leads to so-called
"universal" corporate information systems [
        <xref ref-type="bibr" rid="ref1">3</xref>
        ], which create a single accounting environment and
provide access to all necessary data for analysis and decision support. This approach has many
disadvantages [
        <xref ref-type="bibr" rid="ref2">2, 4</xref>
        ], which can be eliminated by using separate specialized solutions [
        <xref ref-type="bibr" rid="ref1 ref3">3, 5</xref>
        ]. But this
path leads to use of several databases (and perhaps DBMS) that require their synchronization [
        <xref ref-type="bibr" rid="ref4">6</xref>
        ]. So,
in addition to the main functions of the distributed DBMS: input, storage, processing and sharing data
– a specific important function is to ensure the collaboration of many users with distributed
information [
        <xref ref-type="bibr" rid="ref5 ref6">7, 8</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Topicality</title>
      <p>
        The database structure optimizing is considered in [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref7 ref8 ref9">9–15</xref>
        ], but insufficient attention is paid to
improving the automated systems performance by optimizing the structure of the CIS distributed
database on the basis of statistics of SQL-queries. Also, in [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">9–11</xref>
        ] while considering the design of
automated control and data processing systems, building of data warehouses and multidimensional
models, the use of a combined strategy of distributed data representation in CIS is not considered. In
[
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13">12–15</xref>
        ] the authors consider the issue of increasing the productivity of automated systems through the
use of materialized views, database restructuring and relations denormalization. However, the
optimality of the structure of a single distributed CIS node is ignored. A key factor influencing the
reliability and accessibility of the database is the so-called localization of links [
        <xref ref-type="bibr" rid="ref3">5</xref>
        ]. If the database is
distributed so that the data hosted in a node is called exclusively by its user, it indicates a high level of
link localization. If such data distribution is not possible and to execute the user's requests you need to
access the information of other nodes, it indicates a low level of links localization.
      </p>
      <p>A combined data distribution strategy is the best in terms of combining the benefits of strategies
with and without duplication. But when using it, in addition to the task of synchronizing duplicate
information, the task of designing the structure of the database is actual, depending which node data
belonging to. In addition, the performance of the system will directly depend on the decision on the
need for partial or complete duplication of data. Some tables of a relational database can be duplicated
completely, and some – after projection and selection. That is, for optimized data representation on a
remote node, it is necessary to use vertical and horizontal data fragmentation procedures.</p>
      <sec id="sec-2-1">
        <title>Therefore, the issue of data distribution between nodes of distributed and territorially dispersed</title>
      </sec>
      <sec id="sec-2-2">
        <title>CIS is quite important. Therefore, the task of optimizing the structure of the database of a geographically remote node in corporate information systems is relevant.</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Purpose of publication</title>
      <sec id="sec-3-1">
        <title>The purpose of the research is to create a mathematical optimization model and subsequent</title>
        <p>choosing the best alternative to the marker of data representation of the remote node of distributed</p>
      </sec>
      <sec id="sec-3-2">
        <title>CIS. The research is related to only to relational databases. The relational data model is based on a simple and at the same time powerful mathematical apparatus, based mainly on theory of sets and mathematical logic [10, 16]. So, when building a mathematical model, it is considered appropriate to use the basic concepts of set theory.</title>
        <p>
          The developed model should take into account the statistics of user requests to local and remot e
data. Using filtering by selected dimensions, the appropriate subsets of data can be obtained [
          <xref ref-type="bibr" rid="ref15">17</xref>
          ]. For
dimension elements, the term "data representation marker" was proposed, which determines the level
of their need at the node of the distributed corporate information system (DCIS). From the value of
this marker, aggregated on the database subset, corresponding to the remote node, will depend on the
values of the criteria of model efficiency. It is independence from the central node of the database, the
size of the local database and the level of data synchronization [
          <xref ref-type="bibr" rid="ref16 ref17">18, 19</xref>
          ]. Therefore, one of the tasks is
the mathematical representation of the optimality criteria dependence on the value of the data
representation marker.
        </p>
        <p>
          The obtained multicriteria problem must be solved to determine the optimal level of data
representation marker. It should be noted that the optimality criteria, the models of which were
defined, are independent, monotonic and are represented on the set of real numbers in the interval [0;
1]. The classical methods of Pareto and Slater [
          <xref ref-type="bibr" rid="ref18 ref19">20, 21</xref>
          ] can give results only at the first stage of
modeling. But when calculating the optimal level of the data representation marker they are
ineffective due to the decrease in the level of one criteria while increasing others. The solution of the
problem is also complicated by the fact that the solution space is defined on a set of real numbers, and
therefore the set of solutions contains a large number of alternatives.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. The main part</title>
      <p>
        Among the well-known relational algebra operations [
        <xref ref-type="bibr" rid="ref8">10</xref>
        ], due to the horizontal and vertical
fragmentation of data on the distributed CIS node, the operations "projection" (hereinafter P) and
"selecting" (hereinafter S) are considered here. Let tup – be a tuple of the relation R, tup[P] be a part
of this tuple containing only the values of the attributes that are included in the subset P of the relation
scheme Rshema (P ⊂ Rshema). Then the projection of R on P will be the relation, consisting of tuples of
all values from the set P, which exists in the relation R, i.e. R[P] = {tup[P] | tup ∈ Rdata}. The scheme
of the resulting set can be defined by the following set of attributes: R[P]shema = {A1 …, Am}, where Ai
∈ Rshema. The selection displays tuples, and the result is a relation containing a subset of all unique
tuples of the relation R, for which a certain logical condition is true R[S] = {tup | tup ∈ Rdata ˄ F(tup,
S) = true}, where S is a logical condition of SQL-query, and F(tup, S) is a function that reflects its
fulfillment for the corresponding tuple. The scheme of the resulting set will equal to the scheme of the
basic relation, i.e. R[S]shema = R shema. Within the SQL-query for data selecting, a number of relations
can be involved, all of which are the result of sequential execution of select and projection operations
to the base relation (database table). R'' = R'[P], where R' = R[S], i.e.
      </p>
      <p>R'' = {tup[P] | tup[P] ∈ R[P]data ˄ F(tup, S) tr=ue}
(1)</p>
      <p>Q = &lt;Workplace, User, Application, Rset'', Qsetinner&gt;,
where workstation = &lt;Type, Location&gt;; User = &lt;Role, Name&gt;; Rset'' = { R'' | {tup[P] | tup[P] ∈
R[P]data ˄ F(tup, S) = true} } – the set of resulting relations obtained from the basic relations (tables)
of the database by the corresponding queries;  
– is a set of nested queries of the main query Q.</p>
      <sec id="sec-4-1">
        <title>When planning the structure of the database of the remote node of distributed CIS, several factors</title>
        <p>will be involved - availability and speed of data obtaining, independence from the central DB node,
the DB size, the level of data reliability, the need for further synchronization.</p>
      </sec>
      <sec id="sec-4-2">
        <title>In the first step, the simulation begins with the presentation in the remote node the complete copy of the central node DB. In this case, the data availability and independence from the central node of</title>
      </sec>
      <sec id="sec-4-3">
        <title>Considering the set of queries to the database, the resulting subset R''union of the base relation R can</title>
        <p>be defined as the union of subsets R' of all queries received by the database from a remote node

R''union = ⋃
 =1</p>
        <p>′′ , or
R''union = {tup[Punion] | tup[Punion] ∈ R[Punion]data ˄ F(tup, Sunion) = true},</p>
        <p>where tup[Punion] = ⋃
 =1</p>
        <p>[  ] , and Sunion = ⋁</p>
        <p>=1</p>
        <p>To avoid the need for further replication some data that required on the DDB node can only be
presented on the central node of the database and participate the query through the use of distributed
queries. So the resulting relation Rremote will only be a subset of R''union. Due to the fact that to
represent the data on the remote node it is necessary to use elements of both vertical and horizontal
data fragmentation (both projection and selecting), a subset of the base relation R that will describe
the relation of the remote node can be represented as follows:
  ℎ
= {A | A ∈ Rshema , Rprimary ⊂   ℎ
, A ∈ Rprimary ˅ Fa(Node, A) = true}</p>
      </sec>
      <sec id="sec-4-4">
        <title>To make a decision on the attribute representation on a node, the function Fa(Node, A) will be used. Besides, the set of attributes of the relation primary key in any case must be represented on the remote node. The set of tuples, in turn, will be determined by the formula:</title>
        <p />
        <p>= { tup | tup ∈ Rdata , tupprimary ∈ Rremote-depdata ˅ Ftup(Node, tup) = true}</p>
      </sec>
      <sec id="sec-4-5">
        <title>As we can see, the tuple must be represented in the case of entering its primary key to the set of</title>
        <p>these relations, depending on the current. Otherwise, the need for data is solved using the evaluation
function Ftup(Node, tup).</p>
        <p>The model of presenting user queries should support the possibility of their further classification
according to belonging to a particular workplace, location, user role and other criteria that can be
added to the model. That is, the user query is defined as
(2)
(3)
(4)
the database has a maximum level. The speed of data obtaining compared to the central node is
usually lower due to less powerful computing resources, but can be increased by performing selecting
and projection operations and decreasing the number of data locks. The local database is large,
therefore this criteria is not optimal. Also, all data requires synchronization with the central node,
which is quite a resource-intensive operation.</p>
      </sec>
      <sec id="sec-4-6">
        <title>The second step is to exclude all unnecessary data from the remote node. To solve this problem, on</title>
        <p>
          the basis of a relational model of user SQL-queries (4) was created a multidimensional database [
          <xref ref-type="bibr" rid="ref20">22</xref>
          ]
with following set of dimensions: &lt;DateTime, WorkplaceType, Location, UserRole, Application, R,
        </p>
      </sec>
      <sec id="sec-4-7">
        <title>A, tup&gt;. For the dimensions elements the term of data representation marker is proposed. It reflects</title>
        <p>the level of data representation necessity at the node of distributed CIS. For each element value of
marker is taken from the following set: {"necessary", neutral", "not required"}. To dimension the
"Location", the marking is performed automatically with the value "necessary" for the corresponding
remote node and "not required" for all others.</p>
        <p>
          When determining the value of the representation marker for the row of the fact table [
          <xref ref-type="bibr" rid="ref20">22</xref>
          ], the max
function is used, which reflects the principle of absorption. Determining the value of the marker when
performing the consolidation of rows of the fact table on the values of &lt;R, A, tup&gt; (for the table cell)
can be performed by moving average method. But the question of the specific influence of each
dimension remains unresolved. In addition, it should be taken into attention, that for some subsets of
dimensions pessimistic scenario should work (data is needed, no matter what), and for some
optimistic (data should not be duplicated in any case).
        </p>
        <p>So, we have a model where each dimension attribute has a value, a marker and a weight Adim
={Val, Mrk, vol}, where Mrk = {"obligatorily", "necessary", " neutral", "not required" , "forbidden"},
and vol – weight (ignored for the values of the marker "obligatorily" and "forbidden"). By converting
a non-numeric linguistic variable of markers into a numeric value ("obligatorily" – "2", "necessary" –
"1", "neutral" – "0", "not required" – "-1", "forbidden" – "-2"), the aggregation function was defined:

  =1   = {−2, 
2,  ∃</p>
        <p>= 2
∃</p>
        <p>∑ =1(
 = − 2 ˄ ∄</p>
        <p>= 2
 ∗</p>
      </sec>
      <sec id="sec-4-8">
        <title>When deciding on the data representation on a remote node, we consolidate the rows of the fact table by the tuple &lt;R, A, tup&gt; and calculate the value of the marker for each of its elements by formula (5). And based on following the decision about data representation is made:</title>
        <p>Repr (Node,R,A,tup)=(
( ,  , 

) =1
 &gt; 


),
– the threshold coefficient of data representation in a certain node Node, that is
where 


defined at the range of [-1, 1].</p>
      </sec>
      <sec id="sec-4-9">
        <title>The third step is to completely abandon the local database and place all the data on the central</title>
        <p>node (or, in some cases, in other nodes) of distributed CIS. In this case, we have the maximization of
optimality for criteria of the need for data synchronization. That is because there is no duplication of
data. The level of reliability is also maximum, and the size of the local database has a minimum value
(no local database). But, at the same time, the availability of data and the access speed are minimized,
and the work of CIS is highly dependent on the central node availability.</p>
      </sec>
      <sec id="sec-4-10">
        <title>The value of some criteria improved compared to the second step, but at the same time the value of</title>
        <p>the others got worse. It is logical to assume that the optimal values of all DCIS DB structure criteria
acquire between the 2nd and 3rd steps. To be able to perform the analysis and find the optimal
distribution of data between the remote and central nodes, it is necessary to formalize the database
structure quality criteria.</p>
      </sec>
      <sec id="sec-4-11">
        <title>Criterion of independence from the central database node, and, accordingly, the availability and</title>
        <p>access speed directly depend on the representation of user SQL-query data on the node of distributed</p>
      </sec>
      <sec id="sec-4-12">
        <title>CIS. Using the model of the user SQL-query (4) and the resulting relation of the remote node (1, 2), we can determine the function of the request data availability:</title>
        <p>Favailab(Node, Q) =
∀  
∃  




1,  ∀  ’’ ∃   ℎ</p>
        <p>,  ’’ ∈   ℎ
0,  ∃  ’’ ∄   ℎ
,  ’’ ∈   ℎ
( 
( 
element the function of belonging to a remote node is equal to one.
as the average value
where
{
→ 
→ 
Favailab(Node, Q) =
→  availab(Node,</p>
        <p>) = 1)
1,  ( ∃  ′′ ∈</p>
        <p>′′
( ′′)
( ∃  
0,  ( ∀  ′′ ∈</p>
        <p>′′
( ′′)
( ∀  
 =1
∈  
 =1
∈  
 &gt; −1) ˅
 ≤ −1) ˄
{
→  availab(Node,  
) = 0)
KoefsizeR =
0

(
 )</p>
      </sec>
      <sec id="sec-4-13">
        <title>Next, when determining the size of R '' (subset of R) we use the following formula</title>
        <p>SizeR’’ = KoefsizeR × p’ × ∑ ′ 
 =1
(

)
(7)
(8)
(9)
(10)
(11)</p>
        <p>
          Next, we consider the criterion of the local database size. This criterion affects both the
performance of queries to the local database and the power of computing resources required to
perform database and CIS administration operations. The database under the relational DBMS control
(including distributed) is presented on disk space as a file or group of files [
          <xref ref-type="bibr" rid="ref5 ref6">7, 8</xref>
          ]. At the same time,
any modern relational DBMS has mechanisms for obtaining information about how much disk space
is used by each relation. In the vast majority of cases, the total value of the relations size equals the
total value of the database files sizes.
        </p>
      </sec>
      <sec id="sec-4-14">
        <title>But the information about size of R does not make it possible to determine the size of R'', which is</title>
        <p>
          the result of a sequence of selecting and projection operations, and is part of the set Rremote. On the
other hand, each DBMS provides information about the amount of disk space required to store the
value of the attribute defined on a particular domain [
          <xref ref-type="bibr" rid="ref5 ref6">7, 8</xref>
          ]. The size of the tuple can be determined as
SizeR = 
0

+ p х ∑ =1 
(
 ),
empty.
where Ai ∈Di ∈Typei , p – is the relation power, and SizeR0iDBMS – is the size of the i-th relation if it is
        </p>
        <p>However, the values obtained by (9) cannot be used in calculations, because SizeR almost never
equals to SizeRdbms. This may be due to the presence of additional data structures (indexes) related to
the table, as well as other properties of data representation on the disk. Therefore, for each relation we
determine the correction factor
where p is the power of R'', n' – is the number of elements of the set Rremote
shema (number of attributes),
and each attribute Ai ∈ Di ∈Typei.</p>
      </sec>
      <sec id="sec-4-15">
        <title>But for each individual case of the subject area, the size (11) will take different values, and therefore its absolute value has no sense. Therefore, it was decided to present the final value of the criterion of the local database size in proportion to the size of the database in the CIS central node.</title>
        <p>Fsize = ∑
 =1




 ′′,</p>
        <p>
          The last of the above criteria is the need for data synchronization. First, we define a subset of the
remote node data for which the data change operations are performed. To do this, define the model of
the SQL-query that modify data Qmodif = &lt;Виміри, R''modif, type&gt;, where R''modif – is a subset on the
relation R, which changes due to data modification operations, type = {insert, update, delete} –
operation type. R''modif is defined as
data require the use of more resource-intensive synchronization algorithms [
          <xref ref-type="bibr" rid="ref4">6</xref>
          ].
        </p>
        <p>will determine the subset of the basic relation on which data conflitct can take place. This
where S – is a logical condition, defined in SQL query, F(tup, S) – is a function that reflects its
fulfillment for the corresponding tuple, and P modif – is a set of attributes that are modified.</p>
        <p>Considering the set of queries to the database, the resulting subset  ′′
of the base relation R
can be defined as the union of subsets R'' modif of all queries (13) received by the database from the

remote node  ′′

= ⋃ =1  
′′</p>
        <p>.</p>
      </sec>
      <sec id="sec-4-16">
        <title>Similarly, we define the set  ′′</title>
        <p>, which will be modified on the central node or other nodes
with future synchronization with the central node. The intersection of the sets  ′′

R''modif = {tup[Pmodif] | tup[Pmodif] ∈ R[Pmodif]data ˄ F(tup, S) = true}
(13)


– the cardinality of all queries, attributes and tuples of which are
(12)</p>
        <p>та
(14)
(15)</p>
        <p>Based on (14), we add to the multidimensional DB (5) the dimension SyncroFlg = {true, false},
which will be determined on the tuple &lt;R, A, tup&gt;. Next, based on the aggregate value of the
representation marker 

 =1
 and the representation coefficient 


perform
filtering of the multidimensional DB according to the decision on representation (6) and SyncroFlg =
true. Aggregate the results by &lt;R, A, tup&gt; and count the number of queries. The ratio of the obtained
value to the total number of queries according to (6) will be an indicator of the level of data
synchronization need</p>
        <p>Fsynchro =</p>
        <p>,


  
– relation power, including queries of the remote node (according to the decision on
representation), which includes the values of the tuples attributes (cells), which are also included in
where  
the set R'' modif</p>
        <p>node, and  
represented in the remote node.</p>
        <p>
          A multicriteria problem, that was obtained, must be solved to determine the optimal level of data
representation marker. Classical Pareto and Slater methods [
          <xref ref-type="bibr" rid="ref18 ref19">20, 21</xref>
          ] can give results only at the first
stage. But when calculating the optimal level of data representation marker are ineffective due to the
decrease in the level of some criteria of optimality while increasing others. The solution of the
problem is also complicated by the fact that the solution space is determined on a set of real numbers,
and therefore the set of solutions contains many alternatives. The analytic hierarchy process (AHP),
which is a general methodology for solving a wide class of decision-making problems, allows to
combine a relatively simple mathematical apparatus with knowledge and experience of the decision
maker. The basis of this method is the representation of the decision process in the form of a
multilevel hierarchy. This hierarchy should reflect all the components of the problem to be solved.
        </p>
      </sec>
      <sec id="sec-4-17">
        <title>The method is based on the principles of decomposition, pairwise comparisons and hierarchical</title>
        <p>composition. The main stages of the method are building a hierarchy, estimating the importance and
priorities, checking the consistency of priorities and synthesis of the solution.</p>
      </sec>
      <sec id="sec-4-18">
        <title>When compiling the hierarchy, following relationship between the levels elements was used: goal</title>
        <p>- stakeholders - criteria - alternatives. The value of the data representation marker (alternative) is a
real number in the interval [-1, 1]. It leads to potential large number of alternatives at the 4th level of
the hierarchy and therefore the matrices of pairwise comparisons by criteria can become very big.</p>
      </sec>
      <sec id="sec-4-19">
        <title>This complicates estimation process for the decision makers. It is proposed to simplify the task by</title>
        <p>reducing the number of alternatives to 5: "low" (L) – "-1", "lower them medium" (LM) – "-0.5",
"medium" (M) – "0", "higher then medium" (HM) – "0.5", and "high" (H) – "1". The level of
"decision makers" is represented by the elements "Owner", "Database Administrator", "Database</p>
      </sec>
      <sec id="sec-4-20">
        <title>Developer" and "CIS Operator". The obtained hierarchical model is shown in Fig. 1.</title>
        <p>Note that the list of criteria differs for the decision makers. Thus, all three criteria are important for
the owner (the database size, the need for synchronization and independence from the central
database), because they have influence on both the quality of CIS and the cost of equipment. For the
database administrator, the criteria of database size and the need to organize data synchronization are
important. In turn, for the database developer and CIS operator, the criterion of database size is not
critical. It is clear that the relative weight of each of the criteria for different decision makers will also
differ. Using the scale of relative importance of the criteria [23] and with the involvement of the
decision maker (which at this stage is the owner) we build a matrix of pairwise comparisons for
decision makers (Table 1). At the third level of the hierarchy, the corresponding matrices of pairwise
comparisons are formed according to the criteria of optimality for each decision maker. Thus, for the
decision maker "owner" we have the following matrix of pairwise comparisons of optimality criteria
(Table 2).</p>
      </sec>
      <sec id="sec-4-21">
        <title>To check the conflicts existence between matrix elements, the consistency index (CI) is calculated.</title>
        <p>For the data in Table 2 CI = 3.2%, which indicates the allowable level of consistency (in case the
value is higher 10% there is a need to adjust the values of the matrix).</p>
      </sec>
      <sec id="sec-4-22">
        <title>The next step in the classical analytic hierarchy process is to fill in the matrices of pairwise</title>
        <p>comparisons of alternatives separately for each criterion of optimality, similar to Table 1 and Table 2.</p>
      </sec>
      <sec id="sec-4-23">
        <title>In our case, the presence of mathematical models for calculating the values of the optimality criteria</title>
        <p>formulated in (8, 12, 15) allows to perform the initial calculation of matrix data based on numerical
values of the data representation marker for each alternative. Next, the matrix is submitted to the
decision maker for approval. For example, the size of the local node database depending on one of the
five alternatives can change as follows (Table 3).</p>
        <p>Based on the above data, the size of the database at low (min) and high (max) level of the data
representation marker differs by 0.75 / 0.02 = 37.5 times. According to principles of pairwise
comparisons and the axiom of homogeneity, we perform normalization of the values given in Table 3,
using a slightly modified formula of natural normalization:
  
= (
(  − 
   − 
   )
   )
∗ ( − 1) + 1,
where   is the value of the optimality criterion for the i-th alternative, and k = 9.</p>
      </sec>
      <sec id="sec-4-24">
        <title>Normalized according to (17) the database size values (Table 3) are presented in Table 4.</title>
      </sec>
      <sec id="sec-4-25">
        <title>After rounding to the integer according to mathematical rules, we build a matrix of pairwise</title>
        <p>comparisons of alternatives for the criterion of the local database size (Table 5).</p>
      </sec>
      <sec id="sec-4-26">
        <title>According to (4) we perform the calculation of the matrix of alternatives relative weight by the criterion of the local database size. Also, we similarly calculate the priority vectors of alternatives according to the criteria of independence from the central node and the need for data synchronization. As a result, we obtain following vectors.</title>
        <p>=
0,570
0,190
0,095 ,   
0,081
[0,063]
=
0,036
0,082
0,164 ,   
0,328
[0,328]
ℎ
=
(17)
(18)
(20)
Low</p>
      </sec>
      <sec id="sec-4-27">
        <title>Lower then medium</title>
      </sec>
      <sec id="sec-4-28">
        <title>Medium</title>
      </sec>
      <sec id="sec-4-29">
        <title>Higher then medium</title>
      </sec>
      <sec id="sec-4-30">
        <title>High</title>
        <p>Owner
0,563
0,09
0,09</p>
      </sec>
      <sec id="sec-4-31">
        <title>Using the obtained results of the global priorities vectors for decision makers (19), (20) and the matrix of preferences of decision makers (Table 1), we calculate the vector of global priorities of alternatives (Table 6).</title>
      </sec>
      <sec id="sec-4-32">
        <title>By performing the appropriate calculations, we obtain global priority vectors for other decision makers</title>
      </sec>
      <sec id="sec-4-33">
        <title>The performed calculations allow to organize decision support when choosing the optimal level of the data representation marker among the proposed alternatives.</title>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Summary and conclusion</title>
      <sec id="sec-5-1">
        <title>Based on the relational data model, the concept of data slices of the set of database relations is</title>
        <p>formalized. Using the definition of selecting and projection operations, as well as taking into account
the hierarchical structure of user queries, the model that describes their structure was built. This
model includes analytical characteristics and allows to define for each base relation a subset (node
relation), which will consist of elements that are part of the resulting sets of SQL-queries sequence.</p>
      </sec>
      <sec id="sec-5-2">
        <title>The term of data representation marker for elements of analytical dimensions was proposed. Using</title>
        <p>the offered aggregation function the level of representation marker for each attribute and tuple of
relation is calculated. To determine the optimal value of the representation marker, several optimality
criteria are introduced and mathematical models are built for each of them. This allow to calculate
their values depending on the limit level of the data representation marker at the node of distributed
CIS. Solving a multi-criteria problem and finding the optimal level of data representation at a remote
node can increase the level of data availability and efficiency of distributed CIS. Efficiency is defined
as the ratio of result and resources, so taking into account the vector of relative weight of the
optimality criteria of the model (16), we calculate the efficiency as

=</p>
        <p>х  
0</p>
        <p>Thus, the results of the research allow to increase the efficiency of using the distributed CIS node
of the subject area by 25% compared to the presentation of only critical data, and by 11% compared
to the presentation of all necessary data of the central database, respectively. The research can be
followed by presenting the obtained vector of global priorities in the form of fuzzy sets of one
variable. Dephasing the obtained results can make numerical value of the optimal level of data
representation at the RKIS node more accurate.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgements</title>
      <sec id="sec-6-1">
        <title>This research was partially supported by the state research projects: “Development of information</title>
        <p>and communication decision support technologies for strategic decision-making with multiple criteria
and uncertainty for military-civilian use” (research project no. 0117U007144, financed by the</p>
      </sec>
      <sec id="sec-6-2">
        <title>Government of Ukraine); “Development of information-analytical system for military-civil application as a information protection factor in the conditions of multi-criteria, uncertainty and risk” (research project no. 0120U101222, financed by the Government of Ukraine).</title>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. References</title>
      <sec id="sec-7-1">
        <title>M. Tamer Özsu, Patrick Valduriez. Principles of Distributed Database Systems 3rd ed. Springler, 2011.</title>
      </sec>
      <sec id="sec-7-2">
        <title>M. Dvoretskyi, S. Borovlova, Web-application of warehouse accounting in non-automated points of sale, Science works "Petro Mohyla Black Sea National University", Rel. 308. Т. 320, Series: Computer technologies, 2018, pp. 45–52 (in Ukrainian). 202</title>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <article-title>[3] 1C: Enterprise 8. Management of a trade enterprise for Ukraine, [Online]</article-title>
          . Availably: http://rarus.com.
          <article-title>ua/torgovyy-i-skladskoyuchet/1S_Predpriyatie_8_Upravlenie_torgovym_predpriyatiem_dlya_Ukrainy_/ (in Russian)</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Natalia</given-names>
            <surname>Kozliuk</surname>
          </string-name>
          , Svetlana Uhrymova, Warehouse accounting at trade enterprises,
          <source>Phenix</source>
          ,
          <year>2005</year>
          . (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Garcia-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Ullman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Widom</surname>
          </string-name>
          ,
          <source>Database Systems: The Complete Book 2nd Edition</source>
          , Pearson,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <article-title>[6] Automatic synchronization of distributed databases in split mode</article-title>
          , [Online]. Availably: http://stimul.kiev.ua/materialy.htm
          <article-title>?a=avtomaticheskaya_sinkhronizatsiya_raspredelennykh_baz _dannykh_v_razdelennom_rezh (in Russian)</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Maksym</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          , Yhor Symdianov,
          <article-title>MySQL 5</article-title>
          ,
          <string-name>
            <surname>BHV-Piterburg</surname>
          </string-name>
          ,
          <year>2010</year>
          . (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Dusan</given-names>
            <surname>Petkovich</surname>
          </string-name>
          .
          <source>Microsoft SQL Server</source>
          <year>2019</year>
          :
          <string-name>
            <given-names>A</given-names>
            <surname>Beginner's Guide</surname>
          </string-name>
          ,
          <source>Seventh Edition 7th Edition</source>
          ,
          <string-name>
            <given-names>Kindle</given-names>
            <surname>Edition</surname>
          </string-name>
          ,
          <string-name>
            <surname>Mc-Graw-Hill Education</surname>
          </string-name>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V. V.</given-names>
            <surname>Pasichnyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. A.</given-names>
            <surname>Reznichenko</surname>
          </string-name>
          ,
          <article-title>Organization of databases and knowledge bases</article-title>
          , Publishing group BHV,
          <year>2006</year>
          . (in Ukrainian)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Ye</surname>
          </string-name>
          . V.
          <article-title>Malakhov, Fundamentals of database design: Textbook for students of higher education institutions, Science and technology</article-title>
          , Odesa,
          <year>2006</year>
          . (in Ukrainian)
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>V. V.</given-names>
            <surname>Pasichnyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. B.</given-names>
            <surname>Shakhovska</surname>
          </string-name>
          ,
          <article-title>Data warehouses: a textbook, Magnoliya</article-title>
          , Lviv,
          <year>2008</year>
          . (in Ukrainian)
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [12]
          <string-name>
            <surname>A. B. Kunhurtsev</surname>
            ,
            <given-names>Yu. N.</given-names>
          </string-name>
          <string-name>
            <surname>Vozovykov</surname>
          </string-name>
          , Finding Patterns in Query Distribution to Manage Materialized Views, in: Proceedings of the Odessa Polytechnic University, Publishing house Odessa National Polytechnic University, Odesa,
          <volume>2</volume>
          (
          <issue>30</issue>
          ),
          <year>2008</year>
          , pp.
          <fpage>135</fpage>
          -
          <lpage>140</lpage>
          . (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [13]
          <string-name>
            <surname>A. B. Kunhurtsev</surname>
            ,
            <given-names>S. L.</given-names>
          </string-name>
          <string-name>
            <surname>Zynovatnaia</surname>
          </string-name>
          ,
          <article-title>Relational database restructuring model by denormalizing the schema of relations</article-title>
          , in: Proceedings of the Odessa Polytechnic University, Publishing house Odessa National Polytechnic University, Odessa,
          <volume>2</volume>
          (
          <issue>26</issue>
          ),
          <year>2006</year>
          , pp.
          <fpage>105</fpage>
          -
          <lpage>111</lpage>
          . (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>V. A.</given-names>
            <surname>Filatov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. V.</given-names>
            <surname>Semenets</surname>
          </string-name>
          ,
          <article-title>Methods and tools for designing information systems and distributed databases</article-title>
          , Bulletin of Kherson National Technical University, Kherson, No
          <volume>4</volume>
          (
          <issue>27</issue>
          ),
          <year>2007</year>
          , pp.
          <fpage>203</fpage>
          -
          <lpage>207</lpage>
          . (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S. V.</given-names>
            <surname>Lazdyn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yu</surname>
          </string-name>
          . Zemlianskaia,
          <article-title>Optimization of distributed corporate information networks using genetic algorithms and object modeling, Scientific works DonNTU</article-title>
          , No 147,
          <year>2009</year>
          , pp.
          <fpage>83</fpage>
          -
          <lpage>95</lpage>
          . (in Ukrainian)
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Rebecca</surname>
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Riordan. Designing Relational Database Systems (Dv-Mps Designing</surname>
          </string-name>
          ), Microsoft Press,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dvoretskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dvoretska</surname>
          </string-name>
          ,
          <article-title>Information technology for determining useful data while optimizing the structure and minimizing the volume of the distributed database node</article-title>
          , Bulletin of Cherkasy State Technological University, Cherkasy, 4/
          <year>2019</year>
          ,
          <year>2019</year>
          , pp.
          <fpage>26</fpage>
          -
          <lpage>35</lpage>
          . (in Ukrainian)
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dvoretskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dvoretska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Nezdoliy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Borovlova</surname>
          </string-name>
          ,
          <article-title>Data Utility Assessment while Optimizing the Structure and Minimizing the Volume of a Distributed Database Node</article-title>
          ,
          <source>in: Proceedings of the 1st International Workshop on Information-Communication Technologies &amp; Embedded Systems (ICTES</source>
          <year>2019</year>
          ), Mykolaiv,
          <year>2019</year>
          . pp.
          <fpage>128</fpage>
          -
          <lpage>137</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fisun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dvoretskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shved</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Davydenko</surname>
          </string-name>
          ,
          <article-title>Query parsing in order to optimize distributed DB structure</article-title>
          ,
          <source>in: Proceedings of the 9th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS)</source>
          , Bucharest,
          <year>2017</year>
          , pp.
          <fpage>172</fpage>
          -
          <lpage>178</lpage>
          . doi:
          <volume>10</volume>
          .1109/IDAACS.
          <year>2017</year>
          .8095071
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>I.</given-names>
            <surname>Kovalenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Davydenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shved</surname>
          </string-name>
          ,
          <article-title>Searching for Pareto-Optimal Solutions</article-title>
          . In: Shakhovska N.,
          <string-name>
            <surname>Medykovskyy</surname>
            <given-names>M</given-names>
          </string-name>
          . (eds) Advances
          <source>in Intelligent Systems and Computing IV. CCSIT 2019. Advances in Intelligent Systems and Computing</source>
          , vol
          <volume>1080</volume>
          . Springer, Cham,
          <year>2020</year>
          , pp.
          <fpage>121</fpage>
          -
          <lpage>138</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>030</fpage>
          -33695-0_
          <fpage>10</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Shved</surname>
          </string-name>
          , I. Kovalenko,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Davydenko</surname>
          </string-name>
          ,
          <article-title>Method of Detection the Consistent Subgroups of Expert Assessments in a Group Based on Measures of Dissimilarity in Evidence Theory</article-title>
          . In: Shakhovska N.,
          <string-name>
            <surname>Medykovskyy</surname>
            <given-names>M</given-names>
          </string-name>
          . (eds) Advances
          <source>in Intelligent Systems and Computing IV. CCSIT 2019. Advances in Intelligent Systems and Computing</source>
          , vol
          <volume>1080</volume>
          . Springer, Cham,
          <year>2020</year>
          , pp.
          <fpage>36</fpage>
          -
          <lpage>53</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>030</fpage>
          -33695-
          <issue>0</issue>
          _
          <fpage>4</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Barsegyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Kupriyanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. V.</given-names>
            <surname>Stepanenko</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. I. Holod</surname>
          </string-name>
          ,
          <article-title>Methods and models of data analysis: OLAP and Data Mining, BHV-</article-title>
          <string-name>
            <surname>Petersburg</surname>
          </string-name>
          ,
          <year>2004</year>
          . (in Russian)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>