<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Performance optimization algorithm of a distributed database with a hierarchical network topology</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Arij Al Adel Post-graduate of Faculty of Radio Engineering and Cybernetics Moscow Institute of Physics and Technology(national research institute) Moscow</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Research University “Higher School of Economics” Moscow</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>301</fpage>
      <lpage>307</lpage>
      <abstract>
        <p>This paper addresses analyzing the data flows that appear when distributed database works. An algorithm for optimizing database replication is proposed. As a protocol data replication two-phase commit protocol (2PC) with two levels of lock records (shared lock (Shared), and an exclusive lock (Exclusive)) is considered.</p>
      </abstract>
      <kwd-group>
        <kwd>enterprise information system</kwd>
        <kwd>distributed database</kwd>
        <kwd>two-phase commit protocol (2PC)</kwd>
        <kwd>optimization algorithm</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>ܶ ாூௌ
= ܶ ௅஼
+ ܶ ்௅</p>
      <p>
        + ܶ ௅ௌ ,
Modern enterprise information systems (EIS) applied for enterprises automation are usually designed using databases.
Enterprises with a distributed structure naturally faced with the automation of the entire enterprise challenge. Usually, the
automation of separate structural subdivisions through the simulation of business processes can improve the situation within
the subdivisions but at the same time there is the problem of data synchronization of various systems. The solution to this
problem is to develop a distributed EIS. The EIS includes a distributed database [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] consisting of a set of local databases,
server equipment, system software, client computers and application software. The functioning EIS may limit or even
determine the speed at which business processes are performed. Therefore, the task of optimizing the performance of EIS
becomes very important. The EIS database at any given time contains all the information about automated business processes.
The business process should be implemented in the most efficient manner. Thus the performance of EIS significantly affects
the efficiency of enterprise business processes. There are many methods for evaluating the performance of EIS [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>
        ]. The
main criterion for the effectiveness of EIS used in these methods is economic efficiency. Due to the fact that for many business
processes related to customer service, production operation, etc., the main indicator of their effectiveness is ܶ ாூௌ - the time
spent by EIS on the operation.
      </p>
      <sec id="sec-1-1">
        <title>Where,</title>
        <p>ܶ ௅஼ is the time taken by the application executed on the client computer to complete the business process operation (Client
level),
ܶ ்௅ - the time taken to transfer data between the client computer and the database server (Transportation level),
ܶ ௅ௌ - the time taken by the server to complete requests / transactions (Server level).</p>
        <p>For geographically distributed enterprises business processes are also distributed. Therefore, to increase the efficiency of
geographically distributed EISs it is necessary to optimize the ܶ ௅ௌ time by constructing a replication scheme.
Now there are several methods which aim to optimize database performance. These methods can be applied at all stages of
system design: at the knowledge domain analysis, at the design of the logical structure of the database, at the synthesis of the
physical database model. Knowledge domain analysis affects the logical structure of the database, the database logical
Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
structure has a direct impact on the algorithms of software applications and the functioning of the system software [5].
Therefore, the optimization process is carried out consistently. At each subsequent stage the results of the previous one are
used. Despite the undoubted theoretical and practical interest, a common disadvantage of all methods is that to solve the
optimization problem, information is needed about the work of all components of the EIS architecture the collection of which
is difficult.</p>
        <p>The proposed algorithm has several advantages: it does not have restrictions on the number and structure of databases, it
requires a minimum amount of input data, it is easy to implement. It can serve as a basis for designing the network topology
optimization algorithms it makes it possible to evaluate the design solution in the early stages of development.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2 Problem Statement</title>
      <p>Consider the hierarchical structure of the enterprise network. In general database servers located at network nodes may be
interconnected by a multiple channels forming a graph. A tree network is very important case and it is often used in practice.
This type of network topology is used for vertically integrated companies. For example, the network of the gas stations have
a 3-tier administration structure. Suppose one server as root server. It is connected to data transfer channels with subordinate
servers. Child servers are not interconnected by information transfer channels. Child servers, in turn, may have other child
servers. The root server always contains a database, which we will call the Central Database (CDB). The remaining databases
will be called Remote Database (RDB). Figure 1 shows hierarchical topology of a distributed database.
when r1 &lt;&gt; r2 and r1 is the immediate parent of the server r2 , or when r1 = r2
when r1 &lt;&gt; r2 and r1 is not the immediate parent of the server r2
Let W</p>
      <p>Depending on the replication protocol used various elementary operations on database records can be distinguished.</p>
      <p>^ ws / s 1, S` — a set of elementary operations on database records.</p>
      <p>Consider the operation of a distributed database over the period T . Let every client be connected to any of the servers.
Each client performs different tasks. The application sends requests to the server directly connected to the client’s computer.</p>
      <p>Let / [ irs — number of elementary operations ws over a period T to be performed on a table di on the node qr .</p>
      <sec id="sec-2-1">
        <title>We introduce a vector of variables that describes the deployment of tables on every server.</title>
        <p>ܺ ൌ ሾ ݔ ௥௜ ሿ , where xir 1 if table di belongs to the node qr , xir 0 otherwise. Since all tables are stored on the root
server, so we have xi0</p>
        <p>CDB is to store and process all information about the enterprise activities. Its structure is the same for the remote databases.
Remote databases contain a subset of information from the CDB. Any of the remote database can be retrieved from the
information contained in the CDB. All servers run a special program - the database server. Each of these programs can access
the server's RAM, server disk memory, receive and transmit network requests. The collaboration of these programs allows to
perform elementary operations on the database [6, 7].</p>
      </sec>
      <sec id="sec-2-2">
        <title>We provide a formal description of the distributed database model.</title>
      </sec>
      <sec id="sec-2-3">
        <title>Let the Central database D consists of some set of tables:</title>
        <p>D ^ di / i 1, `I
All records of one table have the same number of attributes. Let's call the number of attributes in the table - “Table length”.
K ^ ki / i 1, `I — set of the Table length
— number of records in table i
— set of servers (network nodes)
N ^ ni / i
Q ^ qr / r</p>
        <p>1, `I
0, R`
Let q0 the root server.
ܪ ൌ ൣ݄
{ 0,</p>
        <p>௥ భ ௥ మ ൧
1,</p>
        <p>— the correspondent matrix to the topology of the computer network where hr1r2 takes the following values:
performing operations on the table di over a period T .
operations on the table di over the period T .</p>
      </sec>
      <sec id="sec-2-4">
        <title>We introduce the following notation:</title>
        <p>B bir — the total amount of transmitted data over the channel, that bind the node qr with the parent node when</p>
        <p>— total amount of data received by channel, that bind node qr with the parent node when performing
The task of optimizing the performance of distributed DB is formulated as follows: it is required to find such an
arrangement of tablesܺ ൌ ሾ ݔ ௥௜ ሿ at which the minimum of the transmitted data is reached
min ¦
bir</p>
        <p>cir
i,r
(1)</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Designing a performance optimization algorithm for a distributed database</title>
      <p>To get the values B
bir and C
cir</p>
      <p>we analyze the data flows resulted in the execution of one elementary operation on
the database. Each elementary operation leads to the data transfer over each channel of the network. Let
ܵ݁݊݀ሺ ௜ ǡ ݎǡ ݍ ௥ ೕ ǡ ݓ ௦ ǡ ܺሻ - the amount of transmitted data, and ݒܴܿ݀ሺ ௜ ǡ ݎǡ ݍ ௥ ೕ ǡ ݓ ௦ ǡ ܺሻ the amount of received data, respectively,
on the channel binding the server qr , located on the child node, with the server on the parent node, when performing an
elementary operation ws on table di , which was initiated from the server q r j
with a given replication table X ,
then
bir
¦ [ irj s  Send (di , r, qrj , ws , X ) , cir
rj ,s
¦ [ irjs  Rcv(di , r, qrj , ws , X )
rj ,s
(2)
Consider as a protocol data replication two-phase commit protocol (2PC) with shared and exclusive locks. [8].</p>
      <p>Let a transaction be initiated on one of the nodes, which changes the state of the data located in the database on the network
nodes Participants in a transaction are interacting processes that run on servers which in turn contain database table records.
One of the processes is selected and referred to as "Coordinator". Coordinator sends a request to the involved processes. All
processes respond "committed" or "failed", depending on whether the requested operation can be performed. When responses
from all processes came the Coordinator takes one of the decisions “commit a transaction” or “rollback a transaction” and
sends the decision made to all participants in the transaction. Later when analyzing the data flow it is assumed that all
transactions are successfully completed [9]. The distributed transaction is shown in Figure – 2.</p>
      <sec id="sec-3-1">
        <title>Operation 4</title>
      </sec>
      <sec id="sec-3-2">
        <title>Deleting an entry (with the already superimposed exclusive lock)</title>
      </sec>
      <sec id="sec-3-3">
        <title>Operation 9</title>
      </sec>
      <sec id="sec-3-4">
        <title>Operation 5</title>
      </sec>
      <sec id="sec-3-5">
        <title>Replacing the</title>
        <p>record (with the
already
superimposed
exclusive lock)</p>
      </sec>
      <sec id="sec-3-6">
        <title>Operation 10</title>
        <p>five database servers. Moreover the servers q0 , ݍ ଵ ݍ ଶ , q4 — contain a table record, q3 — does not contain a table record
( xi0 1 , ݔ ଵ௜ ൌ ͳ xi2 1 , xi3 0 , xi4
transmitted data at each stage over the transaction are shown in Table 2.</p>
        <p>1 ). The transaction process is shown in Figure 3. The corresponding amounts of
7
0
0
0
2
5</p>
        <p>3
q3
1. Request to the parent server to get a record</p>
      </sec>
      <sec id="sec-3-7">
        <title>2. Inquiry about availability for child servers</title>
      </sec>
      <sec id="sec-3-8">
        <title>3. Get answers about the readiness of the child</title>
        <p>servers</p>
      </sec>
      <sec id="sec-3-9">
        <title>4. Commit command to the server transaction</title>
        <p>initiator and, if necessary data transmission</p>
      </sec>
      <sec id="sec-3-10">
        <title>5. Commit command to other child nodes</title>
        <sec id="sec-3-10-1">
          <title>The server qr contains the table di</title>
        </sec>
      </sec>
      <sec id="sec-3-11">
        <title>Data transfer direction</title>
      </sec>
      <sec id="sec-3-12">
        <title>Column number</title>
      </sec>
      <sec id="sec-3-13">
        <title>Elementary operation</title>
      </sec>
      <sec id="sec-3-14">
        <title>Operation 1</title>
      </sec>
      <sec id="sec-3-15">
        <title>Operation 2</title>
      </sec>
      <sec id="sec-3-16">
        <title>Operation 3</title>
      </sec>
      <sec id="sec-3-17">
        <title>Operation 4</title>
      </sec>
      <sec id="sec-3-18">
        <title>Operation 5</title>
        <p>Based on the data in Table 2, it is possible to get the total amount of transmitted data over each channel. All elementary
database operations are treated in the same way. The results are shown in Table 3.
0
L
L
0
L
L</p>
        <p>L
L
L
L
L</p>
        <p>L
L + ki</p>
        <p>L
L
L
0
L
L
0
2 L</p>
        <p>L</p>
        <sec id="sec-3-18-1">
          <title>Looking at table 3 it is clear that the amount of transmitted data have the form ߙ ڄ ܮ ൅ ߚ ڄ ݇</title>
          <p>௜ where D and E non-negative
integer coefficients. A convenient representation of data amount is matrixes of coefficients $
s 1, ,8 — column number in the Table 2.
D st and %</p>
          <p>E st
where
ܾ ௥௜ ൌ σ ௥׊ ೕ ǡ௦ ߦ ௜௥ ೕ ௦ ڄ ሺ ߙ ௧௦ାଵ ڄ ܮ ൅ ߚ ௧௦ାଵ
where t = notlog ( qr j is subordinate to the node qr )
where column number t determined by the value of the function arguments and the name of the function. Let function notlog
(condition) takes the value 0 if the condition takes the value «true» and value 1 if the condition is “false”.</p>
          <p>Finally the total amount of transmitted data and received over the channel binding the node qr with parent node:
ڄ ܮ ൅ ߚ (3)
ڄ ݇ ௜ ሻ , ܿ ௥௜ ൌ σ ׊௥ ೕ ǡ௦ ߦ ௜௥ ೕ ௦ ڄ ሺ ߙ ௧௦ାଶ</p>
          <p>ڄ ݇ ௜ ሻ
௧௦ାଶ
essentially bir and cir depend only on the value of xir and do not depend on the location of the tables on other servers.
K
L
ܨ ൌ ൣ݂
/
$
௥ భ ௥ మ ൧
[ irs
D st и %</p>
          <p>E st</p>
        </sec>
      </sec>
      <sec id="sec-3-19">
        <title>The following optimization algorithm is proposed. Input data for algorithm</title>
        <p>^ ki / i 1, `I
— table length
— service packet length
— computer network topology
— number of elementary operations
— coefficient matrixes</p>
      </sec>
      <sec id="sec-3-20">
        <title>Result:</title>
        <p>ܺ ൌ ԡ ݔ ௥௜ ԡ — table replication</p>
      </sec>
      <sec id="sec-3-21">
        <title>The algorithm is as follows:</title>
      </sec>
      <sec id="sec-3-22">
        <title>Step 1. The initial values of the replication scheme are set.</title>
        <p>X xir , xi0 1 and xir 0 r z 0
Dlookup</p>
        <p>{1, , I}
Dlookup </p>
        <p>empty the algorithm terminates go to step 8.
Rlookup</p>
      </sec>
      <sec id="sec-3-23">
        <title>Step 3. Servers set that need to be considered is being initialized .</title>
        <p>{0}</p>
        <sec id="sec-3-23-1">
          <title>Step 4. В current server is selected r1  Rlookup , Rlookup</title>
          <p>Rlookup \ {r1}</p>
        </sec>
        <sec id="sec-3-23-2">
          <title>If the set Rlookup </title>
          <p>empty go to step 2.</p>
        </sec>
        <sec id="sec-3-23-3">
          <title>Step 2. A database table that have not yet been reviewed is selected. i1  Dlookup , Dlookup</title>
        </sec>
        <sec id="sec-3-23-4">
          <title>Dlookup \ {i1} . If the set</title>
          <p>0 then go to step 5.
If xi1r2
If xi1r2
to step 5.</p>
        </sec>
      </sec>
      <sec id="sec-3-24">
        <title>Step 8. Completion of the algorithm work.</title>
        <p>Step 5. One of the information channels that bind the current server is selected r1 containing database table i1 with
directly subordinate server r2 . If all channels are reviewed move to step 4.</p>
        <p>Step 6. Data set is processed bi1r2 + ci1r2 , transmitted over the channel r1r2 in cases where the table i1 will not replicate to
the server xi1r2
0 and when the table will be replicated to the child server xi1r2
1</p>
        <p>Step 7. Based on the calculated quantities a decision is made to replicate the table. i1 to server r2 . The table is replicated
to the child server r2 if this can lead to a decrease in the amount of data transmitted over the channel r1r2 .
1 first we add the server r2 к списку серверов, которые необходимо рассмотреть Rlookup
Rlookup</p>
        <p>{r2} go</p>
      </sec>
      <sec id="sec-3-25">
        <title>Maintain optimal replication scheme values X</title>
        <p>xir . with this, minimum total amount of transmitted data is reached
݊݉݅</p>
        <p>Scheme1 Scheme2
0,29 0,02
0,41 0,03
0,56 0,04
0,71 0,05
0,88 0,06
1,06 0,07
1,26 0,08</p>
        <p>The results of numerical experiments showed the efficiency of the replication scheme between database servers when
implementing business processes in the distributed EIS.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5 Conclusion</title>
      <p>The direction of further research may be the consideration of unreliable information transfer channels, equipment failures,
determination of the time of user requests.The proposed approach can be used in the design of distributed registries using
blockchain technology.
5. Antonios Makris, Konstantinos Tserpes, and Dimosthenis Anagnostopoulos. A novel object placement protocol for
minimizing the average response time of get operations in distributed key-value stores. In Big Data (Big Data) 2017,</p>
      <sec id="sec-4-1">
        <title>IEEE International Conference on, pages 3196–3205. IEEE, 2017.</title>
        <p>6. Bermbach, D., Kuhlenkamp, J.: Consistency in distributed storage systems. Networked Systems, pp. 175-189.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Springer, 2013</title>
        <p>7. Domaschka J., Hauser C.B., Erb, B.: Reliability and availability properties of distributed database systems. In:
Enterprise Distributed Object Computing Conference (EDOC 2014), IEEE 18th International. pp. 226-233. IEEE,
2014
8. https://docs.oracle.com/cd/B19306_01/server.102/b14231/ds_txns.htmhttps://docs.oracle.com/cd/B19306_01/server.</p>
        <p>102/b14231/ds_txns.htm
9. B. M. M. Alom, F. Henskens, and M. Hannaford, "Deadlock Detection Views of Distributed Database," in
International conference on Information Technology &amp; New Generartion (ITNG- 2009) Las Vegas, USA: IEEE</p>
      </sec>
      <sec id="sec-4-3">
        <title>Computer Society, 2009</title>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Özsu</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Valduriez</surname>
          </string-name>
          ,
          <source>Principles of Distributed Databases (3rd edition)</source>
          (
          <year>2011</year>
          ), Springer, ISBN
          <volume>978</volume>
          -1-
          <fpage>4419</fpage>
          - 8833-1
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Nicolaou</surname>
            <given-names>A. I.,</given-names>
          </string-name>
          <article-title>Firm performance effects in relation to the implementation and use of enterprise resource planning systems</article-title>
          .
          <source>J Information System</source>
          ,
          <volume>18</volume>
          ,
          <fpage>79</fpage>
          -
          <lpage>105</lpage>
          ,
          <year>2014</year>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Hawary</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heeks</surname>
            <given-names>R</given-names>
          </string-name>
          .
          <article-title>Explaining ERP failure in a developing country: a Jordanian case study</article-title>
          .
          <source>J Enterprise Information Manage</source>
          ,
          <volume>23</volume>
          (
          <issue>2</issue>
          ),
          <fpage>135</fpage>
          -
          <lpage>160</lpage>
          ,
          <year>2010</year>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Guedria</surname>
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Naudet</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            <given-names>D.</given-names>
          </string-name>
          <article-title>maturity model for enterprise interoperability</article-title>
          .
          <source>Enterprise Information System</source>
          ,
          <volume>9</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>28</lpage>
          ,
          <year>2015</year>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>