=Paper=
{{Paper
|id=Vol-2416/paper47
|storemode=property
|title=Methods for finding shortest paths on graphsin organizational and economic systems and their implementation
|pdfUrl=https://ceur-ws.org/Vol-2416/paper47.pdf
|volume=Vol-2416
|authors=Vladimir Ramzaev,Irina Khaimovich,Ilya Martynov
}}
==Methods for finding shortest paths on graphsin organizational and economic systems and their implementation ==
Methods for finding shortest paths on graphs in organizational and economic systems and their implementation V М Ramzaev1, I N Khaimovich1,2 and I V Martynov2 1Samara University of Public Administration «International Market Institute», G S Aksakova Street, 21, Samara, Russia, 443030 2Samara National Research University, Moskovskoe Shosse 34A, Samara, Russia, 443086 e-mail: kovalek68@mail.ru Abstract. The article implements the functions in Postgre SQL DBMS, finding the shortest paths on graphs, using the wave algorithm method, the Dijkstra’s method and the Floyd method. The authors determined models of dependencies of the running time of implementations of the shortest-path search algorithms on graphs on the number of graph vertices experimentally. A comparison of the data obtained as a result of the study was carried out to find the best applications of implementations of the shortest path search algorithms in the Postgre SQL DBMS. 1. Introduction Modern organizational and economic systems are increasingly using technologies for collecting, analyzing and processing data [1,2], such systems in industrial enterprises include PDM - class systems. In these systems, data is arranged in undirected graphs with different weights. The main task of making decisions in PDM systems is the search for the optimal variant using graph methods [3, 4], this type of search is also actively used for analyzing social networks [5-8]. Currently, issues of the effectiveness of open source software and licenses, which practically do not impose restrictions on its use, are becoming increasingly topical. An example of such software is PostgreSQL DBMS, which by its capabilities comes close to the most common commercial Oracle database. Therefore, the study and analysis of the effectiveness of PostgreSQL DBMS is an important task [9]. In this paper, we study the theoretical foundations of constructing the shortest path search algorithms and the capabilities of the PL / PGSQL programming language built into the PostgreSQL DBMS. To implement the algorithms, databases and software have been developed for finding the shortest paths. The temporal characteristics of the algorithms on the test graphs are investigated and compared with each other. V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) Data Science V М Ramzaev, I N Khaimovich and I V Martynov 2. Methods for finding shortest paths on graphs and analyzing the effectiveness of their implementation Let’s consider the wave algorithm, Dijkstra’s algorithm, and Floyd’s algorithm for finding the shortest paths on graphs. Wave algorithm (wave-tracing algorithm, Lee algorithm) is an algorithm for finding the path, the algorithm for finding the shortest path on a planar graph. It belongs to algorithms based on wide search methods. The wave algorithm in the context of finding a path through a maze was proposed by E.F. Moore. Lee independently discovered the same algorithm when formalizing the routing algorithms for printed circuit boards in 1961. In step 1 of the algorithm, the function obtains the number of the initial vertex and calculates labels for all vertices of the graph, the resulting values are written to the nodes table. In step 2 of the algorithm, the function obtains the number of the final vertex and calculates the shortest route to the initial vertex; the function outputs the result as an array with the vertex numbers. To implement the search for the shortest path by the wave algorithm method on a given graph, the data are presented in the form of a table that contains the following fields: vertex number of the graph, list of adjacent vertices, path mark (No. of wave front of the wave algorithm), previous vertex number. Here is a block diagram of the wave algorithm. To analyze the operation time of the wave algorithm, calculations were performed on test graphs generated by the PostgreSQL DBMS using the function written by the authors. Each graph is a two- dimensional rectangular table of nodes with the size of N rows and M columns. Each node of the grid (except for the nodes of the last column and the last row) is connected by an edge with three nodes lying to the right, below and diagonally to the right-down from the current node. The lengths of all edges are 1. To clarify, we present the data in the form of a graph (Figure 1). V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 369 Data Science V М Ramzaev, I N Khaimovich and I V Martynov 60 50 run time, sec. 40 30 20 10 0 0 1 2 3 4 5 6 7 8 9 10 Number of cells in the maze, thousand. Figure 1. The dependence of the wave algorithm time on the number of vertices. From the graph it is clear that the dependence is non-linear. We approximate, by a second-order polynomial, the obtained data, to find the function of the dependence of the algorithm time on the number of cells in the maze. Here is the obtained function: y = 0.47x2-0.19x+0.19. The R-squared value (determination coefficient) of the current model is 0.9472, which corresponds to a model of good quality. The next shortest path search algorithm is Dijkstra’s. This algorithm was proposed by the Dutch scientist Dijkstra in 1959. The algorithm is described in many sources [9]. Dijkstra's algorithm is applicable to an oriented weighted graph with non-negative weights [10]. In step 1 of the algorithm, the initial node S is assigned the status of a permanent label with zero value d (S). This vertex is called basic and denoted by V. In step 2, the vertices directly reachable from the base vertex V are assigned temporary label values equal to d (V) + w (V → U), where w (V → U) is the edge weight from the vertex V to the vertex U. In step 3, among the vertices marked with temporal values, the vertex U is selected with the minimum mark and assigned the status of a permanent mark. The value of the constant label will be equal to the length of the shortest path from the initial vertex S to the vertex U. In step 4, the vertex U is taken as the base vertex V and the transition to step 2 takes place. The algorithm ends when all reachable vertices are marked with permanent labels. To implement the algorithm in the PostgreSQL DBMS, it is necessary to create an Edge table in which the number of the initial vertex for each edge is stored, as well as the number of the final vertex, and the weight of the edge. To restore the path, add the Node table in which everything will be stored. To implement the algorithm, we use the PL / pgSQL [10] programming language built into the DBMS, in which besides executing SQL queries, we can use standard high-level control instructions for languages. To analyze the operation time of the Dijkstra algorithm, calculations were made on test graphs. The lengths of all the edges were chosen randomly in the range from 1 to 1,000,000. For clarity, we present the data in the form of a graph (Figure 2). The graph shows clearly that the dependence is non-linear. We approximate, by a second-order polynomial, obtained from the data, to find the function of the dependence of the running time of the algorithm on the number of cells in the maze. Here is the obtained function: y = 1.64x2-0.06x+0.05. V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 370 Data Science V М Ramzaev, I N Khaimovich and I V Martynov 200 150 Run time, sec. 100 50 0 0 1 2 3 4 5 6 7 8 9 10 Number of cells in the maze, thousand. Figure 2. The dependence of the time of the Dijkstra’s algorithm on the number of vertices. The R-squared value (determination coefficient) of the current model is 0.9141, which corresponds to a model of good quality. The next algorithm for finding the shortest paths on graphs is the Floyd’s algorithm. This algorithm is a dynamic algorithm for finding the shortest distances between all vertices of a weighted oriented graph. It was developed in 1962 by Robert Floyd and Stephen Worshall. In step 1, the algorithm calculates and writes into the ADS table of the mD matrix which is the shortest path length matrix; mS is the auxiliary matrix to restore the path. In step 2, the algorithm will calculate the optimal route from the vertex u to the vertex v of the graph. To implement the algorithm in PostgreSQL DBMS, create an Edge table in which for each edge we will store the record number, and in fact the task number; the number of vertices of the graph; adjacency matrix; shortest path length matrix; auxiliary matrix to restore the path. To implement the algorithm, we use the PL / pgSQL [10] programming language built into the DBMS, in which besides executing SQL queries, we can use standard high-level control instructions for languages. To analyze the Floyd's algorithm time, calculations were made on test graphs. The lengths of all the edges were chosen randomly in the range from 1 to 1,000,000. To clarify, we present the data in the form of a graph (Figure 3). From the graph it is clear that the dependence is non-linear. We approximate, by a second-order polynomial, obtained from the data, to find the function of the dependence of the running time of the algorithm on the number of cells in the maze. 600000 500000 run time, sec. 400000 300000 200000 100000 0 0 2000 4000 6000 8000 10000 Number of vertices, pcs Figure 3. The dependence of the Floyd algorithm time on the number of vertices. V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 371 Data Science V М Ramzaev, I N Khaimovich and I V Martynov Here is the obtained function: y=5*10-7*x3+0.0003x2-0.0735x+2.8405. The R-squared value (determination coefficient) of the current model is 0.9462, which corresponds to a model of good quality. Let us analyze the time of the wave algorithm and the Dijkstra’s algorithm on graphs with unit weights containing from one thousand to ten thousand vertices. Figure 4. Comparison of the running time of the wave algorithm and the Dijkstra’s algorithm. Let us write the equation of the dependence of the operating time of the proposed implementation of the wave algorithm on the number of vertices in the graph: y=0.47x2-0.19x+0.19. The R-squared value (determination coefficient) of the current model is 0.8534, which corresponds to a model of good quality. Let us write the equation of the dependence of the operating time of the proposed implementation of the Dijkstra’s algorithm on the number of vertices in the graph: y=1.64x2-0.06x+0.05. The R-squared value (determination coefficient) of the current model is 0.9472, which corresponds to a model of good quality. To calculate the time difference of the proposed implementations of these algorithms, we compare the coefficients at the highest powers in the obtained equations: a= 1.64/0.47=3.48 It turns out that the proposed implementation of the wave algorithm copes with the task by 3.5 times faster than the proposed implementation of the Dijkstra’s algorithm. From which it can be concluded that in not oriented graphs with unit weights it is best of all to use the wave algorithm. Let us consider a comparison of the running time of the of Floyd’s and Dijkstra’s algorithm on test graphs. Due to the fact that Floyd’s algorithm is based on finding the shortest distances between all the graph vertices, and Dijkstra’s algorithm for finding the shortest distance from one of the graph vertices V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 372 Data Science V М Ramzaev, I N Khaimovich and I V Martynov to all others, we will use the following method to compare their operation time. We will compare the time of the Floyd's algorithm and the time of the Dijkstra's algorithm multiplied by n, where n is the number of vertices of the graph. Let us present data in the form of a Figure 5. Figure 5. Comparison of the Floyd’s algorithm time and the Dijkstra’s algorithm time. Let us write the equation of the dependence of time of the proposed implementation of the Floyd’s algorithm on the number of vertices in the graph: y=5*10-7x3+0.0002x2-0.0684x+0.5874. The R-squared value (determination coefficient) of the current model is 0.9235, which corresponds to a model of good quality. Let us write the equation of the dependence of time of the proposed implementation of the Dijkstra’s algorithm on the number of vertices in the graph: y=2*10-6x3-2*10-5x2-0.0247x+47.552. The R-squared value (determination coefficient) of the current model is 0.9242, which corresponds to a model of good quality. To calculate the difference in the operation time of the proposed implementations of these algorithms, compare the coefficients at the highest powers in the obtained equations: a= 2*10-6/5*10-7=4 It turns out that the proposed implementation of the Floyd’s algorithm copes with the task of finding the shortest path from all the vertices at 4 times faster than the proposed implementation of the Dijkstra’s algorithm. From which we can conclude that in such problems it is better to use the Floyd’s algorithm. Let us compare the runtime of the Dijkstra’s algorithm implementations on the example of implementations in PostgreSQL and C ++. The result of the analysis of the running time of the implementation of the Dijkstra’s algorithm in PostgreSQL and C ++ is shown in table 1. For clarity, the test results are illustrated in the Figure. 6. V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 373 Data Science V М Ramzaev, I N Khaimovich and I V Martynov Table 1. The dependence of processing time on the number of vertices on the graph. Number of vertices/pcs. Processing time/sec. PostgreSQL C++ 0 0 0 100 0.3 0 200 0.6 0 500 1.1 0 1000 1.8 0.04 2000 5.7 0.13 5000 42.1 0.75 10000 163.5 40 10500 180.23 73 11000 197.78 115.2 12000 235.49 182.3 14000 320.65 549.9 15000 368.15 809 Graph of test results 900 800 700 600 run time, sec. 500 400 300 200 100 0 0 2000 4000 6000 8000 10000 12000 14000 16000 Number of vertices in a grath /pcs. PostgreSQL C++ Figure 6. Comparison of the operating time of the implementation of the Dijkstra’s algorithm in C ++ and PostgreSQL . 3. Analysis of obtained results From the research results we can draw the following conclusions: - the proposed implementation of the wave algorithm is best suited for solving the problems of finding the shortest path in not oriented graphs with unit weights; - the proposed implementation of the Dijkstra’s algorithm is best suited for solving problems of finding the shortest path in not oriented graphs with different weights; - the proposed implementation of the Floyd’s algorithm is best suited for solving problems of finding the shortest path in not oriented graphs with any weights, where the graph remains unchanged for a long time, for example, a graph of roads or a metro system. Of the algorithmic advantages of solving the problem using the database tools, two points are most significant: specification of the distances to unvisited adjacent vertices within a single UPDATE query and accelerated search for vertices using the indexing tools built into the database engine. The main disadvantage of this implementation is the relatively low execution speed of single queries to the database, which is noticeable when solving organizational and economic problems of V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 374 Data Science V М Ramzaev, I N Khaimovich and I V Martynov small and medium dimensions [11 - 14]. The application of data of graph search acceleration algorithms is particularly important in life cycle management systems of industrial products, such as PDM systems. In these systems, new modules have emerged related to the analysis of big data, as in industrial enterprises there are opportunities to integrate design and technological data on all parts and assembly elements of complex innovative products, such as aircraft products and products of the rocket-space complex (about 10 million parts and assembly elements in one product). For complex technological products, we create certifications for parts and working operations associated with loading photographs and measurements into the databases for quality control and destruction during the operation of these products. Such modules in PDM - systems include modules: Siemens TeamCenter 4G Big DATA, Enovia SmarTeam 4G Big Data. 4. References [1] Chemodanov D A, Esposito F, Calyam P and Sukhov A 2018 Constrained Shortest Path Scheme for Virtual Network Service Management IEEE Transactions on Network and Service Management p 17 [2] Chemodanov D A, Calyam P, Esposito F and Sukhov A 2016 General Constrained Shortest Path Approach for Virtual Path Embedding Тhe 22nd IEEE international symposium on local and metropolitan area networks (Rome, Italy) 1-7 [3] Khaimovich I N, Ramzaev V M and Chumak V G 2016 Use of big data technology in public and municipal management CEUR Workshop Proceedings 1638 864-872 [4] Khaimovich I N, Ramzaev V M and Chumak V G 2015 Challenges of data access in economic research based on Big Data technology CEUR Workshop Proceedings 1490 327-337 [5] Wang Y 2016 A branch-and-price framework for optimal virtual network embedding Computers Networks 94 318-326 [6] Chowdhury M, Rahman M and Boutaba R 2012 Vineyard: Virtual network embedding algorithms with coordinated node and link mapping IEEE ACM Trans. Netw. 20(1) 206-219 [7] Esposito F, Paola D and Matta I 2014 On Distributed Virtual Network Embedding with Guarantees IEEE ACM Trans. Netw. 24(1) 569-582 [8] Danna E, Mandal S and Singh A 2012 A practical algorithm for balancing the max-min fairness and throughput objectives in traffic engineering Proc. of IEEE INFOCOM 846-854 [9] Cormen T, Leiserson С and Rivest R 1990 Introduction to Algorithms (Cambridge) p 1091 [10] Worsley J, Drake J 2003 Practical PostgreSQL (O’Relly Media Inc.) p 640 [11] Сhumak P V, Ramzaev V M and Khaimovich I N 2015 Models for forecasting the competitive growth of enterprises due to energy modernization Studies on Russian Economic Development 26(1) 49-54 [12] Khaimovich A I, Grechnikov F V 2015 Development of the requirements template for the information support system in the context of developing new materials involving Big Data CEUR Workshop Proceedings 1490 364-375 [13] Geras'kin M I, Chkhartishvili A G 2017 Analysis of Game-Theoretic Models of an Oligopoly Market under Constrains on the Capacity and Competitiveness of Agents Automation and Remote Control 78(11) 2025-2038 [14] Kazanskiy N L, Stepanenko I S, Khaimovich A I, Kravchenko S V, Byzov E V and Moiseev M A 2016 Injectional multilens molding parameters optimization Computer Optics 40(2) 203-214 DOI: 10.18287/ 2412-6179-2016-40-2-203-214 V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 375