=Paper=
{{Paper
|id=Vol-3909/Paper_29.pdf
|storemode=property
|title=Graph Databases in Electronic Communications Network: Assessment Based on Query Execution Time
|pdfUrl=https://ceur-ws.org/Vol-3909/Paper_29.pdf
|volume=Vol-3909
|authors=Oksana Herasymenko,Anna Ivanytska
|dblpUrl=https://dblp.org/rec/conf/iti2/0001I24
}}
==Graph Databases in Electronic Communications Network: Assessment Based on Query Execution Time==
Oksana Herasymenko<sup>1,*</sup> and Anna Ivanytska<sup>1</sup>

<sup>1</sup> Taras Shevchenko National University of Kyiv, 60 Volodymyrska Street, Kyiv, 01033, Ukraine

Abstract
Graph databases are a powerful data structure that can be applied to solve a variety of problems. They are widely used in electronic communications networks due to the need for effective management of complex network structures, where traditional relational databases do not always provide sufficient performance and flexibility. There are many graph mapping algorithms, each with its own characteristics and best areas of use. In this paper, the Neo4j, Memgraph and ArangoDB graph databases are compared for query performance in an experiment on the same data set by measuring query execution time. In addition, a usability estimate is given for Neo4j, Memgraph and ArangoDB, based on the number of GitHub users, the number of image downloads, support availability, deployment options and supported programming languages.

Keywords: Graph databases, communications, network, Cypher query language, average query execution time, Neo4j, Memgraph, ArangoDB

===1. Introduction===
Graph databases store data in the form of graphs. A graph is a collection of vertices (nodes) and edges (relations) connecting them. In other words, graphs represent a set of entities, called nodes, and a set of relations, that is, the ways in which these entities are connected with each other [1]. Graph databases appeared relatively recently. Their value lies in the ability to work effectively with data that has complex relationships. This allows new relationships to be built quickly, which has a significant impact on businesses and other organizations. They also provide high performance when executing queries on graph structures, as they are optimized for working with relations and nodes [5].
Graph databases are widely used in various fields due to their ability to effectively model complex relationships between data. In health care, they are used to manage medical records and predict diseases, which improves the quality of medical services [2]. In business analytics, they are used to analyze the interaction between customers and companies. This makes it possible to improve customer relationship management (CRM) and to offer personalized recommendations based on graph models [3]. Graph databases are also used in machine learning to create graph-based models, which improves the accuracy of forecasting and data analysis using graph neural networks (GNN) [4].

Information Technology and Implementation (IT&I-2024), November 20-21, 2024, Kyiv, Ukraine. * Corresponding author. These authors contributed equally. Email: oksgerasymenko@knu.ua (O. Herasymenko); aniaiv@knu.ua (A. Ivanytska). ORCID: 0000-0001-6804-2125 (O. Herasymenko); 0009-0002-6753-3533 (A. Ivanytska). © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

Graphs are inherently additive, which means that new kinds of connections, nodes, labels, and subgraphs can be added to an existing structure without breaking existing queries and application functionality. Modern graph databases make seamless development possible. In particular, the schema-free nature of the graph data model, combined with the ability to test the graph database application programming interface (API) and query language, allows an application to be developed in a controlled manner. Graph databases play a special role in electronic communications networks, since a significant part of the algorithms running in such systems are based on graphs. Therefore, the use of graph databases in this area is well justified or even inevitable.
Applying graph databases in electronic communications networks is discussed in the next section of this paper. The importance of graph databases can hardly be overestimated today. However, these databases also have some limitations, which should be taken into account when choosing this technology to solve certain problems. One of the main limitations is scalability. Although graph databases are great at managing relationships, they can struggle with horizontal scaling, which is more easily implemented in other NoSQL databases. Another limitation is the complexity of graph algorithms and queries, which can be computationally intensive and slow down as the graph size increases. In addition, integrating graph databases with existing data systems can be difficult, requiring significant changes in data models and application logic [5]. It is also worth noting that there is no standardized query language for graph databases: the language depends on the particular database. This can make it difficult to migrate from one provider to another, as query languages may differ in functionality and syntax.

===2. Graph databases in electronic communications network===
Graph databases have also found a prominent place in electronic communications networks. They are widely used to improve network management, optimize traffic and ensure communication reliability. Graph databases make it possible to model complex network topologies more effectively, analyze connections between nodes and optimize routing. For instance, paper [6] describes how graph structures are used in various network scenarios such as wireless, wired, and software-defined networks, including for network monitoring and failure prediction. Paper [7] presents a classification of recent studies using graph neural network (GNN)-based approaches to control policy optimization, including offloading strategies, routing optimization, virtual network function orchestration, and resource allocation.
A social network is itself a large graph that requires multiple graph database servers to store and manage it. Each database server hosts a graph partition for load balancing purposes. Achieving these goals requires a dynamic redistribution algorithm. Paper [8] provides a lightweight reallocation tool that dynamically changes the partitioning using a small amount of resources. The redistribution tool has been integrated into Hermes, developed as an extension of the open-source Neo4j graph database framework to support partitioned graph data workloads.

The knowledge graph, built using this data, provides a single, semantically enriched model that can support complex, cross-referenced queries. For instance, the system can identify users affected by a network incident, analyze user behavior trends, and suggest optimizations for network performance. Integrating these diverse data models into a graph database enables a user-aware network monitoring approach that links technical data with business insights, enhancing operational efficiency and service quality for electronic communication providers.

Paper [9] describes a graph database named Nepal, designed to support the automated management of networks, especially within virtualized and software-defined [...] layer representing different network components, from virtual functions to physical hardware. This structure allows operators to efficiently query connections and relationships between elements. [...] data routes, analyze dependencies, and assess the impact of component failures. Its temporal features [...] and troubleshoot past network states, helping them accurately diagnose issues based on historical configurations.

Paper [10] introduces FrauDetector, a framework for detecting fraudulent phone calls in electronic communications using graph-mining techniques.
Unlike traditional classifiers, FrauDetector relies solely on electronic communication records, constructing weighted graphs to model interactions between users and remote numbers. By applying a weighted Hyperlink-Induced Topic Search (HITS) algorithm, it calculates a "trust value" for each phone number to assess fraud likelihood. Two graphs, a user-phone graph (UPG) and a contact book-phone graph (CPG), capture patterns in call frequency and duration. These patterns are analyzed to distinguish fraudulent from legitimate calls. Tests with real-world data from the Whoscall app show that FrauDetector outperforms traditional methods, especially when user profile information is limited. This framework offers a promising approach for detecting fraud in telecommunication networks.

===3. Problem statement===
Databases are a critical part of the electronic communications industry, as they store and manage a vast amount of data relating to various aspects of electronic communications infrastructure, services and customers. Today's electronic communications companies [...] intelligent, secure, and flexible data management practices driven by the increasing volume and complexity of data. Leveraging these advances allows electronic communications companies to gain a competitive advantage, improve operational efficiency, and provide innovative services to their customers. Graph databases are becoming indispensable components of electronic communications networks, as they are designed to store and process data with a high degree of interconnection between entities and allow these relationships to be changed painlessly. In terms of implementation, such databases are essentially a superstructure over other types of data models, such as relational, NoSQL or others. As mentioned above, there is no standardization in this area, which causes portability and other issues.
Since the development and improvement of graph databases are still ongoing, it is important to study various aspects of their behavior, since identifying their drawbacks can help to improve similar solutions and contribute to a better understanding of the options and approaches to their use. In addition, such studies make it possible to outline the possible problems of using one or another database as a means of implementing a certain technological solution. Therefore, the topic of this study is relevant and requires meticulous work.

This study aims to evaluate the performance of query processing for queries of different purposes in different graph databases. More specifically, the following selections in graph databases are of particular interest and importance:
* counting nodes that meet the specified criteria;
* searching for the shortest path between nodes;
* determining whether there are any paths between nodes reachable within n steps, and others.
The obtained evaluations can also provide some support in choosing among graph databases for a specific task.

===4. Related works===
Many studies compare graph databases with each other and with other types of databases. Let us briefly consider some of them. Most attention is paid to works related to graph databases, as they are of the greatest interest for the topic of this research.

====4.1. A comparison of current graph database models====
In paper [11], graph database models are systematically compared. The overview includes general features, data modeling features, and support for basic graph queries. The paper presents comparison tables of the following features: data storage, operation and management (availability of query language, graphical interface, API), data graph structure (simple, with attributes, hypergraph, nested graph, labeled node, with attributes, directed edges, labeled), query tools, and integrity constraints.

====4.2. A comparison of a graph database and a relational database====
Article [12] compares the Neo4j graph database and the MySQL relational database. They were compared on the basis of objective indicators obtained during the experiment and subjective ones. The data for the experiment was randomly generated to obtain a directed acyclic graph (DAG). After loading the data into the respective databases, information about their size was provided. MySQL took up less memory in almost all cases. The time needed for database creation was not considered. The experiment aimed to measure the execution time of such requests as counting the nodes of a graph at depths of 4 and 128 and counting the number of nodes that had a node parameter of a certain value. Queries were made on databases with different numbers of nodes (1000, 5000, 10000, 100000), which gave insight into how the selected systems scaled. It should be noted that the experiment was conducted on numerical and string parameters. A subjective comparison was made on such characteristics as maturity, level of support, ease of programming, flexibility and security. As a result, Neo4j outperformed MySQL in graph traversal queries by several times. In queries that did not require traversal, MySQL was faster for integer processing and Neo4j for string processing.

====4.3. An empirical comparison of graph databases====
Article [13] presents GDB (Graph Database Benchmark), a distributed benchmarking platform with open-source Java code for testing and comparing different graph databases. Comparison results for the following graph databases are also given: Neo4j, OrientDB, Titan-BerkeleyDB, Titan-Cassandra and DEX. In the experiment, Neo4j got the best results with traversal workloads. Neo4j, DEX, Titan-BerkeleyDB, and OrientDB all achieved similar performance on intensive read-only workloads, but for read-write workloads the performance of Neo4j, Titan-BerkeleyDB, and OrientDB drops dramatically.
Paper [14] describes a comprehensive experimental comparison of several graph database management systems (GDBs). The experiment aimed to evaluate and compare the performance of selected GDBs using a benchmarking tool called BlueBench, developed specifically for this purpose. The databases Neo4j, DEX, InfiniteGraph, OrientDB, Titan, TinkerGraph and others were explored during the experiment. The key performance indicators taken into account were execution time, scalability, transaction support, and query efficiency. The benchmarking involved running a series of predefined queries and operations on each database, measuring task execution time and the databases' handling of large-scale data. Tests were automated and aimed to be as consistent as possible across different systems to ensure a fair comparison. The results showed significant variations between the different GDBs. Neo4j, DEX and TinkerGraph performed well in many areas compared with the other graph databases.

===5. Current study database assortment===
Of particular interest in this study are graph databases that are free, open-source and have use cases related to networks, traffic management and so on. Therefore, attention is focused on the following databases: Neo4j, Memgraph and ArangoDB. Let us provide some other arguments for this choice. Neo4j is considered one of the most popular graph databases, and it can be installed locally. This database has a lot of study materials, tutorials and examples, so it is relatively easy to learn how to use it. Memgraph is compatible with Neo4j and also uses the Cypher language to write queries, so it is easy to rewrite queries for use in this database. It is well documented and has libraries for various programming languages (at least twelve according to the documentation). This database also has good reviews and tutorials.
ArangoDB supports a lot of data models and, according to its documentation, can be a great fit for use cases like network operations, traffic management, IoT data collection, etc.

====5.1. Neo4j====
Neo4j [15] is one of the leading open-source graph databases. It was developed in 2007 and is a Java-based NoSQL database. It is described as a «high-speed graph database with unbounded scale, security, and data integrity for mission-critical intelligent applications» [16]. Key features of Neo4j:
* Cypher, a SQL-like query language, is used for queries;
* LPG (Labeled Property Graph) in Neo4j is a graph data model that represents data using nodes, relationships, and properties;
* the schema is optional when deploying a data set;
* ACID compliant: Neo4j database integrity is based on atomicity, consistency, isolation, and durability;
* supports data export in *.json and Excel formats.
Neo4j Desktop is a local environment for developing and managing graph databases on the Neo4j platform. Neo4j Desktop version 4.28.0 and DBMS version 5.20.0 Enterprise were used to conduct the experiment described in the following sections of this paper. It is important to mention that the APOC 5.20.0 plugin was used to get a *.csv file from the *.dump file.

====5.2. Memgraph====
Memgraph is a graph database designed for real-time processing and analysis of large volumes of data. It was developed and presented in 2016. It is noted that «the fact that Memgraph is written in C++ and resides in memory means that it is much faster than anything we have seen on the market» [17]. Key features of Memgraph:
* supports the Cypher query language;
* supports ACID in-memory transactions for fast access, storing all records in persistent memory;
* ensures high performance of both transactional and analytical queries, even in highly competitive environments;
* supports data export in *.csv and *.json formats.
For the current study, Memgraph was deployed locally in a container using Docker version 4.33.0.
The following image for the container was used: memgraph/memgraph-platform, which included memgraph/memgraph-mage (version latest) and memgraph/lab (version latest).

====5.3. ArangoDB====
ArangoDB is a native multi-model database that supports graph, document, and key-value data models. It was initially released in 2011 and is implemented in C++ as a NoSQL database. ArangoDB makes it possible to create a complex application with various data models within a single backend. Key features of ArangoDB:
* supports AQL (ArangoDB Query Language), which is similar to SQL and designed to handle both structured and unstructured data;
* provides multi-model capabilities, allowing the use of graph, document, and key-value stores in one engine;
* ACID compliance;
* [...] Framework (RDF);
* features built-in clustering and horizontal scalability, facilitating large-scale distributed deployments;
* supports data export in formats like *.json and *.csv.
ArangoDB can be run in various environments, including local installations, containers and cloud platforms. The following image was used to run ArangoDB locally on Docker.

===6. Experiment setup===
To compare the graph databases, both objective and subjective indicators were used. The objective ones rest on comparing query execution speed; the subjective ones estimate the level of support and the ability to use the database in different environments.

====6.1. Dataset description====
The Neo4j network management dataset [18] was used for the experiment. The dataset consists of 17 unique labels that form 18 different nodes and are interconnected by 12 unique relationship types. In total, this forms 83847 nodes and 181995 relationships. It includes node types such as DataCenter, Router, Interface, Port, Rack, Switch, Version, Software, etc. Figure 1 shows the scheme of this dataset.
Figure 1: A scheme of the network management dataset provided by Neo4j

{| class="wikitable"
|+ Table 1: Number of nodes and relationships in graphs used as datasets for the experiment
! Dataset !! Number of nodes !! Number of relationships !! Comments
|-
| 10less || 8474 || 12970 || Approximately 10 times less than full (in number of nodes)
|-
| 5less || 16854 || 25933 || Approximately 5 times less than full (in number of nodes)
|-
| full || 83847 || 181995 ||
|}

This dataset was taken in three different sizes for the experiment. The largest one contains all nodes and relationships. The medium one has approximately five times fewer elements of the node types that aggregate the most data. The smallest one has ten times less data than the largest dataset. The exact numbers of nodes and relationships are shown in Table 1. Performing the same queries on graphs of different sizes and comparing the obtained results makes it possible to evaluate the scaling ability of the databases.

====6.2. Experiment conditions====
The experiment was carried out on a computer running the Windows 10 Pro operating system. The machine has 8 GB of RAM and an Intel Core i3-6100 processor operating at a frequency of 3.70 GHz. The graph databases were launched on the machine one at a time during the experiment. Each query was run ten times. The two largest and the two smallest values were discarded, and the average of the remaining values was taken as the final result.

====6.3. Queries====
Ten different queries were used in the evaluation of the databases. They were grouped by similarity into three groups (Figure 2).

Figure 2: Queries for the experiment by groups: a) node counting (first group), b) shortest path search (second group), c) path counting (third group)

It is worth noting that each database has a slightly different query syntax. Figure 2 shows the queries for Neo4j. Memgraph query syntax is very similar to that of Neo4j, but ArangoDB has its own query language, AQL. Figure 2-a depicts two queries that count how many nodes in a database meet given criteria.
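The averaging procedure of Section 6.2 can be illustrated with a minimal sketch. It assumes that "two maximum and minimum values" means the two largest and the two smallest of the ten timings; the function name `trimmed_average`, the `trim` parameter and the sample timings are hypothetical and are not taken from the paper.

```python
from statistics import mean


def trimmed_average(times_ms, trim=2):
    """Average the timings left after dropping the `trim` smallest
    and the `trim` largest values (assumed reading of Section 6.2)."""
    if len(times_ms) <= 2 * trim:
        raise ValueError("not enough measurements to trim")
    kept = sorted(times_ms)[trim:len(times_ms) - trim]
    return mean(kept)


# Hypothetical timings (ms) for ten runs of one query; not data from the paper.
runs = [12.0, 11.0, 13.0, 40.0, 12.0, 11.5, 12.5, 3.0, 12.0, 95.0]
print(trimmed_average(runs))
```

Discarding the extremes in this way reduces the influence of cold caches and background activity on the reported average.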
It is important to note that for ArangoDB it was possible to write queries only of the first group due to its nature. Figure 2-b depicts three queries which search for the shortest path from one node type to another. Figure 2-c shows six queries which determine how many paths there are in exactly n hops, where n is in the range from 1 to 6. It is also important to note that some combinations of nodes have too many path variations for Memgraph to find. Considering this, nodes with a smaller number of paths were taken for the final queries.

===7. Results and discussion===
This section presents the obtained results in numerical and graphical form. Query execution time is given in milliseconds. The section is divided into four subsections: three of them describe the measurements for a separate group of the queries specified above (subsections 7.1-7.3), and the last one introduces an estimation of the considered databases in terms of the availability of support and the possibility of their use in various environments, which is an important criterion in choosing implementation tools for many projects.

====7.1. First group of queries measurement results====
The first group of queries counts the number of nodes that meet given criteria. The average query execution time in milliseconds is shown in Table 2, and Figure 3 depicts the dependence of the average query execution time on the size of the graph.
{| class="wikitable"
|+ Table 2: Average query execution time for the first group of queries, in milliseconds
! DB size !! Neo4j #1.1 !! Neo4j #1.2 !! Memgraph #1.1 !! Memgraph #1.2 !! ArangoDB #1.1 !! ArangoDB #1.2
|-
| 10less || 3.5 || 3 || 2 || 2 || 1.7 || 0.7
|-
| 5less || 4 || 2.3 || 5 || 4.7 || 3 || 1
|-
| full || 12 || 7 || 19 || 15 || 11 || 3.5
|}

Figure 3: Dependency of the average query execution time (ms) on the graph size (10less, 5less, full) for Neo4j, Memgraph and ArangoDB: a) query #1.1, b) query #1.2

As can be seen, all databases showed similar results, but Memgraph scaled slightly worse than the other databases and ArangoDB showed the best results.

====7.2. Second group of queries measurement results====
The second group of queries searches for the shortest path from one node type to another. The average query execution time in milliseconds is shown in Table 3, and Figure 4 shows the dependence of the average query execution time on the size of the graph. As can be seen, for this group of queries Neo4j showed better results, especially for the graph with a large number of nodes. For instance, for the largest considered graph (83847 nodes), the average execution time of query #2.1 in Memgraph is about four times higher than in Neo4j, and for the graphs with 8474 and 16854 nodes it is about 1.6 and 2.3 times higher, respectively. The same pattern is observed with the other two queries. This indicates that on large graphs Memgraph works more slowly with this type of queries.
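The slowdown of Memgraph relative to Neo4j for query #2.1 can be recomputed directly from the averages reported in Table 3. The short sketch below does this; the helper name `slowdown` is an illustrative choice and not part of the paper.

```python
# Average execution times (ms) for query #2.1, as reported in Table 3.
neo4j_ms = {"10less": 9, "5less": 11, "full": 27}
memgraph_ms = {"10less": 14, "5less": 25, "full": 112}


def slowdown(slow, fast):
    """Return, per graph size, how many times `slow` is slower than `fast`."""
    return {size: slow[size] / fast[size] for size in fast}


ratios = slowdown(memgraph_ms, neo4j_ms)
for size, ratio in ratios.items():
    # Prints 1.56, 2.27 and 4.15 for 10less, 5less and full respectively.
    print(f"{size}: Memgraph / Neo4j = {ratio:.2f}")
```

The ratio grows with the graph size, which is the scaling behavior described for the second group of queries.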
{| class="wikitable"
|+ Table 3: Average query execution time for the second group of queries, in milliseconds
! DB size !! Neo4j #2.1 !! Neo4j #2.2 !! Neo4j #2.3 !! Memgraph #2.1 !! Memgraph #2.2 !! Memgraph #2.3
|-
| 10less || 9 || 10 || 37 || 14 || 37 || 112
|-
| 5less || 11 || 13 || 66 || 25 || 73 || 240
|-
| full || 27 || 55 || 244 || 112 || 420 || 1480
|}

Figure 4: Dependency of the average query execution time (ms) on the graph size (10less, 5less, full) for Neo4j and Memgraph: a) query #2.1, b) queries #2.2 and #2.3

It is important to note that the line graphs for the two other queries (queries #2.2 and #2.3) are placed on the same scale to show that both Memgraph queries (line graphs in shades of red) take longer to execute than both Neo4j queries (line graphs in shades of blue).

====7.3. Third group of queries measurement results====
The third group of queries counts how many paths there are in exactly n hops from a certain node. The average query execution time in milliseconds is shown in Table 4, and Figure 5 depicts the dependence of the average query execution time on the size of the graph. For this group of queries, Neo4j and Memgraph showed similar results for queries #3.1, #3.3, #3.4 and #3.5.

{| class="wikitable"
|+ Table 4: Average query execution time for the third group of queries, in milliseconds
! DB size !! Neo4j #3.1 !! Neo4j #3.2 !! Neo4j #3.3 !! Neo4j #3.4 !! Neo4j #3.5 !! Memgraph #3.1 !! Memgraph #3.2 !! Memgraph #3.3 !! Memgraph #3.4 !! Memgraph #3.5
|-
| 10less || 4 || 8 || 20 || 94 || 18000 || 2 || 7 || 21 || 101 || 1010
|-
| 5less || 5 || 14 || 40 || 328 || 2857 || 6 || 15 || 43 || 311 || 3650
|-
| full || 13 || 4398 || 270 || 4830 || 78280 || 20 || 228 || 222 || 5630 || 88800
|}

For queries #3.1 and #3.5, Neo4j performed worse on the smallest graph size and better on the biggest. For query #3.2, Neo4j performed much worse than Memgraph on the largest graph size.
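To locate the measurements in Table 4 where the two databases diverge sharply, the timings can be compared pairwise. The following is a minimal sketch; the function name `outliers` and the 5x threshold are illustrative assumptions, not criteria used in the paper.

```python
# Average execution times (ms) from Table 4, ordered as (10less, 5less, full).
SIZES = ("10less", "5less", "full")
neo4j = {
    "#3.1": (4, 5, 13), "#3.2": (8, 14, 4398), "#3.3": (20, 40, 270),
    "#3.4": (94, 328, 4830), "#3.5": (18000, 2857, 78280),
}
memgraph = {
    "#3.1": (2, 6, 20), "#3.2": (7, 15, 228), "#3.3": (21, 43, 222),
    "#3.4": (101, 311, 5630), "#3.5": (1010, 3650, 88800),
}


def outliers(a, b, threshold=5.0):
    """Return (query, size, ratio) for every cell where the two timings
    differ by more than `threshold` times (threshold is an assumption)."""
    found = []
    for query in a:
        for size, ta, tb in zip(SIZES, a[query], b[query]):
            ratio = max(ta, tb) / min(ta, tb)
            if ratio > threshold:
                found.append((query, size, round(ratio, 1)))
    return found


print(outliers(neo4j, memgraph))
```

With this threshold only two cells are reported: query #3.2 on the full graph and query #3.5 on the 10less graph, in both of which Neo4j is the slower system.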
To sum up, for the third group of queries the performance of the considered databases was comparable in most cases, and the fluctuations in some of them (query #3.2 on the full graph and query #3.5 on the 10less graph) require further research.

Figure 5: Dependency of the average query execution time (ms) on the graph size (10less, 5less, full) for Neo4j and Memgraph: a) query #3.1, b) query #3.2, c) query #3.3, d) query #3.4, e) query #3.5

====7.4. Databases usability estimation====
Because some difficulties were encountered while using the considered databases, it was decided to estimate their usability, taking into account various aspects of their practical application, support of programming languages, etc. The main criteria taken into account for the aggregated indicator, called the usability factor, are listed below:
* number of image downloads from the DockerHub platform. It reflects the demand for and the prevalence of the database;
* number of users on the GitHub platform. It shows how many users are interested in improving the database and may put effort into its development;
* availability of user support from the database web page: value 1 if it is available on the web page and value 0 otherwise;
* presence of an active community. An active community helps improve the database, which can be expressed in adding new features or drivers to support third-party tools and in detecting and eliminating vulnerabilities and errors. It is represented by value 1 if an active community exists and value 0 otherwise;
* existence of images for container deployment. Containerization allows users to rapidly deploy applications and efficiently manage infrastructure. It is represented by value 1 if container deployment is possible and value 0 otherwise;
* availability of deployment on different cloud platforms; in particular, AWS, GCP and Azure are taken into account. More and more organizations hold their infrastructure in the cloud, which is why the possibility of database deployment in the cloud is significant. It is represented by a value from 0 to 3 according to the number of the specified cloud platforms for which deployment instructions are available on the database web page;
* number of supported programming languages. Official libraries were taken into account.

{| class="wikitable"
|+ Table 5: Data for usability factor calculation
! Criterion !! Neo4j !! Memgraph !! ArangoDB
|-
| Years on the market || 17 || 8 || 13
|-
| Number of image downloads || 100M || 100K || 10M
|-
| Number of GitHub users || 704 || 321 || 111
|-
| User support || 1 || 1 || 0
|-
| Active community || 1 || 1 || 1
|-
| Existence of images to deploy in container || 1 || 1 || 1
|-
| Deployment on different cloud platforms (AWS, GCP, Azure) || 3 || 3 || 2
|-
| Number of supported programming languages || 7 || 12 || 5
|-
| Usability factor || 0.92 || 0.79 || 0.42
|}

Table 5 contains data on the specified criteria for each of the databases. The data was collected from official online resources of the databases, such as official web pages, GitHub repositories, etc. It is also worth noting that the table contains the number of years the database has been on the market, which is required for the calculation of the aggregated factor.
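As a cross-check, the usability factors listed in Table 5 can be recomputed from the raw criteria with the five equally weighted addends of equation (1). The sketch below is illustrative: the dictionary layout and the function name `usability_factors` are assumptions, and the download counts are the rounded figures from the table.

```python
# Raw criteria from Table 5 (download counts are the rounded table values).
# y: years on the market, a: image downloads, b: GitHub users, s: user support,
# m: active community, c: container image, p: cloud platforms, l: languages.
DATA = {
    "Neo4j":    dict(y=17, a=100_000_000, b=704, s=1, m=1, c=1, p=3, l=7),
    "Memgraph": dict(y=8,  a=100_000,     b=321, s=1, m=1, c=1, p=3, l=12),
    "ArangoDB": dict(y=13, a=10_000_000,  b=111, s=0, m=1, c=1, p=2, l=5),
}


def usability_factors(data):
    """Compute the usability factor of each database as the mean of five addends."""
    a_max = max(d["a"] / d["y"] for d in data.values())  # (a/y)_max
    b_max = max(d["b"] / d["y"] for d in data.values())  # (b/y)_max
    l_max = max(d["l"] for d in data.values())
    result = {}
    for name, d in data.items():
        terms = (
            (d["a"] / d["y"]) / a_max,   # downloads per year, normalized
            (d["b"] / d["y"]) / b_max,   # GitHub users per year, normalized
            (d["s"] + d["m"]) / 2,       # support
            d["c"] / 2 + d["p"] / 6,     # deployment ability: c/2 + (1/2)(p/3)
            d["l"] / l_max,              # programming languages
        )
        result[name] = sum(terms) / 5
    return result


for name, factor in usability_factors(DATA).items():
    # Prints 0.92, 0.79 and 0.42, matching the last row of Table 5.
    print(f"{name}: {factor:.2f}")
```

The per-year normalization of downloads and GitHub users is what keeps the younger Memgraph close to Neo4j on the GitHub addend despite its smaller absolute user count.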
The following equation (1) was used to calculate the usability factor:

<math>F_i = \frac{1}{5}\cdot\frac{a_i/y_i}{(a/y)_{\max}} + \frac{1}{5}\cdot\frac{b_i/y_i}{(b/y)_{\max}} + \frac{1}{5}\cdot\frac{s_i + m_i}{2} + \frac{1}{5}\cdot\left(\frac{1}{2}\,c_i + \frac{1}{2}\cdot\frac{p_i}{3}\right) + \frac{1}{5}\cdot\frac{l_i}{l_{\max}}, \qquad (1)</math>

where <math>a_i</math> is the number of image downloads of the i-th database; <math>y_i</math> is the number of years on the market of the i-th database; <math>b_i</math> is the number of GitHub users of the i-th database; <math>s_i</math> is the point for the existence of user support for the i-th database; <math>m_i</math> is the point for the existence of an active community of the i-th database; <math>c_i</math> is the point for the existence of an image of the i-th database for container deployment; <math>p_i</math> is the number of cloud platforms supported by the i-th database; <math>l_i</math> is the number of programming languages supported by the i-th database; <math>(a/y)_{\max}</math> is the maximum value among all <math>a_i/y_i</math>; <math>(b/y)_{\max}</math> is the maximum value among all <math>b_i/y_i</math>; <math>l_{\max}</math> is the maximum value among all <math>l_i</math>.

The obtained usability factors for the considered databases are given in Table 5. Figure 6 shows the contribution of each addend to the overall usability factor value. It is important to note that the third addend combines the user support and active community points (Support in Figure 6), and the fourth addend combines, in equal parts, the existence of a database image for container deployment and the normalized number of supported cloud platforms (Deployment ability in Figure 6).

Figure 6: Contribution of each component (number of image downloads, number of GitHub users, support, deployment ability, programming languages) to the overall usability factor value for the Neo4j, Memgraph and ArangoDB databases

As can be seen, Neo4j has an advantage in most positions, which is caused on the one hand by a large number of supported features, and on the other hand by its widespread use among users.

===8. Conclusions===
In this study, three graph databases were assessed on graphs of various sizes through the query execution time of three query groups. Experiments were conducted on the network management dataset and show that Neo4j demonstrates 2-4 times better performance for most queries compared to Memgraph.
At the same time, its performance compared to ArangoDB was somewhat worse in the first group of queries. It was not possible to write the other two groups of queries in ArangoDB due to the peculiarities of its nature and the built-in query writing language. When performing the second group of queries, Neo4j showed significantly better results compared to Memgraph, especially for larger graphs, for instance, the average query #2.1 execution time in Memgraph is from 2 to 4 times higher than in Neo4j depending on the graph size. For the third group of queries, the results of Memgraph and Neo4j are comparable in almost all cases. However, it should be noted that the performance of Neo4j is slightly worse on the queries of the first and third groups when working with a graph of the smallest size. This may be due to the peculiarities of the internal implementation of Neo4j, as it is a NoSQL database by its nature. more detail. Hence, they were estimated by the usability factor and Neo4j has the highest score 0.92. From the subjective point of view, it turned out, that Neo4j is the easiest database to work with. In addition to query complications, importing data into ArrangoDB requires splitting the file separately into nodes by node types and into relationships by their types, and running created files one by one distinctly. It slows down the import process and makes it confusing. Declaration on Generative AI The authors have not employed any Generative AI tools. References [1] Robinson, Ian, Jim Webber, and Emil Eifrem. Graph databases: new opportunities for connected data. " O'Reilly Media, Inc.", 2015. [2] Cui, Hejie, et al. A Review on Knowledge Graphs for Healthcare: Resources, Applications, and Promises, 2023, doi:10.48550/arXiv.2306.04802. [3] Henna, Shagufta, and Shyam Krishnan Kalliadan. Enterprise Analytics using Graph Database and Graph-based Deep Learning, 2021, doi:10.48550/arXiv.2108.02867. [4] LI, Harry, et al. 
Knowledge graphs in practice: characterizing their users, challenges, and visualization opportunities. IEEE Transactions on Visualization and Computer Graphics, 2023, doi:10.1109/TVCG.2023.3326904.
[5] Besta, Maciej, et al. Demystifying graph databases: Analysis and taxonomy of data organization, system designs, and graph queries. ACM Computing Surveys 56.2, 2023, pp. 584-594, doi:10.1145/3604932.
[6] Jiang, Weiwei. Graph-based deep learning for communication networks: A survey. Computer Communications, 2022, doi:10.1016/j.comcom.2021.12.015.
[7] Tam, Prohim, et al. Graph neural networks for intelligent modelling in network management and orchestration: a survey on communications. Electronics, 2022, doi:10.3390/electronics11203371.
[8] Nicoara, Daniel, et al. Hermes: Dynamic Partitioning for Distributed Social Network Graph Databases. EDBT, 2015.
[9] Jamkhedkar, Pramod, et al. A graph database for a virtualized network infrastructure. Proceedings of the 2018 International Conference on Management of Data, 2018, doi:10.1145/3183713.3190653.
[10] Tseng, Vincent S., et al. Fraudetector: A graph-mining-based framework for fraudulent phone call detection. Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, doi:10.1145/2783258.2788623.
[11] Angles, Renzo. A comparison of current graph database models. 2012 IEEE 28th International Conference on Data Engineering Workshops. IEEE, 2012, doi:10.1109/icdew.2012.31.
[12] Vicknair, Chad, et al. A comparison of a graph database and a relational database: a data provenance perspective. Proceedings of the 48th annual Southeast regional conference, 2010, doi:10.1145/1900008.1900067.
[13] Jouili, Salim, and Valentin Vansteenberghe. An empirical comparison of graph databases. 2013 International Conference on Social Computing. IEEE, 2013, doi:10.1109/SocialCom.2013.106.
[14] comparison of graph databases.
Proceedings of International Conference on Information Integration and Web-based Applications & Services, 2013, doi:10.1145/2539150.25391.
[15] Neo4j Closes Banner Year Marked by Customer Successes, Continued Industry Validation, Community Engagement, and Major Funding. URL: https://neo4j.com/pressreleases/2021-company-momentum/.
[16] Neo4j database. URL: https://neo4j.com/product/neo4j-graphdatabase/.
[17] Memgraph database. URL: https://memgraph.com/memgraphdb.
[18] Neo4j Network Management dataset on github. URL: https://github.com/neo4j-graphexamples/network-management.
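
As a supplementary illustration, the usability factor of equation (1) can be sketched in Python. This is a minimal sketch, not the authors' implementation. The class and function names are hypothetical, the points s_i, m_i and c_i are assumed to be 0 or 1, the divisor 3 in the fourth addend is assumed to be the largest possible number of cloud platforms, and the input values are invented placeholders rather than the data behind Table 5.

```python
from dataclasses import dataclass


@dataclass
class DatabaseProfile:
    """Inputs of equation (1) for one database (hypothetical container)."""
    name: str
    image_downloads: float   # a_i
    github_users: float      # b_i
    years_on_market: float   # y_i
    user_support: int        # s_i, assumed to be 0 or 1
    active_community: int    # m_i, assumed to be 0 or 1
    container_image: int     # c_i, assumed to be 0 or 1
    cloud_platforms: int     # p_i
    languages: int           # l_i


# Divisor of p_i in equation (1); assumed to be the maximum platform count.
CLOUD_PLATFORM_NORMALIZER = 3


def usability_factors(profiles):
    """Return {name: F_i} computed term by term as in equation (1)."""
    a_rate_max = max(p.image_downloads / p.years_on_market for p in profiles)
    b_rate_max = max(p.github_users / p.years_on_market for p in profiles)
    l_max = max(p.languages for p in profiles)

    factors = {}
    for p in profiles:
        addends = [
            (p.image_downloads / p.years_on_market) / a_rate_max,
            (p.github_users / p.years_on_market) / b_rate_max,
            (p.user_support + p.active_community) / 2,
            0.5 * p.container_image
            + 0.5 * p.cloud_platforms / CLOUD_PLATFORM_NORMALIZER,
            p.languages / l_max,
        ]
        # Each of the five addends carries the same weight of 1/5.
        factors[p.name] = sum(addends) / 5
    return factors


# Invented placeholder inputs, used only to exercise the formula.
profiles = [
    DatabaseProfile("db_a", 1000, 200, 10, 1, 1, 1, 3, 8),
    DatabaseProfile("db_b", 300, 50, 5, 1, 0, 1, 0, 4),
]
factors = usability_factors(profiles)
for name, value in factors.items():
    print(f"{name}: {value:.2f}")
```

Because the first two addends are divided by the best per-year rate and the last by the largest language count, a database that leads on every component reaches the maximum value of 1, and every other database is scored relative to it.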