GooDBye: a Good Graph Database Benchmark - an Industry Experience

Piotr Matyjaszczyk, Poznan University of Technology, Poland, piotrmk1@gmail.com
Przemyslaw Rosowski, Poznan University of Technology, Poland, przemyslaw.rosowski@student.put.poznan.pl
Robert Wrembel, Poznan University of Technology, Poland, robert.wrembel@cs.put.poznan.pl

© Copyright 2020 for this paper held by its author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2020 Joint Conference (March 30-April 2, 2020, Copenhagen, Denmark) on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT
This paper reports a use-case developed for an international IT company, one of whose multiple branches is located in Poland. In order to deploy a graph database in its IT architecture, the company needed an assessment of some of the most popular graph database management systems to select the one that fits its needs. Although multiple graph database benchmarks have been proposed so far, they do not cover all use-cases required by industry, and the company faced exactly this problem. The specific structure of the graphs used by the company and its specific queries prompted the development of a new graph benchmark, tailored to its needs. In this respect, the benchmark that we developed complements the existing benchmarks with 5 real use-cases. Based on the benchmark, 5 open-source graph database management systems were evaluated experimentally. In this paper we present the benchmark and the experimental results.

1 INTRODUCTION
Among multiple database technologies [26], graph databases (GDBs) have for a few years been gaining popularity for storing and processing interconnected Big Data. At the time of writing this paper, there existed 29 recognized graph database management systems (GDBMSs), cf. [9], offering different functionality, query languages, and performance.

When it comes to selecting a GDB for efficient storage and processing of given graphs, a company professional has to either implement multiple proofs of concept or rely on existing evaluations of various databases. Typically, important assessment metrics include: (1) performance, (2) scalability w.r.t. a graph size, and (3) scalability w.r.t. the number of nodes in a cluster.

In practice, assessing the performance of IT architectures and particular software products is done by means of a benchmark. There exist multiple dedicated benchmarks for given domains of application. In the area of information systems and databases, the industry-accepted benchmarks are developed by the Transaction Processing Council. There also exist dedicated benchmarks for non-relational databases and clouds, cf. Section 2.

Although there exist multiple benchmarks designed for graph databases, the motivation for our work came as a real need from industry, i.e., a large international IT company (whose name cannot be revealed), having one of its multiple divisions located in Poland. The company stores large data volumes on various configurations of its software and network infrastructures. These data are inherently interconnected and naturally form large graphs. Currently, these graphs are stored in flat files, but in the future they will be imported into a proprietary GDB and analyzed there. For this reason, a fundamental issue was to choose a GDBMS that would be the most suitable for the particular 'graph shapes' and queries needed by the company. The assessment criteria included: (1) performance characteristics w.r.t. a variable number of nodes in a cluster, as well as (2) functionality and user experience.

The specific structure of the graphs produced by the company and its specific queries did not match what was offered by the existing GDB benchmarks. These facts motivated the development of the GooDBye benchmark, presented in this paper. The design of GooDBye was inspired by [19], and it complements the existing graph database benchmarks by contributing real business use-cases.

The paper is structured as follows. Section 2 overviews benchmarks developed by the research and industrial communities. Section 3 presents the benchmark that we developed. Section 4 outlines our test environment. Section 5 discusses the experimental evaluation of the GDBs and its results. Finally, Section 6 summarizes the paper.

2 RELATED WORK
The performance of a database management system is typically assessed by means of benchmarks. Each domain of database application calls for its own benchmark. A benchmark is characterized by a given schema (structure of data), different workload characteristics (queries and data manipulation), and often by performance measures. Database benchmarking has over the years received substantial attention from the industry and research communities.
Nowadays, the standard industry-approved set of benchmarks for testing relational databases is offered by the Transaction Processing Council (TPC) [31]. It supports two main classes of benchmarks, namely: (1) TPC-C and TPC-E, for testing the performance of databases applied to on-line transaction processing, and (2) TPC-H, for testing the performance of databases applied to decision support systems. Special benchmarks were proposed for testing the performance of data warehouses (e.g., [8, 15, 20, 27]).

[17] overviews the existing cloud benchmarks with a focus on cloud database performance testing, and argues for adapting TPC benchmarks to a cloud architecture. [14] proposes a DBaaS benchmark with typical OLTP, DSS, and mixed workloads. [7] compares a traditional open-source RDBMS with HBase, a distributed cloud database. [25] and [4] show the performance results of relational database systems running on top of virtual machines. [30] presents a high-level overview of TPC-V, a benchmark designed for database workloads running in virtualized environments.

Benchmarking of other types of databases, like XML (e.g., [23, 29]), RDF-based (e.g., [16]), NoSQL, and graph databases, received less interest from the research and technology communities in the past. However, with the widespread adoption of Big Data technologies, testing the performance of various NoSQL data storage systems became a very important research and technological issue. In this context, [6] proposed the Yahoo! Cloud Serving Benchmark (YCSB) to compare different key-value and cloud storage systems, and [28] proposed a set of BigTable-oriented extensions known as YCSB++.

In the area of GDBs, several benchmarks have been proposed so far. [3] advocated using a large parameterized weighted, directed multigraph and irregular memory access patterns.
In [10] the authors discussed characteristics of graphs to be included in a benchmark, characteristics of queries that are important in graph analysis applications, and an evaluation workbench. In the same spirit, problems of benchmarking GDBs were discussed in [5]. The authors explained how graph databases are constructed, where and how they can be used, as well as how benchmarks should be constructed. Their most important conclusions were that: (1) in most graph databases, an increase in the size of a graph leads only to a linear increase of the execution time of highly centralized queries, (2) the same cannot be said for distributed queries, and (3) an important factor controlling the throughput of highly distributed queries is the size of the memory cache, and whether the entire graph structure can fit in memory.

[1] described the so-called SynthBenchmark, which is included in the Spark GraphX library; it also offers a small graph generator. [2, 13] outlined a Java-based benchmark for testing social networks, whose data were stored in MySQL. The benchmark allowed generating a graph of 1 billion nodes with statistical properties similar to those of Facebook.

[12, 22] proposed the Social Network Benchmark, focusing on graph generation and 3 different workloads, i.e., interactive, Business Intelligence, and graph algorithms.

[19] suggested and implemented a benchmark for a GDBMS working in a distributed environment. The authors attempted to create a holistic benchmark and, using the Tinkerpop stack, ran it on a series of the most popular graph databases at that time, including Neo4j, OrientDB, TitanDB, and DEX. [11] evaluated the performance of four GDBs, i.e., Neo4j, Jena, HypergraphDB, and DEX with respect to a graph size, using typical graph operations.
[24] focused on benchmarking a wide range of graph databases and graph processing tools, i.e., Neo4j, OrientDB, InfoGrid, TitanDB, FlockDB, ArangoDB, InfiniteGraph, AllegroGraph, DEX, GraphBase, HyperGraphDB, Bagel, Hama, Giraph, PEGASUS, Faunus, NetworkX, Gephi, MTGL, Boost, uRiKA, and STINGER. This work tests the performance of the majority of the GDBs, but only in a centralized environment.

[18, 21] described a benchmark developed in co-operation between 4 IT corporations and 4 universities. The benchmark consists of six algorithms: Breadth-First Search, PageRank, Weakly Connected Components, Community Detection using Label Propagation, Local Clustering Coefficient, and Single-Source Shortest Paths. The data part includes real and synthetic datasets.

3 OUR APPROACH: GOODBYE - A GOOD GRAPH DATABASE BENCHMARK
The GooDBye benchmark includes: (1) a parameterized graph data generator, (2) a graph database, and (3) queries that are to be run on it. In order to use the benchmark, a user needs to:
(1) run the data generator,
(2) decide which GDBMS is to be tested, and install it on a cluster,
(3) transform the data generated by the benchmark into a form readable by the selected GDBMS,
(4) load the data into the GDB, using its proprietary tool,
(5) turn off the database's caching mechanisms, as the same subset of queries will need to be repeated multiple times,
(6) run queries on the GDB.

3.1 Graph data
A graph used in the benchmark is directed and cyclic, with a maximum cycle length of 2. The graph reflects typical software and hardware configurations in a large company. A node represents one of the three following data entities:
• a package - it is composed of objects; a package can be transformed into another package; all packages have the same structure (fields);
• an object - it is composed of fields; an object can be transformed into another object, similarly to a package;
• a field - a field can be transformed into another field, similarly to an object; all fields have the same simple elementary datatype.

An arc represents:
• a data transformation - packages can be transformed into other packages, objects into other objects, and fields into other fields; each transformation (identified by its ID) is represented at all three levels of data entities;
• a data composition - each package contains one or more objects, and each object contains one or more fields.

The data generator is parameterized and can produce graphs described by different statistics. For the benchmark application presented in this paper, the graph had the following statistics (a sampling sketch follows the list):
• the number of vertices: 911034, which represented 500 packages;
• the number of arcs: 3229158;
• the average number of objects in a package: 100 (binomial distribution, n=8000, p=0.0125);
• the number of object categories (types): 2; 30% of objects belong to category A and 70% to category B;
• the average number of fields of objects in category A: 30 (binomial distribution, n=1500, p=0.02);
• the average number of fields of objects in category B: 8 (binomial distribution, n=400, p=0.02);
• the average number of incoming field transformation arcs: 2.5 (binomial distribution, n=80, p=0.03125);
• 4% of arcs form single-arc cycles;
• 2% of arcs form two-arc cycles.
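To make the generator's parameterization concrete, the following minimal Python sketch samples node counts with the statistics listed above. It assumes numpy; the function name, node encoding, and seed are illustrative only and do not reflect the benchmark's actual implementation.

import numpy as np

rng = np.random.default_rng(seed=42)

def generate_nodes(n_packages=500):
    """Sample package/object/field nodes with the Section 3.1 statistics."""
    nodes = []
    for pkg in range(n_packages):
        nodes.append(("package", pkg))
        # objects per package: Binomial(n=8000, p=0.0125), mean 100
        n_objects = rng.binomial(8000, 0.0125)
        for obj in range(n_objects):
            # 30% of objects fall into category A, 70% into category B
            category = "A" if rng.random() < 0.3 else "B"
            nodes.append(("object", (pkg, obj), category))
            # fields per object: Binomial(1500, 0.02) for A (mean 30),
            # Binomial(400, 0.02) for B (mean 8)
            if category == "A":
                n_fields = rng.binomial(1500, 0.02)
            else:
                n_fields = rng.binomial(400, 0.02)
            for f in range(n_fields):
                nodes.append(("field", (pkg, obj, f)))
    return nodes

Arc generation (transformation and composition arcs, with the prescribed cycle percentages) would follow the same pattern of parameterized sampling.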
3.2 Queries
Eight queries were defined and implemented in the benchmark. Queries Q1-Q5 (described below) were demanded by the company: Q1-Q3 check how long it takes a GDB to find neighbor vertices and to navigate via incoming and outgoing arcs, whereas Q4 and Q5 check how fast a GDB finds nodes having given in- and out-going arcs, in order to calculate the impact of changes on transformations. Q6-Q8 are typical queries that are defined in other benchmarks. A toy illustration of the query semantics follows the list.
• Q1 - finds and returns all vertices that transform to a node of type A, i.e., nodes that have an outgoing arc of type Transformation that is an incoming arc to A.
• Q2 - finds and returns all nodes that have an incoming arc of type Transformation whose source is A.
• Q3 - counts all the vertices that are connected to a node by a Transformation arc leading to the node of type A, and computes the percentage of these nodes over the number of all vertices in the graph.
• Q4 - counts all direct neighbors of nodes connected by a given transformation type and returns the percentage of the entire graph they comprise.
• Q5 - counts all direct nodes connected by a given transformation type, including nodes adjacent to nodes of type A.
• Q6 - counts the number of incoming and outgoing arcs of every single node in the graph, and returns a total count for each of them. This models a degree calculation for the entire graph.
• Q7 - returns all nodes in the database whose attribute is equal to a given number.
• Q8 - computes the shortest path between two nodes.
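To make the query semantics concrete, the following toy Python sketch evaluates Q1 and Q6 over a hand-made in-memory edge list. The graph, node names, and arc types below are hypothetical; in the experiments, each query was expressed in the query language of the respective GDBMS.

from collections import Counter

# (source, target, arc_type)
arcs = [
    ("n1", "a1", "Transformation"),
    ("n2", "a1", "Transformation"),
    ("a1", "n3", "Transformation"),
    ("p1", "n1", "Composition"),
]
type_a_nodes = {"a1"}

# Q1: all vertices with an outgoing Transformation arc into a type-A node
q1 = {src for (src, dst, t) in arcs
      if t == "Transformation" and dst in type_a_nodes}

# Q6: total (in + out) degree of every node in the graph
q6 = Counter()
for src, dst, _ in arcs:
    q6[src] += 1
    q6[dst] += 1

print(q1)         # {'n1', 'n2'}
print(dict(q6))   # per-node degree counts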
4 TEST ENVIRONMENT
4.1 GDBMSs under test
The company imposed the following requirements on a GDBMS:
• to be available under either an open-source or an academic licence,
• to be used in practice by industry (listed on the DB-Engines website [9]),
• to support at least 3 of the ACID properties,
• to be capable of running in a cluster of workstations.

Based on the aforementioned criteria, out of the 29 available GDBMSs, the following were selected for evaluation: ArangoDB, TitanDB, JanusGraph, OrientDB, and Spark GraphX. One system we seriously considered was Neo4j: according to the DB-Engines ranking, at the time of writing this paper it was the most popular GDBMS. Unfortunately, we were unable to obtain an academic license of its full 'enterprise' edition, which provides, among others, distributed storage and processing. We therefore decided not to test it, rather than unfairly assess its toy version.

4.2 Benchmark setup
The GDBMSs were installed in a micro-cluster composed of 9 physical machines. Each node ran Ubuntu and had the following parameters: (1) 8GB RAM, (2) 457GB HDD, (3) Intel Core2 Quad CPU Q9650 3.00GHz, (4) graphics card: GT215. The machines were fully physically interconnected, enabling direct communication whenever required. The logical connections depended on the database system used.

Depending on the experiment, 1, 3, 5, or 9 nodes were used at any given time. Such cluster sizes correspond to 1, 2, 4, and 8 worker nodes, plus 1 access/coordinator node. Data were partitioned as equally as possible between the nodes, using the data distribution mechanisms provided by each GDBMS.

5 PERFORMANCE EVALUATION OF SELECTED GRAPH DATABASES
The goal of the experiments was to evaluate the response time of the 8 queries outlined in Section 3.2 for the 5 GDBMSs under test. Each query was run twelve times on the same dataset, on 1, 3, 5, and 9 nodes. The highest and lowest measurements were discarded, and the average value and standard error were calculated for the remaining measurements. Due to the huge differences in performance between the tested GDBMSs, a logarithmic scale is used in the charts. A sketch of this measurement protocol is given below.
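The following minimal sketch illustrates the measurement protocol described above. It assumes a callable run_query() (hypothetical) that issues a single query to the system under test; each GDBMS needs its own client code.

import time
import statistics

def measure(run_query, repetitions=12):
    """Run a query repeatedly, drop min/max, return mean and std. error in ms."""
    times_ms = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    trimmed = sorted(times_ms)[1:-1]  # discard highest and lowest measurements
    mean = statistics.mean(trimmed)
    stderr = statistics.stdev(trimmed) / len(trimmed) ** 0.5
    return mean, stderr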
5.1 Results
The response time (elapsed time) for Q1-Q8 was measured in milliseconds. Below we present the results and discuss the obtained performance characteristics.

Q1 - transformation sources for a given node. The response times for Q1 are shown in Figure 1. As we can observe, ArangoDB clearly outperforms all the other GDBMSs. GraphX is the only system for which the query response time decreases with the increasing number of nodes. The performance of TitanDB and JanusGraph degrades with the increasing number of nodes: queries run on the 9-machine cluster take about twenty times longer than when running on a single node.

Figure 1: Execution time of Q1 (chart: response time [ms], log scale, vs. cluster size 1n/3n/5n/9n)

Q2 - listing transformation destinations for a given node. The response times of this query are shown in Figure 2. In this test, ArangoDB once again outperforms all the other GDBMSs, although the degradation of its performance when increasing the size of the cluster is more noticeable. GraphX execution times decrease by a factor of three when the cluster size increases to 9, resulting in better execution times than TitanDB or JanusGraph, but still worse than OrientDB.

Figure 2: Execution time of Q2 (chart: response time [ms], log scale, vs. cluster size)

Q3 - measuring the impact of changes in a node. The response times of this query are shown in Figure 3. From the chart we can notice that ArangoDB has a clear lead over its competitors yet again, with its executions taking thousands of times less than those of the other GDBMSs. GraphX slowly approaches ArangoDB execution times as the cluster size increases. OrientDB achieves better results than TitanDB and JanusGraph on clusters of greater size.

Figure 3: Execution time of Q3 (chart: response time [ms], log scale, vs. cluster size)

Q4 - measuring the impact of changes in a transformation. As Q4 is very similar to Q3, the execution times shown in Figure 4 have characteristics similar to those shown in Figure 3, i.e., ArangoDB achieves the best results, with GraphX slowly decreasing ArangoDB's lead as the cluster grows.

Figure 4: Execution time of Q4 (chart: response time [ms], log scale, vs. cluster size)

Q5 - measuring the impact of changes in topology. The execution times from this experiment are shown in Figure 5. Once again, ArangoDB is in the lead. GraphX execution times decrease as the cluster size increases. OrientDB performs worse than TitanDB and JanusGraph for a cluster size up to 3, and performs better when the cluster grows.

Figure 5: Execution time of Q5 (chart: response time [ms], log scale, vs. cluster size)

Q6 - computing the degree of each node. The execution times from this experiment are shown in Figure 6. For this query, GraphX offers the best performance, regardless of the cluster size. In 3-, 5-, and 9-node clusters ArangoDB performs the worst. The performance of OrientDB remains unchanged regardless of the cluster size.

Figure 6: Execution time of Q6 (chart: response time [ms], log scale, vs. cluster size)

Q7 - filtering query. The results of this evaluation are shown in Figure 7. On a single node, the average execution times of the same query on ArangoDB and GraphX differ only by forty-five milliseconds, but as the cluster size increases, GraphX gains a noticeable lead over all the other GDBMSs.

Figure 7: Execution time of Q7 (chart: response time [ms], log scale, vs. cluster size)

Q8 - finding the shortest path between nodes. This experiment was run on ArangoDB, OrientDB, TitanDB, and JanusGraph. GraphX was excluded because of its implementation of the shortest path algorithm: rather than simply finding the shortest path between two nodes, it finds the shortest paths from all nodes to the target one, only then allowing users to select specific paths from the generated RDD. This heavily influences the execution times of such queries. The first stage (computation of the shortest paths) takes minutes rather than milliseconds, while the second (retrieval of specific paths) takes a few milliseconds, making the results incomparable with those of the other GDBMSs. Figure 8 reveals that ArangoDB handles this query in the least amount of time. On a single node, OrientDB performs worse than TitanDB and JanusGraph, but performs better on 3, 5, and 9 nodes.

Figure 8: Execution time of Q8 (chart: response time [ms], log scale, vs. cluster size)

In Figure 9 we present the total execution times of a workload composed of queries Q1-Q7, for ArangoDB, OrientDB, TitanDB, JanusGraph, and GraphX in a cluster composed of 1, 3, 5, and 9 machines. As we can observe, ArangoDB, TitanDB, and JanusGraph do not scale out, as the total execution time grows with the increasing number of machines. On the contrary, OrientDB and GraphX offer a rather constant execution time w.r.t. the number of machines.

Figure 9: Total execution time of a workload composed of queries Q1 - Q7 (chart: response time [sec] vs. cluster size)

5.2 Significance tests
From the presented charts we can observe that, on average, ArangoDB and GraphX offer the best performance. ArangoDB offers the best performance for all queries but Q6. GraphX achieves varied results: it is a clear winner for Q6 but performs worse than ArangoDB for all other queries. Thus, we need to check whether the following observations are statistically significant:
• ArangoDB achieves better results than GraphX for Q1-Q5 and Q7-Q8,
• ArangoDB achieves worse performance than GraphX for Q6,
• GraphX achieves better performance than OrientDB for Q6, since OrientDB is more efficient than ArangoDB in executing Q6.

To this end, we applied Student's t-tests with p=0.01. The results of the significance tests are included in Table 1. The p-values for the significance of the results between GraphX and ArangoDB are represented by the rows for queries Q1-Q5 and Q7, whereas the p-values for the significance between GraphX and OrientDB are represented by the row for Q6. Each row includes p-values for the experiments on 1, 3, 5, and 9 nodes.

Table 1: p-values for testing the statistical significance of execution times between (1) GraphX and ArangoDB (Q1-Q5 and Q7) as well as between (2) GraphX and OrientDB (Q6)

Query   1 node         3 nodes        5 nodes        9 nodes
Q1      0.0000000000   0.0000000001   0.0000000003   0.0000000048
Q2      0.0000000054   0.0000000014   0.0000000000   0.0000000000
Q3      0.0000000423   0.0000000006   0.0000000000   0.0000000000
Q4      0.0000000740   0.0000000011   0.0000000000   0.0000000000
Q5      0.0000003403   0.0000000108   0.0000000000   0.0000000000
Q6      0.0000000000   0.0000000000   0.0000000000   0.0000000000
Q7      0.1674163506   0.0000000000   0.0000000000   0.0000000000

As we can observe, the p-values are much lower than the assumed significance level of 0.01 in all cases except Q7 on 1 node. This means that the differences in execution times between ArangoDB, OrientDB, and GraphX are statistically significant, and thus our conclusions are valid, for all queries except Q7 on 1 node.
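As an illustration of this test, the following sketch computes a two-sample Student's t-test p-value with scipy. The two response-time arrays stand for the ten retained per-run measurements of one query on two systems; the values below are made-up placeholders, not measured results.

from scipy import stats

arangodb_ms = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2, 12.3, 12.0, 11.7, 12.1]
graphx_ms   = [95.0, 97.2, 94.8, 96.1, 95.5, 96.8, 95.9, 96.3, 94.9, 95.7]

# Two-sample Student's t-test; reject H0 (equal means) when p < 0.01
t_stat, p_value = stats.ttest_ind(arangodb_ms, graphx_ms, equal_var=True)
print(f"t = {t_stat:.2f}, p = {p_value:.3e}, significant: {p_value < 0.01}")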
5.3 Functionality assessment
In this section we present our assessment of selected features of the GDBMSs related to user experience, grading each of them on a scale from 1 to 5 (1 being the lowest and 5 the highest). The following features were assessed: (1) ease of installing and setting up the GDBMS, (2) ease of using the GDBMS (how complicated its query language is, whether it provides access to graph data from other languages), (3) support for multiple operating systems, and (4) visualization capabilities. The assessment results are shown in Table 2.

Table 2: Assessing the functionality of GDBMSs

                ArangoDB   OrientDB   TitanDB   JanusGraph   GraphX
Ease of setup   5          3          4         4            3
Ease of use     5          3          5         5            2
Portability     5          5          5         5            5
Interface       4          4          2         2            1
Total           19         15         16        16           11

As can be seen, ArangoDB wins in this regard as well. Its installation is straightforward: setting up a cluster requires nothing but running a few simple scripts. Its query language is robust and intuitive, with a focus on sub-querying. It runs on the most common operating systems, and its visual interface is decent.

TitanDB and JanusGraph are next in our ranking. Their installation and cluster setup require a bit of fiddling, although not much skill. Their query languages are easy to learn and use. Neither of these GDBMSs has any problems running on any of the popular operating systems, but they require quite a lot of work to set up any kind of visual interface.

OrientDB scores third. Its installation is not difficult, although running a few instances in a cluster is problematic. Its language lacks a few built-ins. It supports numerous operating systems. The visual representations of graphs it generates are decent and legible.

GraphX scores last. Ease of use was never the focus of Spark-based tools. Installation and cluster setup are rather easy, but connecting it to a resilient data storage is more difficult. Tutorials for GraphX are almost non-existent, and the documentation occasionally leaves a bit to be desired. Since it is Java-based, it has no problems running virtually anywhere. A graphical interface (other than the Spark management tool) is nonexistent.

6 SUMMARY AND CONCLUSIONS
In this paper we presented a graph database benchmark developed to meet the specific requirements of an international IT company. Even though over 10 graph benchmarks have been proposed in the research literature, none of them reflects the particular structure of the graph or the particular queries needed by the IT company. Therefore, the benchmark that we developed can be considered complementary to those mentioned in Section 2: it contributes another graph structure used by industry and five queries used by industry.

The benchmark was implemented and used in practice to assess the performance of 5 open-source GDBMSs in a micro-cluster composed of a variable number of physical nodes (up to 9 nodes were used). The experiments that we ran showed that:
• distributing graph data over multiple nodes does not provide scaling out; we observed that query execution times either increased when the size of the cluster increased (the case of ArangoDB, TitanDB, and JanusGraph) or remained approximately constant (the case of OrientDB and GraphX);
• even simple queries can take much longer to execute in a cluster when a GDB needs to cross-check every node for arcs leading to another shard;
• ArangoDB offers the best performance in the majority of tests; it also offers the best functionality from a user perspective;
• GraphX offers the best performance when it comes to massive localized data processing (cf. Figure 6), i.e., it is a good match for algorithms, such as PageRank, that rely heavily on node degrees.

The performance evaluation can be further extended to test the scalability of the GDBMSs w.r.t. graph size, and on clusters of more than 9 nodes. To this end, the proposed GooDBye benchmark needs to be extended as well, to generate graphs of parameterized size and with multiple statistical properties.

REFERENCES
[1] Apache. [n.d.]. SynthBenchmark. https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/graphx/SynthBenchmark.scala.
[2] Timothy G. Armstrong, Vamsi Ponnekanti, Dhruba Borthakur, and Mark Callaghan. 2013. LinkBench: A Database Benchmark Based on the Facebook Social Graph. In SIGMOD Int. Conf. on Management of Data.
[3] D. Bader, J. Feo, J. Gilbert, J. Kepner, D. Koetser, E. Loh, K. Madduri, B. Mann, T. Meuse, and E. Robinson. 2009. HPC Scalable Graph Analysis Benchmark. HPC Graph Analysis, http://www.graphanalysis.org/benchmark/.
[4] Sharada Bose, Priti Mishra, Priya Sethuraman, and H. Reza Taheri. 2009. Benchmarking Database Performance in a Virtual Environment. In TPC Technology Conference on Performance Evaluation, Measurement and Characterization of Complex Systems (TPCTC). 167-182.
[5] M. Ciglan, A. Averbuch, and L. Hluchy. 2012. Benchmarking Traversal Operations over Graph Databases. In Int. Conf. on Data Engineering Workshops.
[6] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. In ACM Symposium on Cloud Computing. 143-154.
[7] Jean-Daniel Cryans, Alain April, and Alain Abran. 2008. Criteria to Compare Cloud Computing with Current Database Technology. In Int. Conf. on Software Process and Product Measurement. 114-126.
[8] Jerome Darmont, Fadila Bentayeb, and Omar Boussaid. 2007. Benchmarking Data Warehouses. Int. Journal of Business Intelligence and Data Mining 2, 1 (2007).
[9] DB-ENGINES. [n.d.]. DB-Engines Ranking of Graph DBMS. https://db-engines.com/en/ranking/graph+dbms.
[10] David Dominguez-Sal, Norbert Martinez-Bazan, Victor Muntes-Mulero, Pere Baleta, and Josep Lluis Larriba-Pay. 2011. A Discussion on the Design of Graph Database Benchmarks. In TPC Technology Conference on Performance Evaluation, Measurement and Characterization of Complex Systems (TPCTC).
[11] D. Dominguez-Sal, P. Urbón-Bayes, A. Giménez-Vañó, S. Gómez-Villamor, N. Martínez-Bazán, and J. L. Larriba-Pey. 2010. Survey of Graph Database Performance on the HPC Scalable Graph Analysis Benchmark. In Int. Conf. on Web-age Information Management (WAIM).
[12] Orri Erling, Alex Averbuch, Josep Larriba-Pey, Hassan Chafi, Andrey Gubichev, Arnau Prat, Minh-Duc Pham, and Peter Boncz. 2015. The LDBC Social Network Benchmark: Interactive Workload. In SIGMOD Int. Conf. on Management of Data.
[13] Facebook. [n.d.]. LinkBench. GitHub, https://github.com/facebookarchive/linkbench.
[14] Avrilia Floratou, Jignesh M. Patel, Willis Lang, and Alan Halverson. 2011. When Free Is Not Really Free: What Does It Cost to Run a Database Workload in the Cloud?. In TPC Technology Conference on Performance Evaluation, Measurement and Characterization of Complex Systems (TPCTC). 163-179.
[15] Florian Funke, Alfons Kemper, Stefan Krompass, Harumi Kuno, Raghunath Nambiar, Thomas Neumann, Anisoara Nica, Meikel Poess, and Michael Seibold. 2012. Metrics for Measuring the Performance of the Mixed Workload CH-benCHmark. In TPC Technology Conference on Performance Evaluation, Measurement and Characterization of Complex Systems (TPCTC).
[16] Yuanbo Guo, Zhengxiang Pan, and Jeff Heflin. 2005. LUBM: A Benchmark for OWL Knowledge Base Systems. Web Semantics 3, 2-3 (2005).
[17] Karl Huppler. 2011. Benchmarking with Your Head in the Cloud. In TPC Technology Conference on Performance Evaluation, Measurement and Characterization of Complex Systems (TPCTC). 97-110.
[18] Alexandru Iosup, Tim Hegeman, Wing Lung Ngai, Stijn Heldens, Arnau Prat-Pérez, Thomas Manhardt, Hassan Chafi, Mihai Capotă, Narayanan Sundaram, Michael Anderson, Ilie Gabriel Tănase, Yinglong Xia, Lifeng Nai, and Peter Boncz. 2016. LDBC Graphalytics: A Benchmark for Large-scale Graph Analysis on Parallel and Distributed Platforms. Proc. VLDB Endowment 9, 13 (2016).
[19] S. Jouili and V. Vansteenberghe. 2013. An Empirical Comparison of Graph Databases. In Int. Conf. on Social Computing.
[20] Martin L. Kersten, Alfons Kemper, Volker Markl, Anisoara Nica, Meikel Poess, and Kai-Uwe Sattler. 2011. Tractor Pulling on Data Warehouses. In Int. Workshop on Testing Database Systems.
[21] LDBCouncil. [n.d.]. LDBC Graphalytics. GitHub, https://github.com/ldbc/ldbc_graphalytics.
[22] LDBCouncil. [n.d.]. Social Network Benchmark. http://ldbcouncil.org/developer/snb.
[23] Hadj Mahboubi and Jérôme Darmont. 2011. XWeB: the XML Warehouse Benchmark. CoRR (2011).
[24] Robert McColl, David Ediger, Jason Poovey, Dan Campbell, and David A. Bader. 2014. A performance evaluation of open source graph databases. In Workshop on Parallel Programming for Analytics Applications.
[25] Umar Farooq Minhas, Jitendra Yadav, Ashraf Aboulnaga, and Kenneth Salem. 2008. Database systems on virtual machines: How much do you lose?. In Int. Conf. on Data Engineering Workshops (ICDE). 35-41.
[26] ODBMS. [n.d.]. Operational Database Management Systems - ODBMS. http://www.odbms.org/.
[27] Patrick O'Neil, Betty O'Neil, and Xuedong Chen. 2009. Star Schema Benchmark. https://www.cs.umb.edu/~poneil/StarSchemaB.PDF.
[28] Swapnil Patil, Milo Polte, Kai Ren, Wittawat Tantisiriroj, Lin Xiao, Julio López, Garth Gibson, Adam Fuchs, and Billie Rinaldi. 2011. YCSB++: benchmarking and performance debugging advanced features in scalable table stores. In ACM Symposium on Cloud Computing.
[29] Albrecht Schmidt, Florian Waas, Martin Kersten, Michael J. Carey, Ioana Manolescu, and Ralph Busse. 2002. XMark: A Benchmark for XML Data Management. In Int. Conf. on Very Large Data Bases.
[30] Priya Sethuraman and H. Reza Taheri. 2011. TPC-V: A Benchmark for Evaluating the Performance of Database Applications in Virtual Environments. In TPC Technology Conference on Performance Evaluation, Measurement and Characterization of Complex Systems (TPCTC). 121-135.
[31] TPC. [n.d.]. Transaction Processing Council Benchmarks. http://www.tpc.org/.