Choosing Between Graph Databases and RDF
Engines for Consuming and Mining Linked Data

Domingo De Abreu, Alejandro Flores, Guillermo Palma, Valeria Pestana, José
       Piñero, Jonathan Queipo, José Sánchez, Maria-Esther Vidal

                Universidad Simón Bolı́var, Caracas, Venezuela
     {dabreu, aflores, gpalma, vpestana, jpinero, jqueipo, jsanchez,
                            mvidal}@ldc.usb.ve

      Abstract. Graphs naturally represent Linked Data and implementa-
      tions of graph-based tasks are required not only for data consumption,
      but also for mining patterns among links. Despite efficient graph-based
      algorithms and engines have been implemented, there is no clear under-
      standing of how these solutions may behave on Linked Data. We evaluate
      both general purpose graph database and state-of-the-art RDF engines,
      and our experimental results reveal characteristics of linked datasets and
      graph-based tasks that may affect their performance. These results can
      be considered as a further step for solving the problem of choosing be-
      tween graph databases to consume and mine Linked Data.


1   Introduction
Graphs are commonly used to represent linked data, and several efficient algo-
rithms have been proposed not only to consume, but also to mine Linked Data.
For example, Saha et al. [16] and Thor et al. [18] have defined densest sub-
graphs and graph summarization techniques to identify patterns between linked
datasets of genes. Further, algorithms for finding subgraph isomorphisms and fre-
quent subgraphs have been extensively studied in the literature [1]. The majority
of these algorithms are computationally complex, and rely on core graph-based
tasks to solve graph reachability, traversal, adjacency and pattern matching.
    Additionally, a variety of engines have been developed to manage, store and
query graph databases. Particularly, in the context of the Semantic Web, en-
gines have been defined to store and consume RDF graphs. However, existing
RDF engines focus on individual triples, more than providing a graph oriented
representation of the data, and the implementation of core graph-based tasks
can be time-inefficient because of the number of potential self-joins required to
execute these tasks. On the other hand, several general purpose graph systems
have been defined to manage graph databases; also, APIs are usually available
to execute core graph-based tasks on these databases. However, many of the ex-
isting graph engines do not exploit the properties of RDF, and pattern matching
queries on RDF-based graphs may be inefficient. Therefore, it is important to
conduct evaluations that uncover the properties and limitations of existing graph
engines, and that provide a further step for solving the problem of choosing the
most appropriate engine to execute a given task against linked datasets.
    In this paper we empirically evaluate the performance of three general pur-
pose graph database engines: DEX [11], Neo4j [15] and HypergraphDB [8]. Ad-
ditionally, we study RDF-3x [14] a best-of-breed RDF engine [7]. During the
evaluation, we analyze the performance of these engines on tasks of adjacency
and reachability. Further, we study different implementations of the algorithms
to find shortest paths and k-hops; also, graph traversals following breadth-first
and depth-first search strategies are evaluated. Finally, we analyze the perfor-
mance of pattern matching queries, and the mining tasks of densest subgraphs
and graph summarization algorithms. A variety of synthetic graphs are stud-
ied; graphs with different density and size are generated using existing graph
generators (e.g., GTgraph1 and the Berlin SPARQL Benchmark (BSBM) [3]).
We describe the variables and configuration setups that impact on the perfor-
mance of the studied engines when the above mentioned graph tasks are executed
against the generated graphs. Finally, we discuss the results of our evaluation,
and outline the different conditions that benefit the studied engines. Results of
the experiments are published at http://graphium.ldc.usb.ve.
    To summarize, our contributions are as follows: i) a characterization of graph
properties and tasks that impact on the performance of existing graph engines;
ii) benchmarks comprised of graph-based tasks and a variety of graphs to eval-
uate existing graph engines; and iii) an empirical study of the performance of
graph engines on tasks for consuming and mining linked data.
    This paper is organized as follows: Section 2 summarizes related works, and
Section 3 describes the studied graph engines and tasks. Section 4 illustrates
parameters that impact existing graph engines, while experimental results are
reported in Section 5. Section 6 concludes and outlines our future works.


2     Related Work

Existing RDF engines effectively address the challenges of consuming RDF data.
However, independent evaluations [7, 10] suggest that thanks to physical struc-
tures and its execution techniques, RDF-3x is able to overcome existing engines.
Based on these results, we selected RDF-3x as an exemplar of RDF engines.
    Several engines address the problem of managing large persistent graphs (e.g.,
DEX [11], Neo4j [15], and HypergraphDB [8]). Commonly, these engines provide
APIs and rely on physical structures to speed up the execution of graph-based
tasks. We evaluate their main functionalities and propose a set of tasks whose
evaluation uncovers useful insights that suggest some of their limitations.
    Benchmarking has contributed to the improvement of systems and in general,
of existing areas by the means of defining fair and neutral benchmarks. Partic-
ularly, in the context of Databases, the family of the Transactional Processing
Performance Council (TPC) benchmarks[12] were developed for database-centric
standards and for the evaluation of system performance under clearly defined
dataset conditions and workloads. Recently, the TPC family has been extended
1
    http://www.cse.psu.edu/~madduri/software/GTgraph/index.html
to other domains, e.g., virtualization, multi-source data integration, and graph
analysis in multiprocessors. Although some of the new released TPC bench-
marks attempt to provide graph generation standards [5], the graph-based tasks
are mainly focused on network centrality while core graph-based tasks cannot be
evaluated, e.g., pattern marching, adjacency. Other initiatives have impulsed the
study of performance of graph database engines in current cloud service infras-
tructures and mostly networking tasks, e.g., [2] and [4]; and recently, the LDBC2
council was created to work on benchmarks for Linked Data. Finally, benchmarks
have been defined to test existing RDF engines: LUBM [6], the Berlin SPARQL
Benchmark [3], the RDF Store Benchmarks with DBpedia3 , and the SP2 Bench
SPARQL benchmark [17]. Similarly, we tailored a family of graph characteristics
and tasks that allow us to reveal properties of existing RDF/Graph engines.


3     Existing Graph Databases and Graph-Based Tasks

3.1   Graph Database and RDF Engines

DEX [11] is a database engine that relies on labeled attribute multigraphs to
model linked data. To speed up the different graph-based tasks, DEX offers dif-
ferent types of indexing: i) attributes. ii) unique attributes, iii) edges to index
their neighborhood, and iv) indices on neighborhoods. A DEX graph is stored
in a single file; values and resource identifiers are mapped by mapping func-
tions; and maps are modeled as B+-tree. Bitmaps are used to store nodes and
edges of a certain type. DEX provides an API to create and manage graph
datasets. Different functions are offered to efficiently traverse and select graphs:
i) sub-graphs whose nodes or edges are associated with labels that satisfy certain
conditions, ii) explode-based methods that visit the edges of a given node, and
iii) neighbor-based methods that visit the neighbors of a given node.
     Similar to DEX, Neo4j [15] represents graphs as labeled attribute multigraphs
implemented in native structures. Neo4j can manage several types of indices
on nodes and relationships, and graphs can be traversed by following common
policies of breadth- and depth-first. Neo4j provides an API, but Cypher can be
also used to specify pattern matching queries.
     HypergraphDB [8] models graphs as hypergraphs, a graph generalization
where several edges correspond to a hyperedge. Hyperedges connect ordered
sets of nodes, and can point to other hyperedges. Data is stored in the form of
key-value pairs on top of BerkeleyDB, and is indexed using B-trees. Different
access methods and indices are provided to speed up core graph-based tasks,
e.g., indices on edges or neighborhoods.
     Finally, RDF-3x [14] relies on a native index system to efficiently store RDF
data and reduce the overhead of I/O operations. RDF data graphs are imple-
mented as a Triple table where each labelled edge of an RDF graph is represented
2
  Linked Data Benchmark Council is sponsored by the European Community under
  ICT-FP7 http://www.ldbc.eu
3
  http://www4.wiwiss.fu-berlin.de/benchmarks-200801/
as a tuple of the table. A mapping dictionary is used to encode literals by short
number IDs, and compressed clustered B+-trees are used to index data in lexico-
graphic order. Six permutations of subject, predicate and object are considered
to create an index on each permutation. Additionally, RDF-3x implements opti-
mization and execution techniques that support efficient and scalable execution
of SPARQL queries. Further, it exploits operating system features to manage
cache, and it is able to load portions of intermediate results in resident memory.
Thus, RDF-3x speeds up execution time of queries where previous intermediate
results can be reused, and it is considered a best-of-breed RDF engine [7].


3.2   Graph-Based Tasks

We describe a set of graph-based tasks and the main properties of our imple-
mentations. Source code is available at https://github.com/gpalma/gdbb.
Graph Creation: consists in measuring the execution time consumed by the
engines during the creation and storage of the internal representation of a graph.
Adjacency: checks node adjacency, given that two nodes are adjacent if there
is an edge between them. The adjacency tasks are as follows: i) adjacentXP:
given a node x and an edge label p, find the set of adjacent nodes y. ii) adja-
centX: given a node x, find the set of adjacent nodes y. iii) edgeBetween: given
two nodes x and y, find the set of labels of the edges between x and y. These
tests are implemented using the nodes/edges adjacency facilities provided by
the engines’ APIs. In the case of RDF-3x, these tasks are expressed as SPARQL
queries. During the evaluation nodes are randomly selected.
Reachability: we study two traversal algorithms: Breadth-first search (BFS)
and Depth-first search (DFS), and implement them using basic adjacency meth-
ods of the engines’ APIs. We refer to these implementations as external BFS
and external DFS. Additionally, we use reachability methods provided by the
engines; we name these implementations internal BFS and internal DFS. In gen-
eral, given two nodes x and y, the problem is to decide if there is a path between
them in the graph. Also, we implement a reachability task named k-hops, which
given a starting node x, retrieves the set of nodes S such that there is a path of
length k from x to y, i.e., paths following out-going links. We implement three
versions of k-hops. The first is internal k-hops, and is built using the engines’
methods to implement k-hops; this implementation relies on graph internal rep-
resentations and indices to speed up the k-hops computation. The second is
external k-hops, implemented with a BFS algorithm with bounded depth, using
adjacency methods of the APIs. Finally, k-hops is expressed as SPARQL queries
in RDF-3x. During the evaluation, nodes are randomly chosen.
Pattern matching: finds all subgraphs that are isomorphic to a pattern graph.
The following tasks are implemented in RDF-3x using SPARQL queries: i) ad-
jacentXP, ii) adjacentX, iii) edgeBetween, iv) 2-hops, v) 3-hops, and vi) 4-hops.

Additionally, we implement two main mining algorithms: densest subgraphs and
graph summarization. Conceptually, these tasks are described as follows:
Densest Subgraph: Let G = (V, E) be a directed graph, given subsets S and
                                                                    |E(S,T )|
T of vertices, the density of the subgraph is defined as d(S, T ) = √         where
                                                                       (|S||T |)
E(S,T) is the set of edges going from S to T . The Densest Subgraph problem is to
find a subgraph of maximum density. We implement, for each graph database,
the linear time 2-approximation algorithm for the densest subgraph problem
in directed graphs proposed by Saha et al.[16]. This implementation relies on
nodes/edges adjacency functions provided by the APIs of methods.
Graph Summarization: Let G be a graph, the goal is to produce a graph
summary representation of G. A graph summary is an aggregate graph in which
each node corresponds to a set of nodes in G, and each edge represents the edges
between all pairs of nodes in the two sets. We implement a greedy algorithm
proposed by Navlakha et al.[13] to build highly compressed graph representa-
tions. This implementation is built on top of the nodes/edges adjacency functions
provided by the APIs of the studied graph databases.


4     Analysis of the Benchmark Characteristics

We analyze the main parameters that affect the performance of a graph database
engine. Parameters can be independent and dependent. Independent parameters
represent characteristics that impact on the behavior of an engine; they need
to be specified to ensure reproducibility of an evaluation, as well as, fair and
neutral tests. Independent variables have been grouped into three dimensions:
graph characteristics and representations, and task implementations. Dependent
parameters correspond to the characteristics that are normally impacted by inde-
pendent parameters. For instance, i) Task Execution Time: elapsed time between
the submission of the task and the generation of the complete answer. ii) Main
Memory: amount of main-memory required to execute a task. iii) Secondary
Memory: amount of secondary-memory required to store the internal represen-
tation of a graph. Table 1 summarizes independent and dependent parameters.
    First, we can observe that size, density, in- and out-degree of nodes and
distribution of the labels, all impact in the time required to create a graph, and in
the amount of both main- and secondary-memory consumed during the creation
and storage of the graph. Additionally, these properties affect execution time of
any graph traversal and reachability task, as well as summarization and densest
subgraph. To illustrate how graph characteristics can affect the performance
of a graph database engine, lets consider graphs in Table 2. DSJC.1, DSJC.5 and
DSJC.9 are randomly generated graphs proposed by Johnson et al. [9] as instances
to solve the graph coloring problem4 ; the number of nodes remains the same
along the three graphs. Additionally, the family of the graphs Fixed-number-arcs-
0.X were generated using GTgraph.5 ; the number of arcs is fixed while density
and number of nodes vary. Table 3 reports on the execution time (msecs.), main-
memory consumption (MB) and secondary-memory (MB) required to store the
4
    https://sites.google.com/site/graphcoloring/vertex-coloring
5
    http://www.cse.psu.edu/~madduri/software/GTgraph/index.html
     Table 1. Variables that impact graph engines. Graph Characteristics (GC);
     Graph Representations (GR); and Task Implementation (TI)

                                                 Dependent Parameters
      Independent Parameters
                                     Task Execution Time Main Memory Secondary Memory
          Graph Density                      X               X              X
          # Nodes                            X               X              X
     GC


          # Edges                            X               X              X
          Edge Label Distribution            X               X              X
          # Node In- Out-Degree              X               X              X
          Compressed Structures              X               X              X
     GR


          Indices                            X               X              X
          Transactional management           X               X
          Engine API Methods                 X               X              X
     TI


          Types of Traversal                 X               X
          External Implementations           X               X


corresponding Neo4j graphs. We can observe that the behavior of Neo4j can be
completely different for graphs where density varies. On one hand, both loading
time and memory consumption grow as density for graphs DSJC.1, DSJC.5 and
DSJC.9 increases. However, for the rest of the graphs, these values remain almost
the same. This suggests that not only the density of a graph impacts on the cost
of creating the Neo4j graph representation. The combination of density, number
of nodes, edges and labels also seems to affect the behavior of this engine, and
different configurations need to be set up during its evaluation.

                                                                  #Arcs
    Table 2. Benchmark of Graphs. Density is defined as #V ertices∗(#V ertices−1)


           Graph          # Vertices # Arcs # Labels Density File size
    DSJC.1                      1,000     99,258      1    0.10  1.1M
    DSJC.5                      1,000    499,652      1    0.50  5.2M
    DSJC.9                      1,000    898,898      1    0.90  9.3M
    Fixed-number-arcs-0.1      10,000 10,000,000 10,000    0.10  140M
    Fixed-number-arcs-0.5       4,472 10,000,000 10,000    0.50  138M
    Fixed-number-arcs-0.9        3333 10,000,000 10,000    0.90  136M
    USA-road-d.NY             264,346    730,100  7,970 1.04E-5   13M
    USA-road-d.FLA          1,070,376 2,687,902  22,704 2.35E-6   48M
    Berlin10M               2,743,235 9,709,119      40 2.58E-6   2.1G


   Transactions need to be carefully configured during the creation of the corre-
sponding graph representations. To explain, graph database engines provide the
possibility to configure the size of a transaction, i.e., when checkpoints will be
executed. Suppose the RDF dataset of Berlin 10M (Table 2) was loaded in Neo4j,
and three different transactional models were configured: i) one transaction per
the whole graph creation process, ii) one transaction per 100,000 edges inserted
in the graph, and iii) no transactions. The experiment was run on a Sun Fire
X4100 M2 machine with two AMD Opteron 2218 processors with 16GB RAM,
running a 64-bit Linux CentOS 5.5. When the transactional model ‘i’ was fol-
lowed, the loading process ran out of memory after 10 hours, while the second
model allowed to load the whole graph in 7 hours. Finally, when no transactions
were considered, Neo4j could upload this graph in 2.1 hours and memory con-
sumption was 7.35GB. This suggests that properly setting up the transactional
model is extremely important during the evaluation of a given engine.


Table 3. Neo4j performance for graphs of different density. Dependent vari-
ables for the task of Graph Creation: Execution Time (msecs.); Main-Memory
(MB); Secondary-Memory (MB). Java methods System.currentTimeMillis() and
Runtime.getRuntime() were used to measured time and main-memory, respectively.

         Graph            Execution Time Main-Memory Secondary-Memory
                             (msecs.)       (MB)           (MB)
    DSJC.1                          12,295        17.27            3.7
    DSJC.5                          15,719        80.41           16.6
    DSJC.9                          21,460       142.60           29.5
    Fixed-number-arcs-0.1          143,610     1,713.83          651.6
    Fixed-number-arcs-0.5          131,621     1,711.09          324.3
    Fixed-number-arcs-0.9          130,901     1,709.35          323.9


   Finally, we consider the tasks of graph summarization, densest subgraph
and shortest paths for DSJC.1, DSJC.5 and DSJC.9, and the engines Neo4j, DEX
and HypergraphDB (Table 4). We can observe that tasks as shortest paths and
densest subgraph do not seem to be as affected as graph summarization by
the characteristics of these three graphs. To explain, both shortest paths and
densest subgraph require to iteratively traverse the graphs until the paths with
the shortest length are identified. Because in these three graphs the number of
nodes is the same, and even when the number of edges increases, the maximal
length of the paths does not change considerably. Thus, the time consumed by
these two tasks is similar for these three graphs. Contrary, graph summarization
requires to traverse all the edges to construct the compact representation of a
graph and the corresponding corrections; therefore, graph summarization may
be considerably affected when the number of edges increases.


5     Experimental Results

Considering the characteristics previously described, we configured several bench-
marks to study the performance of DEX (version 4.8 very large databases), Neo4j
(version 1.9), HypergraphDB (version 1.2) and RDF-3x (version 0.3.7). We eval-
uate graph creation time and memory consumption for a variety of graphs as
well as execution time for diverse graph tasks. Indices on unique attributes are
    Table 4. Graphs of different density for Graph Summarization Densest Subgraph
    and Shortest Paths. Timeout 3.5 hours.

            Graphs Graph Summarization (msecs) Densest Subgraph (msecs.) Shortest Path(msecs.)
    Neo4j

            DSJC.1             7,573                      6,511                  6,774
            DSJC.5           1,134,155                    8,901                  9,466
            DSJC.9           TimeOut                      14,976                 14,436
            DSJC.1             6,810                      4,786                  4,368
    DEX


            DSJC.5           1,031,753                    5,898                  5,261
            DSJC.9          12,576,203                    6,174                  4,865
    HGDB


            DSJC.1             10,974                     15,379                 18,226
            DSJC.5           1,047,470                    85,470                 84,017
            DSJC.9           TimeOut                     159,300                155,730


created for Neo4j, DEX and HypergraphDB; additionally, DEX uses indices on
edges. Indices are loaded during the creation phase.
Graphs: DSJC.1, DSJC.5, DSJC.9, USA-road-d.NY, USA-road-d.FLA and Berlin10M
are studied (Table 2). USA-road-d.NY and USA-road-d.FLA correspond to the New
York City and Florida State road networks; these graphs are part of the 9th
DIMACS Implementation Challenge - Shortest Paths6 .
Graph-based Tasks: we evaluate creation, adjacency, reachability, pattern match-
ing, densest subgraph, and graph summarization.
Evaluation Metrics: we report on runtime performance, which is measured
by using the real time produced by the time command of the Linux operation
system. Runtime represents the elapsed time between the submission of a task
and the output of the answer. Further, we present the size in MB of the internal
representation of a graph. Neo4j, DEX and HypergraphDB received graphs in
the sif format7 , while graphs were passed to RDF-3x as sets of N-triples. Exper-
iments were run on a Sun Fire X4100 M2 machine with two AMD Opteron 2218
processors with 16GB RAM, running a 64-bit Linux CentOS 5.5. All tests were
executed in cold cache, i.e., we cleared the cache before running each task by
performing the command sh -c "sync ; echo 3 > /proc/sys/vm/drop caches".
Additionally, the machine was dedicated exclusively to run the experiments.
Details can be found at http://graphium.ldc.usb.ve.

5.1          Performance During The Graph Creation Task
Figures 1(a) and (b) report on the creation time (secs and logarithm scale) and
secondary-memory (MB and logarithmic scale) required to store DSJC.1, DSJC.5,
DSJC.9, USA-road-d.NY, USA-road-d.FLA and Berlin10M. HypergraphDB timed out
creating Belin10M after 24 hours, and required more time for creating the other
five graphs than the rest of the engines. Although Neo4j and RDF-3x were
competitive during the creation of USA-road-d.NY and USA-road-d.FLA, in general,
we could say that RDF-3x is faster than the other engines in this task. This may
6
    http://www.dis.uniroma1.it/challenge9/download.shtml
7
    http://wiki.cytoscape.org/Cytoscape_User_Manual/Network_Formats
                            10000                                                                                    10000
                                    RDF3X                                                                                    RDF3X


                                                                                       Graph size MB (log10 scale)
                                     Neo4j                                                                                    Neo4j

 Time secs. (log10 scale)
                             1000     DEX                                                                             1000     DEX
                                     HGDB                                                                                     HGDB

                              100                                                                                      100


                               10                                                                                       10


                                1                                                                                        1
                                      DSJC.1 DSJC.5 DSJC.9     NY      FLA Berlin10m                                           DSJC.1 DSJC.5 DSJC.9     NY      FLA Berlin10m
                                             Graph instances on creation task                                                         Graph instances on creation task

                            (a) Creation Time log-scale secs.                                            (b) Secondary Memory log-scale MB.

Fig. 1. Comparison of Neo4j, DEX, HypergraphDB and RDF-3x for graph creation of
DSJC.1, DSJC.5, DSJC.9, USA-road-d.NY, USA-road-d.FLA and Berlin10M. No transac-
tions were considered. Non-shown bars indicate that the corresponding engine timed
out after 24 hours, i.e., HypergraphDB timed out creating Berlin10M after 24 hours.


be the result of exploiting in-memory caching techniques that allow RDF-3x
to maintain hash tables to store the internal representation of an RDF graph,
e.g., mappings between strings/URIs in RDF data to unsigned integer IDs, a
single Triple table of these IDs, and the highly compressed indices. Additionally,
the sif format used to represent the graphs, the size of the labels, and URIs
could impact creation time of existing graph databases. We can also observe that
the amount of secondary memory of the generated graphs grows as the size of
original graphs increases. As previously discussed, several parameters impact on
the size of the structure generated by each engine, e.g., density, number of nodes,
edges and labels. However, it is important to highlight that RDF-3x graphs are
smaller than the rest of the graphs generated by the other engines. This may be
because RDF-3x makes use of LZ77 compression in conjunction with byte-wise
Gamma encoding for IDs and gap compression [14].

5.2                           Performance on Core Graph-Based Tasks
Figures 2(a), (b) and (c) report on the execution time of the adjacency tasks;
time is presented in msecs. and logarithmic scale. We recall that these tasks are
implemented in the graph engines using their APIs while in the case of RDF-3x,
we express them as SPARQL queries. Because RDF-3x implements specialized
structures and indices for speeding up SPARQL queries, we can observe that
it overcomes the rest of the engines by up to three orders of magnitude. Nev-
ertheless, if the rest of the graph engines are compared, Neo4j and DEX are
competitive, while HypergraphDB always consumes at least one order of magni-
tude more time than the rest of the engines. Finally, Figure 2(d) presents average
execution time of the k-hops task, for k ∈ {2, 3, 4}; execution time is presented
in secs, and logarithmic scale. Similarly, k-hops tasks are expressed in RDF-3x
as SPARQL queries. On the other hand, in the rest of the graph engines, k-hops
are implemented by using methods provided by their APIs for this task; we name
this implementation as internal k-hops. Additionally, we study the effects of im-
plementing this task by following a BFS strategy and relying on data structures
to maintain the neighbors at a distance of m and m + 1 in main-memory. Basic
APIs functions are used to retrieve the adjacent nodes of a given node. This
implementation is named external k-hops. First, we can observe that RDF-3x
is impacted when graphs are dense, e.g., RDF-3x timed out computing 4-hops
in DSJC.5 and DSJC.9 after 60 minutes; nevertheless, RDF-3x can exploit its
internal structures and indices in sparse graphs, e.g., USA-road-d.NY and USA-
road-d.FLA. To explain, dense graphs with a fixed number of nodes, have a large
number of edges that impact on the size of the k-hops. In consequence, this also
affects the size of the intermediate results generated during the execution of the
corresponding SPARQL queries; thus, RDF-3x requires more time than the rest
of the engines to execute these queries against dense graphs. Contrary, RDF-3x
exhibits a better performance on sparse graphs; in fact, RDF-3x overcomes the
rest of the engines in both USA-road-d.NY and USA-road-d.FLA. Further, DEX
exhibits a better performance in the majority of the graphs than the rest graph
engines, and its internal and external implementations are competitive.


                             1e+06                                                                                            1e+06
                                      RDF3X                                                                                                RDF3X
 Time msecs. (log10 scale)


                                                                                          Time msecs. (log10 scale)


                             100000    Neo4j                                                                                 100000         Neo4j
                                        DEX                                                                                                  DEX
                             10000     HGDB                                                                                   10000         HGDB

                              1000                                                                                             1000

                               100                                                                                              100

                                10                                                                                               10

                                 1                                                                                                   1
                                        DSJC.1 DSJC.5 DSJC.9      NY      FLA Berlin10m                                                       DSJC.1 DSJC.5 DSJC.9          NY       FLA Berlin10m
                                                Graph instances on adjacentX task                                                                    Graph instances on adjacentXP task

(a) AdjacentX task runtime, msecs. log- (b) AdjacentXP task runtime, msecs. log-
scale.                                  scale.
                             1e+06                                                                                           10000        RDF3X         In-Neo4j        In-DEX        In-HGDB
                                      RDF3X                                                                                              Ex-Neo4j        Ex-DEX       Ex-HGDB
 Time msecs. (log10 scale)


                             100000    Neo4j                                                                                  1000
                                                                                                  Time secs. (log10 scale)


                                        DEX
                             10000     HGDB
                                                                                                                               100
                              1000
                                                                                                                               10
                               100

                                10                                                                                              1


                                 1                                                                                             0.1
                                        DSJC.1 DSJC.5 DSJC.9      NY      FLA Berlin10m                                                      DSJC.1 DSJC.5 DSJC.9          NY       FLA   Berlin10m
                                               Graph instances on edgeBetween task                                                                  Graph instances on average of k-hops task


(c) EdgeBetween task runtime, msecs. log-                                                                                    (d) Average k-hops, secs. log-scale.
scale.

Fig. 2. Performance of Adjacency and Average k-hops (average execution time of 2-
hops, 3-hops, and 4-hops). Internal k-hops are implemented using APIs (In-DEX, In-
Neo4j, In-HGDB ); external k-hops follow BFS traversal strategy (Ex-Neo4j, Ex-DEX,
Ex-HGDB). Non-shown bars indicate that the corresponding engine timed out after 1
hour.
    Finally, we study internal and external implementations of BFS and DFS
search strategies. Figures 3(a) and (b) report on the execution time of these
implementations when the studied graph engines are run against the five graphs.
As expected, internal implementations of both BFS and DFS outperform the
external implementations in all the engines. Nevertheless, in general, we can say
that Neo4j exhibits a better performance than DEX and HypergraphDB. This
may be caused by internal representation of Neo4j graphs and main-memory
structures maintained by Neo4j to keep track of the nodes visited during the
search.


                            10000                                                                                 10000
                                    Ex-Neo4j        Ex-DEX        Ex-HGDB                                                 Ex-Neo4j        Ex-DEX        Ex-HGDB
                                     In-Neo4j        In-DEX        In-HGDB                                                 In-Neo4j        In-DEX        In-HGDB
 Time secs. (log10 scale)


                                                                                       Time secs. (log10 scale)
                             1000                                                                                  1000


                              100                                                                                   100


                               10                                                                                    10


                                1                                                                                     1
                                      DSJC.1 DSJC.5 DSJC.9     NY      FLA Berlin10m                                        DSJC.1 DSJC.5 DSJC.9     NY      FLA Berlin10m
                                                Graph instances on BFS task                                                           Graph instances on DFS task

(a) Breadth-First Search task runtime, (b) Depth-First Search task runtime, secs.
secs. log-scale.                       log-scale.

Fig. 3. Comparison of Neo4j, DEX and HypergraphDB for Internal and External
BFS and DFS. Internal tasks are implemented using APIs (In-DEX, In-Neo4j and
In-HGDB); external algorithms rely on adjacency functions of the APIs (Ex-DEX, Ex-
Neo4j and Ex-HGDB). Execution Time secs. log-scale. Non-shown bars indicate that
the corresponding engine timed out after 1 hour.


5.3                           Performance on Graph Summarization and Densest Subgraph

We further evaluate the impact on the studied graph database engines of the
mining tasks of graph summarization and densest subgraph (Figure 4). It is
important to highlight that to best of our knowledge, this is the first evaluation
of these tasks on Neo4j, DEX, and HypergraphDB. Graph density, size and
number of labels impact on the performance of both graph summarization and
densest subgraph in all the engines. Nevertheless, graph summarization seems
to be more affected by the graph density and the number of labels than for
size of graph, i.e., execution time increases as the density of the DSJC.X family
increases, as well as the number of labels grows in USA-road-d.NY and USA-
road-d.FLA. Contrary, densest subgraph is more impacted by the size of the
graphs. Additionally, we can observe that DEX overcomes the rest of the engines
when the graphs are dense, while Neo4j has better performance in sparse graphs
even if they have a large number of labels. To conclude, performance of these
engines on both mining tasks is highly sensitive to graph characteristics.
                                1000                                                                                    100000
                                       Neo4j                                                                                     Neo4j
                                        DEX                                                                                       DEX

    Time secs. (log10 scale)


                                                                                             Time secs. (log10 scale)
                                                                                                                        10000
                                       HGDB                                                                                      HGDB
                                 100
                                                                                                                         1000

                                                                                                                          100
                                  10
                                                                                                                           10

                                   1                                                                                        1
                                        DSJC.1 DSJC.5 DSJC.9         NY      FLA Berlin10m                                        DSJC.1 DSJC.5 DSJC.9    NY      FLA Berlin10m
                                               Graph instances on densest subgraph task                                            Graph instances on graph summarization task

                               (a) Densest Subgraph task runtime.                            (b) Graph Summarization task runtime.

Fig. 4. Comparison of Neo4j, DEX, HypergraphDB on Mining Tasks. Execution Time
in secs, log-scale. Non-shown bars indicate that the corresponding engine timed out
after 3.5 hours.


6                               Conclusion and Future Work
We evaluated RDF-3x, Neo4j, DEX and HypergraphDB. First, our experimental
study suggests that the considered independent variables highly impact on the
performance of all these engines. Second, none of the engines outperforms the
rest in all graph-based tasks. Nevertheless, we observe that RDF-3x exhibits a
better performance than the rest of the engines during the execution of graph
creation and pattern matching tasks; but, it is highly sensitive to graph density.
HypergraphDB poorly performs in all the tasks of the study. Finally, Neo4j
and DEX are quite competitive, and in some cases, one complements the other.
Particularly, DEX outperforms Neo4j and HypergraphDB in the adjacency tasks
and k-hops; also, DEX performs quite well in mining tasks if graphs are dense.
On the other hand, Neo4j has a better performance in BFS and DFS traversals,
as well as in mining tasks against sparse large graphs with high number of
labels. Based on these results, we conclude that benchmarks that consider these
independent variables and graph-based tasks need to be developed to uncover
properties and limitations of these engines. In the future we plan to design and
implement benchmarks able to measure performance of graph database and RDF
engines. Additionally, to better understand cons and pros of existing engines, the
analysis of graph structures and core methods is part of our future plans.

References
 1. C. C. Aggarwal and H. Wang. Graph data management and mining: A survey of
    algorithms and applications. In Managing and Mining Graph Data, pages 13–68.
    2010.
 2. R. Angles, A. Prat-Pérez, D. Dominguez-Sal, and J.-L. Larriba-Pey. Benchmarking
    database systems for social network applications. In GRADES 2013, 2013.
 3. C. Bizer and A. Schultz. The berlin sparql benchmark. Int. J. Semantic Web Inf.
    Syst., 5(2):1–24, 2009.
 4. M. Dayarathna and T. Suzumura. Xgdbench: A benchmarking platform for graph
    stores in exascale clouds. In CloudCom, pages 363–370, 2012.
 5. D. Dominguez-Sal, N. Martı́nez-Bazan, V. Muntés-Mulero, P. Baleta, and J.-L.
    Larriba-Pey. A discussion on the design of graph database benchmarks. In TPCTC,
    2010.
 6. Y. Guo, Z. Pan, and J. Heflin. Lubm: A benchmark for owl knowledge base systems.
    J. Web Sem., 3(2-3):158–182, 2005.
 7. J. Huang, D. J. Abadi, and K. Ren. Scalable sparql querying of large rdf graphs.
    PVLDB, 4(11):1123–1134, 2011.
 8. B. Iordanov. Hypergraphdb: A generalized graph database. In WAIM Workshops,
    pages 25–36, 2010.
 9. D. S. Johnson, C. R. Aragon, L. A. McGeoch, and C. Schevon. Optimization
    by simulated annealing: an experimental evaluation; part ii, graph coloring and
    number partitioning. Operations research, 39(3):378–406, 1991.
10. T. Lampo, , M.-E. Vidal, J. Danilow, and E. Ruckhaus. To cache or not to cache:
    The effects of warming cache in complex sparql queries. In ODBASE 2011, pages
    111–111, 2011.
11. N. Martı́nez-Bazan, V. Muntés-Mulero, S. Gómez-Villamor, J. Nin, M.-A. Sánchez-
    Martı́nez, and J.-L. Larriba-Pey. Dex: high-performance exploration on large
    graphs for information retrieval. In CIKM, pages 573–582, 2007.
12. R. O. Nambiar, M. Poess, A. Masland, H. R. Taheri, M. Emmerton, F. Carman,
    and M. Majdalany. Tpc benchmark roadmap 2012. In TPCTC, pages 1–20, 2012.
13. S. Navlakha, R. Rastogi, and N. Shrivastava. Graph summarization with bounded
    error. In ACM SIGMOD, pages 419–432. ACM, 2008.
14. T. Neumann and G. Weikum. x-rdf-3x: Fast querying, high update rates, and
    consistency for rdf databases. PVLDB, 3(1):256–263, 2010.
15. I. Robinson, J. Webber, and E. Eifrem. Graph Databases. O’Reilly Media, 2013.
16. B. Saha, A. Hoch, S. Khuller, L. Raschid, and X.-N. Zhang. Dense subgraphs
    with restrictions and applications to gene annotation graphs. In RECOMB, pages
    456–472, 2010.
17. M. Schmidt, T. Hornung, N. Küchlin, G. Lausen, and C. Pinkel. An experimental
    comparison of rdf data management approaches in a sparql benchmark scenario.
    In ISWC, pages 82–97, 2008.
18. A. Thor, P. Anderson, L. Raschid, S. Navlakha, B. Saha, S. Khuller, and X.-N.
    Zhang. Link prediction for annotation graphs using graph summarization. In
    ISWC, pages 714–729, 2011.