<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On the Performance of Analytical and Pattern Matching Graph Queries in Neo4j and a Relational Database</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jürgen Hölsch</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tobias Schmidt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer and Information Science, University of Konstanz P.</institution>
          <addr-line>O. Box 188, 78457 Konstanz</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Graph databases with a custom non-relational backend promote themselves to outperform relational databases in answering queries on large graphs. Recent empirical studies show that this claim is not always true. However, these studies focus only on pattern matching queries and neglect analytical queries used in practice such as shortest path, diameter, degree centrality or closeness centrality. In addition, there is no distinction between di erent types of pattern matching queries. In this paper, we introduce a set of analytical and pattern matching queries, and evaluate them in Neo4j and a market-leading commercial relational database system. We show that the relational database system outperforms Neo4j for our analytical queries and that Neo4j is faster for queries that do not lter on speci c edge types.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Application domains such as social media, biology, and
transportation planning produce large amounts of graph data.
Therefore, the management and processing of graph data
is gaining importance. As a result, there are several graph
databases such as Neo4j, DEX/Sparksee, and OrientDB. In
general, graph databases can be categorized into systems
using an existing relational database as a backend or systems
implementing a custom non-relational backend, which are
often called \native graph databases". Although a relational
backend o ers numerous advantages such as transactions,
reliable storage and access control, native graph databases
attract customers with the promise of outperforming traditional
relational databases. Recent empirical studies [
        <xref ref-type="bibr" rid="ref4 ref9">4, 9</xref>
        ], however,
demonstrate that this promise is not always kept. Gubichev
and Then [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] show that several queries of their benchmark
timed out in Neo4j and DEX/Sparksee. While these studies
are a good starting point for comparing graph databases
with a relational and a custom non-relational backend, they
only focus on pattern matching queries and neglect analytical
queries such as shortest path and centrality measures.
      </p>
      <p>In this paper, we bridge this gap by de ning a set of
analytical queries (Section 2). In addition, compared to
related work, we introduce a more ne-grained categorization
of pattern matching queries including (i) paths that lter
on specifc node labels without restricting edge types, (ii)
paths that lter on speci c edge types without restricting
node labels, (iii) paths that lter on node labels and edge
types, and (iv) paths containing cycles. This
categorization provides the basis to systematically breaking down the
reasons why one system is outperforming another system.
Since Neo4j is the most widely used native graph database,
we compare the execution times of Cypher queries in Neo4j
to the execution times of corresponding SQL queries in a
market-leading relational database system (Section 3). We
show that the relational database system outperforms Neo4j
for our analytical queries. We argue that analytical queries,
which access most or all nodes of the graph, bene t from the
more advanced disk and bu er management of the relational
database system and that there is room for improvement
in this respect in Neo4j. In contrast, we demonstrate that
Neo4j is more e cient for path queries that do not lter on
speci c edge types. Section 4 summarizes related work and
Section 5 gives an outlook on future work.
2.</p>
    </sec>
    <sec id="sec-2">
      <title>QUERIES</title>
      <p>
        We begin by introducing the set of queries that we use to
compare Neo4j to a relational database system. We
categorize our queries in analytical and pattern matching queries.
Analytical queries process large parts of the graph at a time,
whereas pattern matching queries access only small parts
of the graph in most cases. In order to formally de ne our
queries, we rst have to introduce the property graph data
model on which Neo4j and its query language Cypher is
based [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <sec id="sec-2-1">
        <title>De nition 1. Assume a domain V of nodes, a domain E of</title>
        <p>directed edges, and a domain A of attributes. Additionally,
suppose a domain DA of atomic attribute values, a domain</p>
      </sec>
      <sec id="sec-2-2">
        <title>DL of node labels, and a domain DT of edge types. A property</title>
        <p>graph is a nite structure G = (V; E; A; ; ; ; ), where
V
E
A</p>
      </sec>
      <sec id="sec-2-3">
        <title>V is a nite set of nodes,</title>
      </sec>
      <sec id="sec-2-4">
        <title>E is a nite set of edges,</title>
      </sec>
      <sec id="sec-2-5">
        <title>A is a nite set of attributes,</title>
        <p>: (V [ E) A ! DA is a partial function assigning
values to attributes of nodes and edges,
: V ! P(DL) is a partial function assigning labels
to nodes, and</p>
        <p>: E ! DT is a partial function assigning a type to
edges.</p>
        <p>Besides the property graph data model, we also de ne the
notion of a path in a property graph.</p>
      </sec>
      <sec id="sec-2-6">
        <title>De nition 2. Suppose the nodes v1; vn 2 V . A path from</title>
        <p>v1 to vn, denoted as v1 ! vn, is a sequence of nodes and
edges hv1; e1; v2; : : : ; vn 1; en 1; vni, where 81 i n 1 :
(ei) = (vi; vi+1).
2.1</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Analytical Queries</title>
      <p>
        In this subsection, we formally de ne the set of analytical
queries that have been proposed in the graph benchmark of
Grossniklaus et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Since the focus of this paper is to
compare Neo4j with a relational database system, we only
use queries that can be expressed in Cypher. Therefore, we
cannot cover all analytical graph queries that are used in
real world applications. For each query of this subsection, we
also derive the corresponding Cypher and SQL statements
that are based on the dataset of the LUBM [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] benchmark
(cf. Section 3). The tables and views that are used in the
SQL statements are described in the Appendix. Some of the
following queries are limited in the number of result tuples
because Neo4j is not able to execute these queries without
restricting the result size.
2.1.1
      </p>
      <sec id="sec-3-1">
        <title>QA1: Node Degree</title>
        <sec id="sec-3-1-1">
          <title>Query QA1 returns for each node v 2 V its degree, which</title>
          <p>is the number of ingoing and outgoing edges and formally
de ned as follows:</p>
          <p>deg(v) := feje 2 E ^ (e) = (x; y) ^ (v = x _ v = y)g .
Cypher
MATCH (n)--(m)
RETURN n, COUNT(m)</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>Suppose P (u; v) is a set containing all paths u ! v, where</title>
          <p>u; v 2 V , and length(p) is a function returning the number of
edges of a path p. A shortest path between two nodes u and
v is a path p where 8p0 2 P (u; v) : length(p) length(p0).
The length of a shortest path between u and v is denoted as
(u; v). Query QA4 returns for all pairs of nodes u; v 2 V a
shortest path between u and v.</p>
          <p>Cypher
MATCH p = shortestPath((n)-[*]-&gt;(m))
RETURN n, m, p
SQL
SELECT *
FROM shortest path
2.1.5</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>QA5: Closeness Centrality</title>
        <sec id="sec-3-2-1">
          <title>Query QA5 returns for each node v 2 V its closeness</title>
          <p>centrality which is de ned as follows:</p>
          <p>CC (v) := Pu2V (v; u)</p>
          <p>.
1
Cypher
SQL
MATCH p = shortestPath((n)-[*]-(m))
WHERE n &lt;&gt; m WITH n, p LIMIT 60000000
RETURN n, 1.0/SUM(LENGTH(p)) AS centrality
SELECT TOP 60000000 sID AS node,</p>
          <p>1.0 / SUM(length) AS centrality
FROM (SELECT sID, dID, MIN(length) AS length
FROM shortest connection GROUP BY sID, dID
UNION ALL SELECT dID, sID, MIN(length) AS length
FROM shortest connection GROUP BY sID, dID) AS t
GROUP BY sID
2.1.6</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>QA6: Diameter</title>
        <p>Query QA6 returns the diameter of the graph which is the
length of the longest shortest path between any two nodes
u; v 2 V :
d := max (u; v).</p>
        <p>u;v2V</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Pattern Matching Queries</title>
      <p>In the following, we de ne the types of pattern matching
queries used in our evaluation. We categorize our queries in
paths that lter on speci c node labels without restricting
edge types (QP1), paths that lter on speci c edge types
without restricting node labels (QP3), paths that lter on
node labels and edge types (QP4), and paths containing
cycles (QP2). This categorization provides a way to
determine how e cient Neo4j and the relational database system
can lter on speci c node labels or edge types. For each
pattern matching query type, we derive Cypher and SQL
statements. However, in this subsection we only present the
actual code of queries with patterns of length 1 (denoted as
QPx;1 in Section 3, where x is the query type). The queries
with longer patterns used in the evaluation are found in the
appendix.
2.2.1</p>
      <sec id="sec-4-1">
        <title>QP1: Paths filtering on node labels</title>
        <p>The rst type of pattern matching queries in our evaluation
are paths ltering on speci c node labels. This query type is
illustrated below, where l1; : : : ; ln 2 DL are labels.
l1
: : :
ln</p>
        <p>Note that we do not restrict the edge direction. Both
outgoing and ingoing edges are matched by the pattern
above.
: : :
ln
Cypher
SQL
MATCH (a:UndergraduateStudent)--(b:FullProfessor)
--(c:Department)--(a)
RETURN a,b,c
SELECT n1.id, n2.id, n3.id
FROM rel all bidirectional r1</p>
        <p>INNER JOIN node label n1 ON n1.id = r1.sID
INNER JOIN node label n2 ON n2.id = r1.dID
INNER JOIN rel all bidirectional r2 ON r2.sID = n2.id
INNER JOIN node label n3 ON n3.id = r2.dID</p>
        <p>INNER JOIN rel all bidirectional r3 ON n3.id = r3.sID
WHERE n1.value = 'UndergraduateStudent'</p>
        <p>AND n2.value = 'FullProfessor'
AND n3.value = 'Department'
AND r3.dID = n1.id
2.2.3</p>
      </sec>
      <sec id="sec-4-2">
        <title>QP3: Paths filtering on edge types</title>
        <p>Queries of type QP3 return paths with speci c edge types
as it is shown in the gure below, where t1; : : : ; tn 2 DT are
edge types.</p>
        <p>t2
: : :
t1
Cypher
SQL
MATCH (a)-[:teacherOf]-&gt;(b)&lt;-[:takesCourse]-(c)
RETURN a, b, c
SELECT n1.id AS teacher, n2.id AS course, n3.id AS student
FROM nodes n1, rel teacherOf rto, nodes n2,</p>
        <p>rel takesCourse rtc, nodes n3
WHERE n1.id = rto.sID AND rto.dID = n2.id</p>
        <p>AND n2.id = rtc.dID AND n3.id = rtc.sID</p>
      </sec>
      <sec id="sec-4-3">
        <title>2.2.4 QP4: Paths filtering on node labels and edge types</title>
        <p>Queries of type QP4 are a combination of queries of type
QP1 and of type QP3.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>3. EVALUATION</title>
      <p>Having presented the queries used in our performance
evaluation, we now present the experimental setup and the
obtained results.</p>
    </sec>
    <sec id="sec-6">
      <title>3.1 Experimental Setup</title>
      <p>
        All experiments presented in this paper were performed
on a Mac Pro with a 3.5 GHz 6-Core Intel Xeon E5
processor with 64 GB main memory. In our evaluation, we
compared Neo4j Version 3.0.3 with a market-leading
commercial database system. Due to license terms, the commercial
system is referred to as \RDBMS". Neo4j and RDBMS are
installed on a 64-bit Windows 8.1 virtual machine with 32
GB of main memory. Both database systems run in their
standard con guration. The data sets used in our
evaluation were generated by the data generator of the LUBM
benchmark [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. We generated a data set with one university
(LUBM-1) and a data set with two universities (LUBM-2).
To store the graph in RDBMS, we used the decomposed
storage model (DSM) described by Sakr et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. All runtime
measurements were repeated ve times in random order. The
reported averages discard the smallest and the largest value.
      </p>
      <p>Figure 1 shows the runtimes of our queries on the
LUBM1 data set, whereas Figure 2 shows the runtimes of our
queries on the LUBM-2 data set. In our rst experiments,
we have noticed that the performance of speci c query types
in RDBMS heavily depends on how e ciently edges can
be retrieved. Therefore, we created two di erent schemata
in RDBMS that are both based on the decomposed storage
model. Schema \A" has tables for each edge type. In addition,
a view is computed containing edges of all types. In contrast,
Schema \B" has a table containing edges of all types. In
addition, there are views for each edge type. In the following,
RDBMS with Schema A is denoted as \RDBMS A" and
RDBMS with Schema B is denoted as \RDBMS B". We
use the term \RDBMS" to refer to both RDBMS A and
RDBMS B.</p>
    </sec>
    <sec id="sec-7">
      <title>3.2 Results</title>
      <p>We rst present the results obtained on data set LUBM-1.
Figure 1(a) shows that for most of the analytical queries
0.72 0.19 0.06 0.38 0.08 0.06
674605
RDBMS is several orders of magnitude faster than Neo4j.
The largest di erence can be seen for query QA5 which
computes the closeness centrality where RDBMS B is over
1500 faster than Neo4j. The computation of the shortest
path is almost 10 faster in RDBMS B than in Neo4j. At a
rst glance, this is surprising since computing the shortest
path is a core functionality of Neo4j and is also provided
as a construct in the Cypher query language. Figure 1(b)
gives the runtimes of our pattern matching queries that
lter speci c node labels without restricting edge types. The
largest di erence in the runtime of Neo4j and RDBMS can be
observed for query QP1;4 which contains the largest number
of edges of all queries in (b). We can also see that the
queries QP1;1 and QP1;3 are faster in RDBMS B than in
Neo4j. The runtimes of cyclic pattern matching queries that
lter speci c node labels without restricting edge types are
shown in Figure 1(c). It can be seen that Neo4j outperforms
RDBMS. For all queries of Figure 1(c), Neo4j is almost
an order of magnitude faster than RDBMS. The runtimes
of pattern matching queries on ltering speci c edge types
without restricting node labels are given in Figure 1(d).
In contrast to the previous pattern matching query types,
RDBMS answers queries of this type more e ciently than
Neo4j. This is due to the fact that each relationship type has
its own table (or view) in RDBMS. Finally, Figure 1(e) plots
the runtimes of pattern matching queries that lter on node
labels and edge types. As before, RDBMS B outperforms
Neo4j and RDBMS A.</p>
      <p>Moving on to data set LUBM-2, Figure 2(a) shows the
runtime of our analytical queries. In Neo4j, only query QA1 and
QA2 could be executed in reasonable time. The other queries
were aborted after 24 hours or ran out of memory (denoted
by in Figure 2(a)). The runtimes of our pattern matching
queries that lter speci c node labels without restricting edge
types are given in Figure 2(b). For query QP1;3 RDBMS A
is faster than Neo4j, otherwise Neo4j outperforms RDBMS.
Figure 2(c) plots the runtimes of cyclic pattern matching
queries that lter speci c node labels without restricting
edge types. As for the results obtained on the LUBM-1 data
set (Figure 1(c)), Neo4j is consistently faster than RDBMS
for this type of pattern matching query. The runtimes of
pattern matching queries ltering speci c edge types without
restricting node labels that are plotted in Figure 2(d) show
that RDBMS consistently outperforms Neo4j for this type of
query. Finally, the runtimes of pattern matching queries that
lter on node labels and edge types are shown in Figure 2(e).
For these queries, RDBMS A consistently outperforms Neo4j.
3.3 Discussion and Limitations</p>
      <p>To conclude this section, we summarize our results and
discuss limitations of our evaluation. We have shown that
RDBMS outperforms Neo4j for our analytical queries. For
most of the analytical queries, RDBMS is several orders of
magnitude faster than Neo4j. A possible reason for this result
is the fact that the analytical queries have to access most or all
nodes of the graph. Therefore, they pro t from the advanced
disk and memory management of relational database systems.
We also showed that RDBMS A and RDBMS B both perform
badly for queries that need to join the whole edge table (or
view) multiple times for longer patterns. For queries with
cycles, the performance is even worse. However, if we only
query speci c edge types, RDBMS outperforms Neo4j as
can be seen in Figure 1(d) and Figure 2(d). Comparing
RDBMS A with RDBMS B on LUBM-1, we can observe that
for queries that lter on speci c edge types RDBMS B is more
e cient than RDBMS A. However, for queries with longer
patterns that do not lter on speci c edge types RDBMS A
outperforms RDBMS B. As for LUBM-1, there is again no
clear winner on LUBM-2.</p>
      <p>
        Since the work presented in this paper is only the rst
step towards a more extensive empirical study, it has some
shortcomings and open issues. First, we only use the
decomposed storage model [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] to map a property graph to
relations. Since the choice of the graph-to-relation mapping
strongly impacts the performance of queries, we need to
evaluate alternative mappings such as the hybrid schema
described by Sun et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Second, the performance of a
query also depends on the characteristics of the graph, e.g.,
the number of distinct labels or average node degree. Hence,
further data sets with di erent graph characteristics need to
be evaluated. Finally, pattern matching queries that lter
on speci c attribute values should be included as our current
queries lter on node labels and edge types only.
      </p>
    </sec>
    <sec id="sec-8">
      <title>RELATED WORK</title>
      <p>
        There are several empirical studies on the query
performance of graph database systems. To the best of our
knowledge, none of these studies cover analytical queries formulated
in a declarative graph query language. The work of
Vicknair et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] benchmarks Neo4j and MySQL. The authors
de ne three simple traversal queries and queries counting the
number of nodes with a speci c value. Ciglan et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
implement graph traversal operations in Neo4j, DEX/Sparksee,
OrientDB, NativeSail, and SGDB, and compare their
performance. Grossniklaus et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] benchmark Neo4j and a
relational database system by implementing a set of
analytical queries using the Neo4j API and SQL. Welc et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
evaluate the performance of computing the shortest path in
Neo4j, Oracle and the Green-Marl language infrastructure.
Angles et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] benchmark graph databases in the context
of social networks. They de ne a set of pattern matching
and reachability queries that are often used to analyse social
networks. Jouili and Vansteenberghe [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] use traversal and
shortest path queries to benchmark Neo4j, DEX/Sparksee,
OrientDB and Titan. McColl et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] evaluate open-source
graph databases using implementations of the graph
algorithms shortest path, connected components and PageRank.
Pobiedina et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] evaluate pattern matching queries in
Neo4j (Cypher), PostgreSQL (SQL), Jena TDB (SPARQL)
and Clingo (ASP).
      </p>
      <p>CONCLUSION AND FUTURE WORK
In this paper, we de ned a set of analytical and pattern
matching queries in Cypher and SQL, and evaluated them in
Neo4j and a market-leading commercial relational database
system. We showed that the relational database system
outperforms Neo4j for our analytical queries and that Neo4j
is faster for queries that do not lter on speci c edge types.</p>
      <p>As future work we plan to build on this initial study
in order to better understand the implications of di erent
backends on graph query performance. First, we will study
the in uence of di erent graph-to-relation mappings on the
performance of relational backends. Additionally, we will
include further data sets with di erent graph characteristics
and pattern matching queries that lter on attribute values.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgement</title>
      <p>This work is supported by Grant No. GR 4497/2 of the
Deutsche Forschungsgemeinschaft (DFG).
6.</p>
    </sec>
    <sec id="sec-10">
      <title>APPENDIX</title>
    </sec>
    <sec id="sec-11">
      <title>A. TABLES AND VIEWS</title>
      <p>Below, we introduce the tables and views that were created
for Schema A and Schema B in order to store a property
graph in a relational database system. A node has a unique
id and is stored in the node_label relation with its
corresponding labels. Therefore, for each label of a node, there is
an entry in the node_label table. For each attribute, there is
a relation containing entries of the node id and the attribute
value. The node_label table and each attribute table have a
clustered index on the node id. In Schema A, there is a table
for each edge type containing the edge id and the ids of the
source and target nodes. The edge tables have a clustered
index on the id of the source and target node. In Schema B,
there is a table containing edges of all types. This table has
a clustered index on the id of the source and target node as
well as an unclustered index on the column indicating the
edge type. In addition, we create the following views, where
the pre x \rel " denotes an edge table (or view). However,
in Schema B we have a table rel_all instead of a view.
CREATE VIEW nodes (id) AS
SELECT DISTINCT id FROM node label
CREATE VIEW rel all (sID, dID) AS
SELECT sID, dID FROM rel advisor</p>
      <p>UNION ALL
SELECT sID, dID FROM rel doctoralDegreeFrom</p>
      <p>UNION AL
SELECT sID, dID FROM rel headOf</p>
      <p>UNION ALL
SELECT sID, dID FROM rel mastersDegreeFrom</p>
      <p>UNION ALL
SELECT sID, dID FROM rel memberOf</p>
      <p>UNION ALL
SELECT sID, dID FROM rel publicationAuthor</p>
      <p>UNION ALL
SELECT sID, dID FROM rel subOrganizationOf</p>
      <p>UNION ALL
SELECT sID, dID FROM rel takesCourse</p>
      <p>UNION ALL
SELECT sID, dID FROM rel teacherOf</p>
      <p>UNION ALL
SELECT sID, dID FROM rel teachingAssistantOf</p>
      <p>UNION ALL
SELECT sID, dID FROM rel undergraduateDegreeFrom</p>
      <p>UNION ALL
SELECT sID, dID FROM rel worksFor
CREATE VIEW rel all bidirectional (sID, dID) AS
SELECT sID, dID FROM rel all</p>
      <p>UNION ALL
SELECT dID AS sID, sID AS dID FROM rel all
CREATE VIEW transitive closure AS
WITH BacktraceCTE(sID, parent, dID, length) AS
(SELECT sID, NULL, dID, 1
FROM dbo.rel all
UNION ALL
SELECT b.sID, b.dID, r.dID, b.length + 1
FROM BacktraceCTE AS b</p>
      <p>INNER JOIN dbo.rel all AS r ON b.dID = r.sID)
SELECT sID, parent, dID, length FROM BacktraceCTE
CREATE VIEW shortest connection AS
SELECT DISTINCT t.* FROM dbo.transitive closure t
INNER JOIN (SELECT sID, dID, MIN(length) as minLength
FROM dbo.transitive closure GROUP BY sID, dID) m
ON m.sID = t.sID AND m.dID = t.dID AND m.minLength = t.length
CREATE VIEW shortest path AS WITH</p>
      <p>ShortestPathCTE (sID, dID, step from, step to) AS (
SELECT sID, dID, parent, dID FROM dbo.shortest connection
UNION ALL
SELECT t.sID, t.dID, coalesce(s.parent, t.sID) AS step from,
t.step from AS step to
FROM ShortestPathCTE t
INNER JOIN shortest connection s</p>
      <p>ON s.dID = t.step from AND s.sID = t.sID
) SELECT sID, dID,
coalesce(step from, sID) AS step from, step to</p>
      <p>FROM ShortestPathCTE
B. PATTERN MATCHING QUERIES
B.1</p>
    </sec>
    <sec id="sec-12">
      <title>Paths restricted on node labels</title>
      <sec id="sec-12-1">
        <title>QP1;2 Cypher QP1;2 SQL</title>
        <p>MATCH (a:University)--(b:FullProfessor)--(c:Department)
RETURN a, b, c
SELECT n1.id, n2.id, n3.id FROM rel all bidirectional r1
INNER JOIN node label n1 ON n1.id = r1.sID
INNER JOIN node label n2 ON n2.id = r1.dID
INNER JOIN rel all bidirectional r2 ON r2.sID = n2.id
INNER JOIN node label n3 ON n3.id = r2.dID
WHERE n1.value = 'University' AND n2.value = 'FullProfessor'</p>
        <p>AND n3.value = 'Department'</p>
      </sec>
      <sec id="sec-12-2">
        <title>QP1;3 Cypher</title>
        <p>MATCH (a:University)--(b:FullProfessor)--(c:Department)
--(d:UndergraduateStudent)
RETURN a, b, c, d
QP1;3 SQL
SELECT n1.id, n2.id, n3.id, n4.id FROM rel all bidirectional r1
INNER JOIN node label n1 ON n1.id = r1.sID
INNER JOIN node label n2 ON n2.id = r1.dID
INNER JOIN rel all bidirectional r2 ON r2.sID = n2.id
INNER JOIN node label n3 ON n3.id = r2.dID
INNER JOIN rel all bidirectional r3 ON n3.id = r3.sID
INNER JOIN node label n4 ON n4.id = r3.dID
WHERE n1.value = 'University' AND n2.value = 'FullProfessor'
AND n3.value = 'Department'</p>
        <p>AND n4.value = 'UndergraduateStudent'</p>
      </sec>
      <sec id="sec-12-3">
        <title>QP1;4 Cypher</title>
        <p>MATCH (a:University)--(b:FullProfessor)--(c:Department)
--(d:UndergraduateStudent)--(e:Course)
RETURN a, b, c, d, e
QP1;4 SQL
SELECT n1.id, n2.id, n3.id, n4.id, n5.id
FROM rel all bidirectional r1</p>
        <p>INNER JOIN node label n1 ON n1.id = r1.sID
INNER JOIN node label n2 ON n2.id = r1.dID
INNER JOIN rel all bidirectional r2 ON r2.sID = n2.id
INNER JOIN node label n3 ON n3.id = r2.dID
INNER JOIN rel all bidirectional r3 ON n3.id = r3.sID
INNER JOIN node label n4 ON n4.id = r3.dID
INNER JOIN rel all bidirectional r4 ON n4.id = r4.sID
INNER JOIN node label n5 ON n5.id = r4.dID
WHERE n1.value = 'University' AND n2.value = 'FullProfessor'
AND n3.value = 'Department'</p>
        <p>AND n4.value = 'UndergraduateStudent' AND n5.value = 'Course'
B.2</p>
      </sec>
    </sec>
    <sec id="sec-13">
      <title>Cycles</title>
      <sec id="sec-13-1">
        <title>QP2;2 Cypher</title>
        <p>MATCH (a:UndergraduateStudent)--(b:AssociateProfessor)
--(c:GraduateStudent)--(d:Department)--(a)
RETURN a,b,c,d
QP2;2 SQL
SELECT n1.id, n2.id, n3.id, n4.id FROM rel all bidirectional r1
INNER JOIN node label n1 ON n1.id = r1.sID
INNER JOIN node label n2 ON n2.id = r1.dID
INNER JOIN rel all bidirectional r2 ON r2.sID = n2.id
INNER JOIN node label n3 ON n3.id = r2.dID
INNER JOIN rel all bidirectional r3 ON n3.id = r3.sID
INNER JOIN node label n4 ON n4.id = r3.dID</p>
        <p>INNER JOIN rel all bidirectional r4 ON n4.id = r4.sID
WHERE n1.value = 'UndergraduateStudent'</p>
        <p>AND n2.value = 'AssociateProfessor'
AND n3.value = 'GraduateStudent'</p>
        <p>AND n4.value = 'Department' AND r4.dID = n1.id</p>
      </sec>
      <sec id="sec-13-2">
        <title>QP2;3 Cypher</title>
        <p>MATCH (a:UndergraduateStudent)--(b:Course)
--(c:AssociateProfessor)--(d:UndergraduateStudent)
--(e:Course)--(a) WHERE b &lt;&gt; e AND a&lt;&gt;d
RETURN a,b,c,d,e
QP2;3 SQL
SELECT n1.id, n2.id, n3.id, n4.id, n5.id
FROM rel all bidirectional r1</p>
        <p>INNER JOIN node label n1 ON n1.id = r1.sID
INNER JOIN node label n2 ON n2.id = r1.dID
INNER JOIN rel all bidirectional r2 ON r2.sID = n2.id
INNER JOIN node label n3 ON n3.id = r2.dID
INNER JOIN rel all bidirectional r3 ON n3.id = r3.sID
INNER JOIN node label n4 ON n4.id = r3.dID
INNER JOIN rel all bidirectional r4 ON n4.id = r4.sID
INNER JOIN node label n5 ON n5.id = r4.dID</p>
        <p>INNER JOIN rel all bidirectional r5 ON n5.id = r5.sID
WHERE n1.value = 'UndergraduateStudent' AND n2.value = 'Course'
AND n3.value = 'AssociateProfessor'
AND n4.value = 'UndergraduateStudent' AND n5.value = 'Course'
AND r5.dID = n1.id AND n1.id != n4.id AND n2.id != n5.id</p>
      </sec>
      <sec id="sec-13-3">
        <title>QP2;4 Cypher</title>
        <p>MATCH (a:UndergraduateStudent)--(b:Course)
--(c:AssociateProfessor)--(d:UndergraduateStudent)
--(e:Course) --(f:AssistantProfessor)--(a)
WHERE b &lt;&gt; e AND a&lt;&gt;d
RETURN a,b,c,d,e,f
QP2;4 SQL
SELECT n1.id, n2.id, n3.id, n4.id, n5.id, n6.id
FROM rel all bidirectional r1</p>
        <p>INNER JOIN node label n1 ON n1.id = r1.sID
INNER JOIN node label n2 ON n2.id = r1.dID
INNER JOIN rel all bidirectional r2 ON r2.sID = n2.id
INNER JOIN node label n3 ON n3.id = r2.dID
INNER JOIN rel all bidirectional r3 ON n3.id = r3.sID
INNER JOIN node label n4 ON n4.id = r3.dID
INNER JOIN rel all bidirectional r4 ON n4.id = r4.sID
INNER JOIN node label n5 ON n5.id = r4.dID
INNER JOIN rel all bidirectional r5 ON n5.id = r5.sID
INNER JOIN node label n6 ON n6.id = r5.dID</p>
        <p>INNER JOIN rel all bidirectional r6 ON n6.id = r6.sID
WHERE n1.value = 'UndergraduateStudent'</p>
        <p>AND n2.value = 'Course' AND n3.value = 'AssociateProfessor'
AND n4.value = 'UndergraduateStudent' AND n5.value = 'Course'
AND n6.value = 'AssistantProfessor' AND r6.dID = n1.id
AND n1.id != n4.id AND n2.id != n5.id
B.3 Paths restricted on edge types</p>
      </sec>
      <sec id="sec-13-4">
        <title>QP3;2 Cypher</title>
        <p>MATCH (a)-[:teacherOf]-&gt;(b)&lt;-[:takesCourse]-(c)</p>
        <p>-[:advisor]-&gt;(d)
RETURN a, b, c, d
QP3;2 SQL
SELECT n1.id AS teacher, n2.id AS course, n3.id AS student,
n4.id AS advisor
FROM nodes n1, rel teacherOf rto, nodes n2,</p>
        <p>rel takesCourse rtc, nodes n3, rel advisor ra, nodes n4
WHERE n1.id = rto.sID AND rto.dID = n2.id</p>
        <p>AND n2.id = rtc.dID AND n3.id = rtc.sID
AND n3.id = ra.sID AND n4.id = ra.dID
MATCH (a)-[:teacherOf]-&gt;(b)&lt;-[:takesCourse]-(c)</p>
        <p>-[:advisor]-&gt;(d)-[:mastersDegreeFrom]-&gt;(e)
RETURN a, b, c, d, e
QP3;3 SQL
SELECT n1.id AS teacher, n2.id AS course, n3.id AS student,
n4.id AS advisor
FROM nodes n1, rel teacherOf rto, nodes n2, rel takesCourse rtc,
nodes n3, rel advisor ra, nodes n4,
rel mastersDegreeFrom rmdf, nodes n5
WHERE n1.id = rto.sID AND rto.dID = n2.id</p>
        <p>AND n2.id = rtc.dID AND n3.id = rtc.sID
AND n3.id = ra.sID AND n4.id = ra.dID</p>
        <p>AND n4.id = rmdf.sID AND n5.id = rmdf.dID</p>
      </sec>
      <sec id="sec-13-5">
        <title>QP3;4 Cypher</title>
        <p>MATCH (a)-[:teacherOf]-&gt;(b)&lt;-[:takesCourse]-(c)
-[:advisor]-&gt;(d)-[:mastersDegreeFrom]-&gt;(e)
&lt;-[:doctoralDegreeFrom]-(f)
RETURN a, b, c, d, e, f
QP3;4 SQL
SELECT n1.id AS teacher, n2.id AS course, n3.id AS student,
n4.id AS advisor
FROM nodes n1, rel teacherOf rto, nodes n2, rel takesCourse rtc,
nodes n3, rel advisor ra, nodes n4, rel mastersDegreeFrom rmdf,
nodes n5, rel doctoralDegreeFrom rddf, nodes n6
WHERE n1.id = rto.sID AND rto.dID = n2.id</p>
        <p>AND n2.id = rtc.dID AND n3.id = rtc.sID
AND n3.id = ra.sID AND n4.id = ra.dID
AND n4.id = rmdf.sID AND n5.id = rmdf.dID</p>
        <p>AND n5.id = rddf.dID AND n6.id = rddf.sID
B.4 Paths restricted on node labels and edge
types</p>
      </sec>
      <sec id="sec-13-6">
        <title>QP4;2 Cypher</title>
        <p>MATCH (a:FullProfessor)-[:teacherOf]-&gt;(b:Course)
&lt;-[:takesCourse]-(c:UndergraduateStudent)
-[:advisor]-&gt;(d:AssociateProfessor)
RETURN a, b, c, d
QP4;2 SQL
SELECT a.id, b.id, c.id, d.id
FROM node label a, rel teacherOf r1, node label b,
rel takesCourse r2, node label c, rel advisor r3, node label d
WHERE a.value = 'FullProfessor' AND b.value = 'Course'
AND c.value = 'UndergraduateStudent'
AND d.value = 'AssociateProfessor'
AND a.id = r1.sID AND r1.dID = b.id AND b.id = r2.dID
AND r2.sID = c.id AND c.id = r3.sID AND r3.dID = d.id</p>
      </sec>
      <sec id="sec-13-7">
        <title>QP4;3 Cypher</title>
        <p>MATCH (a:FullProfessor)-[:teacherOf]-&gt;(b:Course)
&lt;-[:takesCourse]-(c:UndergraduateStudent)
-[:advisor]-&gt;(d:AssociateProfessor)
-[:mastersDegreeFrom]-&gt;(e:University)
RETURN a, b, c, d, e
QP4;3 SQL
SELECT a.id, b.id, c.id, d.id, e.id
FROM node label a, rel teacherOf r1, node label b,
rel takesCourse r2, node label c, rel advisor r3, node label d,
rel mastersDegreeFrom r4, node label e
WHERE a.value = 'FullProfessor' AND b.value = 'Course'
AND c.value = 'UndergraduateStudent'
AND d.value = 'AssociateProfessor' AND e.value = 'University'
AND a.id = r1.sID AND r1.dID = b.id AND b.id = r2.dID
AND r2.sID = c.id AND c.id = r3.sID AND r3.dID = d.id
AND d.id = r4.sID AND r4.dID = e.id</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Angles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Prat-Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dominguez-Sal</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>J.-L.</given-names>
            <surname>Larriba-Pey</surname>
          </string-name>
          .
          <article-title>Benchmarking Database Systems for Social Network Applications</article-title>
          .
          <source>In Proc. Workshop on Graph Data Management Experiences and Systems (GRADES)</source>
          , pages
          <fpage>1</fpage>
          <issue>{7</issue>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ciglan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Averbuch</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Hluchy</surname>
          </string-name>
          .
          <article-title>Benchmarking Traversal Operations over Graph Databases</article-title>
          .
          <source>In Proc. ICDE Workshops (ICDEW)</source>
          , pages
          <fpage>186</fpage>
          {
          <fpage>189</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Grossniklaus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Leone</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Za</surname>
          </string-name>
          <article-title>schke. Towards a Benchmark for Graph Data Management and Processing</article-title>
          .
          <source>Technical report</source>
          , University of Konstanz,
          <year>2013</year>
          . http://nbn-resolving.de/urn:nbn:de:bsz:
          <fpage>352</fpage>
          -
          <lpage>242536</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gubichev</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Then</surname>
          </string-name>
          . Graph Pattern Matching:
          <article-title>Do We Have to Reinvent the Wheel?</article-title>
          <source>In Proc. Workshop on Graph Data Management Experiences and Systems (GRADES)</source>
          , pages
          <fpage>1</fpage>
          <issue>{7</issue>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          , and
          <string-name>
            <surname>J.</surname>
          </string-name>
          <article-title>He in. LUBM: A Benchmark for OWL Knowledge Base Systems</article-title>
          . J. Web Sem.,
          <volume>3</volume>
          (
          <issue>2</issue>
          -3):
          <volume>158</volume>
          {
          <fpage>182</fpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ho</surname>
          </string-name>
          <article-title>lsch and M. Grossniklaus. An Algebra and Equivalences to Transform Graph Patterns in Neo4j</article-title>
          .
          <source>In Proc. Intl. Workshop on Querying Graph Structured Data (GraphQ)</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jouili</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Vansteenberghe</surname>
          </string-name>
          .
          <article-title>An Empirical Comparison of Graph Databases</article-title>
          .
          <source>In Proc. Intl. Conf. on Social Computing (SOCIALCOM)</source>
          , pages
          <fpage>708</fpage>
          {
          <fpage>715</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>McColl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ediger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Poovey</surname>
          </string-name>
          , D. Campbell, and
          <string-name>
            <surname>D. A. Bader.</surname>
          </string-name>
          <article-title>A Performance Evaluation of Open Source Graph Databases</article-title>
          .
          <source>In Proc. Workshop on Parallel Programming for Analytics Applications (PPAA)</source>
          , pages
          <fpage>11</fpage>
          {
          <fpage>18</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>N.</given-names>
            <surname>Pobiedina</surname>
          </string-name>
          , S. Rummele, S. Skritek, and
          <string-name>
            <given-names>H.</given-names>
            <surname>Werthner</surname>
          </string-name>
          .
          <article-title>Benchmarking Database Systems for Graph Pattern Matching</article-title>
          .
          <source>In Proc. Intl. Conf. on Database and Expert Systems Applications (DEXA)</source>
          , pages
          <fpage>226</fpage>
          {
          <fpage>241</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sakr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Elnikety</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>He. G-SPARQL:</surname>
          </string-name>
          <article-title>A Hybrid Engine for Querying Large Attributed Graphs</article-title>
          .
          <source>In Proc. Intl. Conf. on Information and Knowledge Management (CIKM)</source>
          , pages
          <fpage>335</fpage>
          {
          <fpage>344</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>W.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fokoue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kementsietsidis</surname>
          </string-name>
          , G. Hu, and
          <string-name>
            <given-names>G.</given-names>
            <surname>Xie. SQLGraph</surname>
          </string-name>
          :
          <article-title>An E cient Relational-Based Property Graph Store</article-title>
          .
          <source>In Proc. Intl. Conf. on Management of Data (SIGMOD)</source>
          , pages
          <year>1887</year>
          {
          <year>1901</year>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Vicknair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Macias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Nan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Wilkins</surname>
          </string-name>
          .
          <article-title>A Comparison of a Graph Database and a Relational Database: A Data Provenance Perspective</article-title>
          .
          <source>In Proc. ACM Southeast Regional Conference, page 42</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Welc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Raman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Cha</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          .
          <article-title>Graph Analysis: Do We Have to Reinvent the Wheel?</article-title>
          <source>In Proc. Workshop on Graph Data Management Experiences and Systems (GRADES)</source>
          , pages
          <fpage>1</fpage>
          <issue>{6</issue>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>