<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Algebra and Equivalences to Transform Graph Patterns in Neo4j</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jürgen Hölsch</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Grossniklaus</string-name>
          <email>michael.grossniklaus@uni-konstanz.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer and Information Science, University of Konstanz P.</institution>
          <addr-line>O. Box 188, 78457 Konstanz</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>3</fpage>
      <lpage>10</lpage>
      <abstract>
        <p>Modern query optimizers of relational database systems embody more than three decades of research and practice in the area of data management and processing. Key advances include algebraic query transformation, intelligent search space pruning, and modular optimizer architectures. Surprisingly, many of these contributions seem to have been overlooked in the emerging eld of graph databases so far. In particular, we believe that query optimization based on a general graph algebra and its equivalences can greatly improve on the current state of the art. Although some graph algebras have already been proposed, they have often been developed in a context, in which a relational database system is used as a backend to process graph data. As a consequence, these algebras are typically tightly coupled to the relational algebra, making them unsuitable for native graph databases. While we support the approach of extending the relational algebra, we argue that graph-speci c operations should be de ned at a higher level, independent of the database backend. In this paper, we introduce such a general graph algebra and corresponding equivalences. We demonstrate how it can be used to optimize Cypher queries in the setting of the Neo4j native graph database.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The management and processing of graph data is gaining
importance in several application domains such as social
network analysis [
        <xref ref-type="bibr" rid="ref11 ref18 ref3">3, 11, 18</xref>
        ], the Semantic Web [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and biological
networks [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. As a consequence, numerous graph database
systems such as Neo4j, DEX/Sparksee, and OrientDB have
recently emerged. With respect to their architecture, graph
databases can be classi ed into two groups. Systems
belonging to the rst group map graphs to existing relational or
non-relational database backends, wheareas the native graph
database systems of the second group implement custom
backends to store the data. While this design choice provides
the freedom to address requirements that are speci c to graph
data by tailor-made solutions, it also poses the challenge of
applying the lessons learnt in over 30 years of designing and
developing relational database systems to these new graph
database systems.
      </p>
      <p>In this paper, we focus on query optimization in the
setting of the Neo4j native graph database. We have chosen
this concrete setting for several reasons. First, native graph
databases are still very heterogeneous in terms of
functionality and architecture. Faced with this situation, we follow
a \bottom-up" approach that addresses the problems of one
system and is later generalized to other systems. Second,
Neo4j supports Cypher, a mature declarative graph query
language that opens up similar optimization opportunities as
SQL in relational database systems. Finally, Neo4j's current
query processor provides ample opportunity for improvement,
as we demonstrate in the following motivating example.</p>
      <p>Consider the following Cypher query on a movie database1
provided by Neo4j, which we use for examples throughout
this paper. The query returns all movies in which \Al Pacino"
and \Robert De Niro" act in.</p>
      <sec id="sec-1-1">
        <title>MATCH (x:Actor)- -&gt;(y)&lt;- -(z:Actor) WHERE x.name = \Al Pacino" AND z.name = \Robert De Niro"</title>
        <p>RETURN *
In the Neo4j Community Edition Version 2.3.1, this query
is evaluated using the following query plan. As in relational
database systems, the evaluation order is from the leaves to
the root.</p>
        <sec id="sec-1-1-1">
          <title>Expand(All)[y&lt;- -x]</title>
        </sec>
        <sec id="sec-1-1-2">
          <title>Expand(All)[z- -&gt;y]</title>
          <p>First, an index is accessed to determine the actor node
corresponding to \Robert De Niro". Next, the Expand(All)
operator is applied to get the movies in which Robert De Niro
acts in. The Expand(All) operator returns all neighbors of
a given node. For the movie nodes retrieved in this way, the
graph is further traversed to nd the actors that starred in
the same movies as Robert De Niro. Finally, the actor node
corresponding to \Al Pacino" is selected. This execution
plan results in 1855 database accesses. In Neo4j, a database
access is called \database hit". In this work, we will also use
the term database hit to refer to a database access and use
it as our primary metric to quantify the cost of a plan.
1http://neo4j.com/developer/example-data/ (November
19, 2015)</p>
          <p>However, the execution plan that Neo4j produces for this
query is not the most e cient one. In the following, we
extend the query with hints that tell Neo4j to use indexes.</p>
        </sec>
      </sec>
      <sec id="sec-1-2">
        <title>MATCH (x:Actor)- -&gt;(y)&lt;- -(z:Actor) USING INDEX x:Actor(name) USING INDEX z:Actor(name) WHERE x.name = \Al Pacino" AND z.name = \Robert De Niro"</title>
        <p>RETURN *
With the help of these hints, Neo4j's optimizer returns a
di erent execution plan, which is given below.</p>
        <p>NodeHashJoin
Both the \Al Pacino" and \Robert De Niro" actor node are
now determined by an index access. Then, the neighboring
movie nodes are retrieved for these two actor nodes. The
result of these operations are two sets of subgraphs, in which
each subgraph consists of exactly one edge. In the last step,
the two subgraph sets are joined by matching the movie
nodes. Compared to the rst plan, this second plan is 95%
less expensive as it only requires 90 database hits.</p>
        <p>There are two obstacles that could prevent Neo4j from
choosing the more e cient plan given the original query.
First, Neo4j might not consider the second plan because it is
unable to enumerate it. Second, even if it nds the second
plan, Neo4j might not recognize it as less expensive because
it lacks the cost model to accurately estimate the number
of required database hits. In this work, we address the
rst of these two obstacles by outlining how the well-known
technique of algebraic query optimization can be applied to
Cypher graph patterns in Neo4j.</p>
        <p>We are aware of existing proposals to de ne a graph algebra
and corresponding equivalences in order to build graph query
optimizers following the blueprint established by relational
database systems. However, most of these algebras have
been de ned for graph database systems that use a relational
database system as a backend. Therefore, they typically make
assumptions about the relational representation of graphs or
are tightly coupled to the operators of the relational algebra.
While we believe in the general approach of extending the
relational algebra to support graph data processing, we argue
these extensions need to be de ned at a higher level in order
to support native graph databases such as Neo4j. The speci c
contributions of this paper are as follows.</p>
        <p>A data model that uses property graphs to represent
the graph database, while so-called graph relations
model the input and output of operators (Section 2).
An algebra that de nes two new high-level operators
for representing graph patterns in Cypher (Section 3).
Equivalence rules together with proofs of correctness
that specify how expressions of our algebra can be
transformed (Section 4).</p>
        <p>New algebraic optimization techniques for Cypher queries
at the logical level (Section 5).
2
3
3</p>
      </sec>
      <sec id="sec-1-3">
        <title>Label</title>
        <p>Movie
Actor</p>
        <p>Actor
Director
Movie
Actor</p>
      </sec>
      <sec id="sec-1-4">
        <title>Name</title>
        <p>Al Pacino
Robert De Niro
Michael Mann
Russel Crowe</p>
      </sec>
      <sec id="sec-1-5">
        <title>Title</title>
        <p>Heat
The Insider</p>
        <p>We position our work w.r.t. related approaches in Section 6.
Section 7 gives concluding remarks. Since the results
presented in this paper are the rst step of a larger research
e ort, we also outline future work in that section.
2.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>DATA MODEL</title>
      <p>Before de ning our graph algebra, we rst introduce the data
model on which it is based. Existing graph data models can
broadly be categorized into two classes, i.e., data models
based on property graphs and the RDF data model. Whereas
property graph data models can assign properties to nodes
and edges, properties in RDF are represented as additional
nodes. Most existing graph algebra proposals have been
de ned in the context of RDF. Since RDF triples can easily
be mapped to relations, these algebras are often only slight
extensions of the relational algebra. Unfortunately, these
algebras are typically unsuited to express queries on property
graphs. In this work, we will therefore focus on the property
graph model, which is also used by Neo4j. More speci cally,
we limit this work to directed graphs in which nodes have
only one label.</p>
      <p>De nition 2.1 (Property Graph) Let G = (V; E; v; e;
Av; Ae; ; lv; le) be a property graph, where V is a set of nodes,
E is a set of edges, v is a set of node labels, and e is a
set of edge labels. In addition, Av is a set of node properties
and Ae is a set of edge properties. Let D be a set of atomic
domains. A property ai 2 Av is a function ai : V ! Di [ f g
which assigns a property value from a domain Di 2 D to
a node v 2 V , if v has property ai, otherwise ai(v) returns
. A property aj 2 Ae is a function aj : E ! Dj [ f g
which assigns a property value from a domain Dj 2 D to an
edge e 2 E, if e has property aj, otherwise aj(e) returns .
Furthermore, : E ! V V is a function assigning nodes
to edges, lv : V ! v is a function assigning labels to nodes,
and le : E ! e is a function assigning labels to edges.</p>
      <p>As we will explain in Section 3, the input and output of
our algebra operators are subgraphs. Therefore, we need to
de ne how a subgraph G0 of G is represented. Since we want
to combine our new operators with the existing operators of
the relational algebra, we represent subgraphs as relations.
For this purpose, we introduce the concept of graph relations.
A graph relation is a relation that only contains columns
corresponding to nodes or edges. Example 2.1 illustrates the
concept of graph relations.</p>
      <p>Example 2.1 The Cypher query below returns the movies
directed by Michael Mann in which Al Pacino acted.</p>
      <sec id="sec-2-1">
        <title>MATCH (x:Actor)- -&gt;(y)&lt;- -(z:Director) WHERE x.name = \Al Pacino" AND z.name = \Michael Mann"</title>
        <p>RETURN *
The following two subgraphs of the graph shown in Figure 1
match the pattern de ned in the Cypher query above. In
order to illustrate the matching, the variables x,y and z are
included in the subgraphs.</p>
        <p>x
2
x
2
1
5
xy
1
5
These subgraphs are represented by the following graph
relation.</p>
        <p>The column names in the schema of a graph relation are used
to access the nodes and edges of a subgraph (cf. Section 3).
The column \xy" represents the edge from node x to y and
the column \zy" represents the edge from node z to y.
De nition 2.2 (Graph Relation) Given a property graph
G, a relation R is a graph relation, if the following holds:
8A 2 attr(R) : dom(A) = V _ E,
where attr(R) is the set of attributes of R, dom(A) is the
domain of attribute A, V are nodes of G, and E are edges
of G.</p>
        <p>The attributes of a graph relation only contain node or edge
identi ers that reference nodes or edges of the graph G.</p>
        <p>Using graph relations as input and output of our algebra
operators has two advantages. First, as mentioned before, the
operators of the relational algebra (e.g., selection and
projection) can be reused. Second, graph relations are independent
of the underlying graph representations. For example,
instead of basing them on property graphs, they can also be
de ned in the scope of the RDF data model in order to
represent SPARQL queries in our algebra.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>ALGEBRA</title>
      <p>Having described the graph data model that we assume, we
now extend the relational algebra with two new graph-speci c
operators. First, we introduce the GetNodes operator which
returns a graph relation containing all nodes of the
underlying graph G. Therefore, leaves of an operator tree of our
algebra have to be GetNodes operators. Second, we
introduce the Expand operator which adds two new columns to
the current graph relation where one column contains all
neighbors of a speci c node and the other column contains
the corresponding edges. The Expand operator makes it
possible to formally describe the order in which nodes are
traversed to nd a speci c pattern.</p>
      <p>Before we de ne the GetNodes and Expand operator,
we have to clarify how property and label values of nodes
and edges are accessed in the selection condition of a
selection operator. As described in the previous section, graph
relations only contain node and edge identi er, whereas the
property and label data is stored as a property graph.
Assuming that x is a column of a graph relation, we use the
notation \x:a" in selection conditions to express the access to
the corresponding value of property a in the property graph.
If x represents a node, then we can access the label of x with
the term \x:lv" (or x:le, if x is an edge).</p>
      <p>We now introduce the GetNodes operator, which is
denoted by x, where x is the name of the newly created
column. Before de ning the operator, Example 3.1
demonstrates the operation of GetNodes.</p>
      <p>Example 3.1 Consider the graph G from Figure 1. The
expression x returns the graph relation below (left) containing
the IDs of all nodes of G. The expression x:name=\Al Pacino"(
x) returns the following table on the right.
Note that val(R) is the set of tuples of R and sch(R) is the
schema of R.</p>
      <p>Next, we de ne the Expand operator. The Expand
operator is used to return the neighbors of a given node. Therefore,
a sequence of Expand operators speci es a search order in
which a graph is traversed to nd a speci c pattern.
Example 3.2 illustrates the idea behind this operator.</p>
      <p>Example 3.2 Consider graph G of Figure 1. Assume we
want to nd all subgraphs in G that match the pattern de ned
in Example 2.1. First, the expression below returns the actor
node with the name \Al Pacino".</p>
      <p>x:lv=\Actor"^x:name=\Al Pacino"( x)
Then, we need to nd the neighbors of x that are connected
by an outgoing edge. Therefore, we use the ExpandOut
y
operator denoted by "x, where y is the new column that is
added to the current graph relation containing the neighbors
of nodes from x.</p>
      <p>"yx ( x:lv=\Actor"^x:name=\Al Pacino"( x))
The expression above returns the following graph relation, in
which column xy contains the edges from x to y.
As a next step, we need to retrieve the director nodes that
are connected to nodes contained in y. In order to do so,
we again use the Expand operator. However, we now need
to return the nodes that are connected by incoming edges to
nodes of y. Hence, we use the ExpandIn operator denoted
z
by #y.
The expression above returns the following graph relation, in
which zy represents edges from z to y.
As shown in this example, the Expand operator has two
directions: ExpandOut is used for outgoing edges (denoted
by "), whereas ExpandIn is used for ingoing edges (denoted
by #). The variable given at the bottom of the ExpandOut
or ExpandIn operator references an existing column of the
graph relation contained in the input. The variable given
at the top of the ExpandOut or ExpandIn operator is the
name of the newly created column.</p>
      <p>In the last step of the example query, we select the subgraphs
that contain the director\Michael Mann".</p>
      <p>z:lv=\Director"^z:name=\Michael Mann"(</p>
      <p>#zy ("yx ( x:lv=\Actor"^x:name=\Al Pacino"( x))))
As a result, we get the following graph relation, which is equal
to the graph relation obtained in Example 2.1.</p>
      <p>De nition 3.2 (Expand Operator) Let R be a graph
relation and x 2 attr(R) an attribute. The ExpandOut
operator "yx (R) adds a new column y to R containing the nodes
that can be reached by an outgoing edge from nodes of x. In
addition, a column xy is added to R containing the
corresponding edges from nodes of x to nodes of y. The symbol k
denotes the concatenation of schema tuples.
val("yx (R)) = f ht; e; vi j t 2 val(R)^e 2 E ^ (e) = (t:x; v)g
sch("yx (R)) = sch(R) k hxy; yi
The ExpandIn operator #yx (R) adds a new column y to R
containing the nodes that can be reached by an ingoing edge
xy
1
5
zy
3
4
from nodes of x. In addition, a column yx is added to R
containing the corresponding edges from nodes of y to nodes
of x.
val(#yx (R)) = f ht; e; vi j t 2 val(R)^e 2 E ^ (e) = (v; t:x)g
sch(#yx (R)) = sch(R) k hyx; yi
The column name for edges is a concatenation of the column
names of the corresponding nodes. The column name of an
edge also indicates the direction of an edge. For example,
\xy" is an edge from x to y and \yx" is an edge from y to x.</p>
      <p>The components of a schema tuple s 2 sch(R) and a data
tuple t 2 val(R) are ordered. However, in this work, we say
that two schemata are equal if they have the same set of
attributes. Therefore, two tuples are equal if they have the
same values in each of their attributes.</p>
      <p>In Section 5, we demonstrate how the Expand operator
can be directly mapped to the physical Expand(All) operator
of Neo4j. Despite this direct correspondence, Expand is a
logical operator since in a di erent storage backend it will
be evaluated by another operator, for example in the case of
a relational database system, as a join of edge tables.
4.</p>
    </sec>
    <sec id="sec-4">
      <title>EQUIVALENCE RULES</title>
      <p>Based on the GetNodes and Expand operator that we
introduced in the previous section, we de ne and prove the
equivalence rules of our algebra. In this section, we only
describe equivalence rules that are used in the optimizations
in Section 5, which serve as a proof-of-concept for our
approach. The de nition of a complete set of equivalence rules
is subject to future work.</p>
      <p>Assume that we want to nd all patterns, where a node
(denoted with x) is connected by an outgoing edge to another
node (denoted by y). This pattern can be traversed either
by starting at x nodes and getting the outgoing edges or by
starting at y nodes and getting the incoming edges. Rule 4.1
states this simple fact.</p>
      <p>"yx ( x)
#yx ( y)
(4.1)
Proof. Both expressions have the same set of attributes:
attr("yx ( x)) = attr(#yx ( y)) = fx; xy; yg. In addition,
we have to show that val("yx ( x)) val(#yx ( y) and
val(#yx ( y)) val("yx ( x)) holds. Let t 2 val("yx ( x))
be a tuple, then there is an edge e with (e) = (t:x; t:y). As
a consequence, t 2 val(#yx ( y)) holds. Therefore, val("yx
( x)) val(#yx ( y) holds. val(#yx ( y)) val("yx ( x))
can shown analogously and is thus omitted.</p>
      <p>Assume that we have the pattern as above, but that we
additionally want to nd nodes (denoted by z) that are
connected by outgoing edges starting from y nodes. The
following graph shows such a pattern.</p>
      <p>x
y
z
Rule 4.2 states that this pattern can be traversed by starting
at z nodes and determining the incoming edges.
"zy ("yx ( x))
#yx (#zy ( z))
(4.2)
Proof. Again, we have to show that both expressions return
the same tuples. Consider a tuple t 2 val("zy ("yx ( x))).
Then, there is an edge e with (e) = (t:y; t:z). From this
it follows that t[y; z] 2 val(#zy ( z)), where t[y; z] is the
projection of t on the attributes y and z. Since t 2 val("zy
("yx ( x))) holds, it also follows that there is an edge e0
with (e0) = (t:x; t:y). Therefore, t 2 val("zy ("yx ( x)))
holds. val(#yx (#zy ( z))) val("zy ("yx ( x))) is shown
analogously.</p>
      <p>Furthermore, consider the following pattern in which y
nodes have ingoing edges from x and z nodes.</p>
      <p>x
y
z
Rule 4.3 states that this pattern can also be traversed by
starting at z nodes.</p>
      <p>#zy (#yx ( y))
#yx ("zy ( z))
(4.3)
As the structure of the proof for Rule 4.3 is very similar to
the proof for Rule 4.2, we do not include a proof for Rule 4.3
at this point.</p>
      <p>The Rules 4.1, 4.2, and 4.3 describe the basic cases how a
pattern can be traversed. For patterns with more than three
nodes, we introduce Rule 4.4. Rule 4.4 states that the order
of two Expand operators can be interchanged if neither of the
operators accesses an attribute that is introduced by the other
operator. Formally, the rule is de ned as follows. Consider
two Expand operators ; 2 f"; #g, a graph relation E, and
the attributes a; c 2 attr(E). Then, the following equivalence
holds.</p>
      <p>cd( ab(E))
ab( cd(E))
(4.4)
Proof. As a precondition for the attributes a; c 2 attr(E),
it has to hold that a 6= d and b 6= c. Since the direction
of cd and ab is not changed by switching their evaluation
order, sch( cd( ba(E))) = sch( ab( cd(E))) holds. Suppose
t 2 val( cd( ab(E))). It follows that there is an edge
between t:c and t:d where c 2 attr(E). As a consequence,
t[c; d; attr(E)] 2 val( cd(E)). From t 2 val( cd( ba(E))) it
follows that there is an edge between t:a and t:b where
a 2 attr(E). Therefore, t 2 ab( cd(E)) holds. The other
direction can be shown analogously.</p>
      <p>Up to now, we only looked at traversals that start at a
single node. However, a pattern can also be traversed by
starting at multiple nodes and joining the resulting
subpatterns. For example, consider the previous pattern, in which
y nodes have ingoing edges from x and z nodes. We could
traverse this pattern by starting at x and z nodes. Then, we
determine the y nodes that are connected by outgoing edges
to the x nodes. For the z nodes, we also determine the y
nodes that are connected by outgoing edges. As a result, we
get a set of subgraphs with x and y nodes connected to each
other and a set of subgraphs with z and y nodes connected to
each other. As a last step, we have to join the subgraphs that
have the same y nodes. Rule 4.5 below gives the equivalence
between these two alternative evaluation strategies.
#zy ("yx (E))
"yx (E) ./ "zy ( z)
(4.5)
Proof. Consider t 2 val(#zy ("yx (E))). Then, there is an edge
e with (e) = (t:x; t:y). Consequently, t[x; y] 2 val("yx (E))
holds. Since t 2 val(#zy ("yx (E))) holds, it also follows
that there is an edge e0 with (e0) = (t:z; t:y). Therefore,
t[y; z] 2 val("zy ( z)) holds. t[x; y] and t[y; z] have the same
value in y. As a consequence, t 2 val("yx (E) ./ "zy ( z))
holds. The other direction can be shown analogously.</p>
      <p>Pushing selections down in the operator tree of a query
evaluation plan is an important and well-known heuristic to
reduce the size of intermediate results. Therefore, we de ne
Rule 4.6, which states that the order of a selection and the
Expand operator can be interchanged if no variable in the
selection condition is introduced by the Expand operator.
More formally, let 2 f"; #g be an Expand operator and
F a selection condition. Without loss of generality, assume
that creates a new column y. If F does not contain
properties/labels of node y or the edge introduced by , then
the following equivalence holds.</p>
      <p>F( xy(E))
y
x( F(E))
(4.6)
The proof for Rule 4.6 is trivial and is therefore omitted.
5.</p>
    </sec>
    <sec id="sec-5">
      <title>OPTIMIZATIONS</title>
      <p>In this section, we demonstrate how Cypher queries can be
optimized at the logical level by transforming them using
the equivalences of Section 4. In order to demonstrate the
potential bene ts of this approach, we will present example
queries for which the optimizer of the Neo4j Community
Edition Version 2.3.1 does not nd the best execution plan.</p>
      <p>First, we show how the Cypher query that we used as a
motivating example in the introduction can be transformed
into a more e cient query. The following algebra expression
is a logical representation of the Cypher query given in the
introduction.
This statement rst selects all actor nodes with the name \Al
Pacino". Afterwards, the neighbors connected by outgoing
edges are determined. For each of the resulting nodes of this
ExpandOut operation the neighbors connected by incoming
edges are determined and only the neighbors with the name
\Robert De Niro" are selected. In order to transform this
logical plan into a physical plan, a cost model would be
required, which we plan to develop in future work. For
evaluation purposes, we therefore try to manually nd a
physical plan that has the same evaluation order as our
logical plan. A physical plan that satis es this requirement
is the rst execution plan given in the introduction.</p>
      <p>Since the current form of the logical algebra expression does
not yield the most e cient execution strategy, we transform
it into an alternative expression by applying the equivalence
rules de ned in Section 4. The original expression begins the
evaluation at a single actor node and then expands this node
twice. As a consequence, we retrieve all actors that starred
in all movies in which the rst actor also starred. Since
we are ultimately only looking for one actor, this execution
strategy does not use the most selective access path and we
end up loading to much data. A better strategy would be
to retrieve both actor nodes separately, expand them to get
the nodes corresponding to movies the actors starred in, and
then perform a join on these movie nodes. This execution
strategy is more e cient because it avoids accessing the set
of all person nodes related to movies of \Al Pacino", which
is much larger than the sets of movies nodes related to \Al
Pacino" and \Robert De Niro".</p>
      <p>Q2</p>
      <p>Q3
As a next step, we push the outer selection into the join.
Note that this transformation can be done using an existing
equivalence rule from the relational algebra.</p>
      <p>"yx ( x:lv=\Actor"^x:name=\Al Pacino"( x)) ./</p>
      <p>z:lv=\Actor"^z:name=\Robert De Niro"("zy ( z))
As a last step, we move the selection into the Expand
operator by applying Rule 4.6.</p>
      <p>4:6 "yx ( x:lv=\Actor"^x:name=\Al Pacino"( x)) ./</p>
      <p>"zy ( z:lv=\Actor"^z:name=\Robert De Niro"( z))
The evaluation order of the resulting expression is the same
as in the execution plan of the transformed Cypher query
given in the introduction. In order to force Neo4j to execute
the query using this plan, we add the USING INDEX hint.
Compared to the original query, the number of database hits
is reduced by about 95% (cf. Figure 2, Q1).</p>
      <p>The next Cypher query, which we use as an example
to demonstrate the transformation-based optimization
supported by our algebra, is given below. The query returns
all subgraphs in which a person gives a ve star rating to a
movie in which \Al Pacino" acted.</p>
      <sec id="sec-5-1">
        <title>MATCH (x:Person)-[e:RATED]-&gt;(y)&lt;- -(z) WHERE e.stars = 5 AND z.name = \Al Pacino"</title>
        <p>RETURN *
In order to execute this query, Neo4j uses the following
execution plan.</p>
        <p>Filter[e.stars=5 ^ x:lv =\Person"]</p>
      </sec>
      <sec id="sec-5-2">
        <title>Expand(All)[y&lt;-[e:RATED]-x]</title>
        <sec id="sec-5-2-1">
          <title>Expand(All)[z- -&gt;y]</title>
          <p>Filter[z.name=\Al Pacino"]</p>
        </sec>
        <sec id="sec-5-2-2">
          <title>AllNodesScan[z]</title>
          <p>DB hits
the previous example, Neo4j can be forced to use a better
execution plan by including the hint \USING SCAN x:Person".
This more e cient execution plan is given below.</p>
          <p>Filter[z.name=\Al Pacino"]</p>
        </sec>
        <sec id="sec-5-2-3">
          <title>Expand(All)[y&lt;- -z]</title>
          <p>Filter[e.stars=5]</p>
        </sec>
      </sec>
      <sec id="sec-5-3">
        <title>Expand(All)[x-[e:RATED]-&gt;y]</title>
        <sec id="sec-5-3-1">
          <title>NodeByLabelScan[x:Person]</title>
          <p>Instead of scanning all nodes of the graph, this execution
plan only accesses nodes with the label \Person". Therefore,
the number of database hits is reduced by about 20% (cf.
Figure 2, Q2). Since the optimizer of Neo4j is unable to nd
this execution plan without hints, we show how our algebra
and its equivalence rules can be used to enumerate plans
with alternative evaluation orders at the logical level.</p>
          <p>We begin with a logical representation of the Cypher query
that has the same evaluation order as the original execution
plan.</p>
          <p>xy:le=\RATED"^xy:stars=5^x:lv=\Person"(#yx (</p>
          <p>"zy ( z:name=\Al Pacino"( z))))
Next, we move the inner selection out of the Expand
operators.</p>
          <p>Then, the order of the edge traversals is changed.
4:6
4:1
4:3
xy:le=\RATED"^xy:stars=5^x:lv=\Person"(
z:name=\Al Pacino"(#yx ("zy ( z))))
xy:le=\RATED"^xy:stars=5^x:lv=\Person"(
z:name=\Al Pacino"(#yx (#zy ( y))))
xy:le=\RATED"^xy:stars=5^x:lv=\Person"(
z:name=\Al Pacino"(#zy ("yx ( x))))</p>
        </sec>
      </sec>
      <sec id="sec-5-4">
        <title>Finally, the outer selection is pushed down.</title>
        <p>"yx ( x:lv=\Person"( x)))))
z:name=\Al Pacino"(#zy ( xy:le=\RATED"^xy:stars=5(
This logical plan has the same evaluation order as the
alternative execution plan that we introduced above.</p>
        <p>The last Cypher query that we use as an example to
illustrate the bene ts of our approach returns all movies, in
which \Al Pacino" acted and which \Michael Mann" directed.</p>
      </sec>
      <sec id="sec-5-5">
        <title>MATCH (x:Actor)- -&gt;(y)&lt;- -(z:Director) WHERE x.name = \Al Pacino" AND z.name = \Michael Mann"</title>
        <p>RETURN y
For this example query, Neo4j uses the execution plan given
below.</p>
        <p>Filter[z.name=\Michael Mann" ^ z:lv =\Director"]</p>
        <sec id="sec-5-5-1">
          <title>Expand(All)[y&lt;- -z] Expand(All)[x- -&gt;y]</title>
          <p>Again, this is not the most e cient execution plan because
it accesses all nodes of the graph in the rst step. As in
The pattern search begins traversing the graph at the \Al
Pacino" actor node. However, it is more e cient to start at
the \Michael Mann" director node. By adding \USING INDEX
z:Director(name)" to the Cypher query, Neo4j produces the
following execution plan that corresponds to this improved
execution strategy.</p>
          <p>Filter[x.name=\Al Pacino" ^ x:lv =\Actor"]</p>
          <p>This alternative plan reduces the number of database hits
by about 60% (cf. Figure 2, Q3). In order to change the
evaluation order of the rst plan into the evaluation order of
the alternative plan, the corresponding logical expressions
can be transformed as follows.</p>
          <p>y( z:name=\Michael Mann"^ z:lv=\Director"(</p>
          <p>#zy ("yx ( x:name=\Al Pacino"^x:lv=\Actor"( x)))))
As a rst step, we move the inner selection out of the Expand
operators.</p>
          <p>4:6
4:1
4:3
4:6
Then, the order of the node traversals can be changed.
y( z:name=\Michael Mann"^ z:lv=\Director"(
x:name=\Al Pacino"^x:lv=\Actor"(#zy ("yx ( x)))))
y( z:name=\Michael Mann"^ z:lv=\Director"(
x:name=\Al Pacino"^x:lv=\Actor"(#zy (#yx ( y)))))
y( z:name=\Michael Mann"^ z:lv=\Director"(
x:name=\Al Pacino"^x:lv=\Actor"(#yx ("zy ( z)))))</p>
        </sec>
      </sec>
      <sec id="sec-5-6">
        <title>Finally, the outer selection is pushed down.</title>
        <p>y( x:name=\Al Pacino"^x:lv=\Actor"(#yx ("zy (
z:name=\Michael Mann"^ z:lv=\Director"( z)))))
The evaluation order of this logical expression is equivalent to
the execution order of the physical plan that Neo4j produces
if the hint \USING INDEX z:Director(name)" is added to the
original Cypher query as described above.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>RELATED WORK</title>
      <p>
        Numerous graph query languages have been proposed in
the literature [
        <xref ref-type="bibr" rid="ref10 ref13 ref15 ref17 ref6 ref8">6, 8, 10, 13, 15, 17</xref>
        ]. However, in contrast to
relational database systems, a standard query language for
graph databases has yet to emerge. The lack of such a
common query language might be a reason why there are
still relatively few works on graph query optimization.
      </p>
      <p>
        Schmidt et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] study the foundations of query
optimization in the context of SPARQL. They introduce algebraic
rewriting rules as well as semantic optimizations on the
SPARQL code level. Yakovets et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] work on a
costbased optimization of SPARQL property paths. In their
approach, an execution plan is composed of nite automata
de ning the search order of patterns in the graph. He and
Singh [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] introduce an algorithm for nding patterns in a
graph. A building block in this algorithm is a procedure that
determines the search order of nodes based on a cardinality
estimation.
      </p>
      <p>
        To the best of our knowledge, there is also no common
algebra that is general enough to represent the graph queries
of all proposed languages. Cyganiak [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] describes how simple
SPARQL graph patterns can be mapped to relational algebra
expressions. Harris and Shadbolt [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and Chebotko et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
also show how a subset of SPARQL can be represented by
relational algebra expressions. However, by using the standard
relational algebra, a graph traversal has to be represented as
a sequence of joins. As joins work on logical addresses, these
approaches are not well suited for native graph databases
such as Neo4j, which use physical addresses (pointers) to
traverse neighboring nodes. The Expand operator proposed
in this paper is independent of the underlying data storage
and processing model. It is therefore a rst step towards a
general graph algebra.
      </p>
      <p>
        Sakr et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] introduce the query language G-SPARQL,
which extends SPARQL for querying property graphs. In
addition, the authors show how a property graph can be stored
in a relational database system by adapting a decomposed
storage model (DSM) [
        <xref ref-type="bibr" rid="ref1 ref7">1, 7</xref>
        ]. The authors also de ne a set of
algebraic operators that are used to represent G-SPARQL
queries. First, they introduce simple operators to retrieve
neighboring nodes or to retrieve property values of nodes
and edges. Second, they de ne more complex operators as,
for example, an operator to compute shortest paths. As a
consequence, the algebra of Sakr et al. is a better
foundation for graph databases than the pure relational algebra.
Nevertheless, their algebra is still limited. For a given node,
only outgoing edges can be traversed, wheras native graph
databases such as Neo4j, also support the retrieval of
neighboring nodes that are connected by incoming edges. Our
Expand operator can represent both directions and is
therefore more general. Additionally, Sakr et al. do not propose
any equivalence rules for their algebra that can be used to
enumerate di erent traversal orders.
      </p>
      <p>
        He and Singh [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] introduce an algebra for property graphs,
which is also based on the relational algebra. They introduce
a composition operator that is used to generate a new graph
from a matched graph. In addition, the authors generalize
the selection operator to graph pattern matching. Therefore,
on the logical level, the pattern matching is represented by a
single selection operator for which the authors propose an
access method. Due to this coarse granularity, the algebra of
He and Singh is not an ideal basis for query plan search space
exploration in a transformation-based query optimizer.
7.
      </p>
    </sec>
    <sec id="sec-7">
      <title>CONCLUSION AND FUTURE WORK</title>
      <p>In this work, we presented how the relational algebra can be
extended with a general Expand operator in order to enable
the transformation-based enumeration of di erent traversal
patterns for nding a pattern in a graph G. The input and
output of our algebra operators are so-called graph relations.
The tuples of a graph relation represent subgraphs of G
matching a given pattern. Additionally, we introduced a set
of equivalence rules and gave proofs for their correctness. In
the context of Neo4j, we demonstrated how these equivalences
can be used to algebraically transform Cypher query at the
logical level. Finally, we illustrated the potential performance
bene ts of this approach by comparing the database hits of
the query evaluation plan found to Neo4j to one resulting
from applying our equivalence rules.</p>
      <p>Our long-term goal is to design and develop a
transformation-based query optimizer for graph databases. Although
the work presented in this paper is a rst step towards this
goal, there remain open challenges that we will address in
future work. First, the algebra presented in this paper is
restricted to simple pattern matching functionalities.
However, in graph query languages such as Cypher, it is possible
to search for variable length patterns or to aggregate nodes.
Therefore, we plan to integrate these features in our algebra
in a next step.</p>
      <p>Second, apart from enumerating alternative query plans in
the search space, a query optimizer also needs to assign a cost
to these plans in order to choose the most e cient one. While
plan enumeration is supported by a general graph algebra
together with its equivalences, a corresponding cost model
has yet to be developed. Note that reduction of database hits
reported in Section 5 are measured empirically by executing
the two versions of the query in Neo4j, rather than estimated
by an analytical cost model.</p>
      <p>
        In order to build this query optimizer, we plan to use the
Cascades framework de ned by Graefe [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. On the one
hand, Cascades is based on a exible and modular
architecture, which facilitates the integration of new operators
and transformation rules. On the other hand, the Cascades
framework is used in commercial systems such as Microsoft
SQL Server. With this implementation as a foundation, we
can both study how graph-speci c cardinality statistics
affect query evaluation and integrate advanced search space
pruning strategies [
        <xref ref-type="bibr" rid="ref16 ref2">2, 16</xref>
        ].
      </p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgement</title>
      <p>This work is supported by Grant No. GR 4497/2 of the
Deutsche Forschungsgemeinschaft (DFG). Additionally, the
authors would like to thank Leonard Worteler for his feedback
on an early version of the graph algebra presented in this
paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Abadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marcus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Madden</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Hollenbach</surname>
          </string-name>
          .
          <article-title>Scalable Semantic Web Data Management using Vertical Partitioning</article-title>
          .
          <source>In Proc. Intl. Conf. on Very Large Data Bases (VLDB)</source>
          , pages
          <fpage>411</fpage>
          {
          <fpage>422</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Poess</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Chakkappen</surname>
          </string-name>
          .
          <article-title>Of Snowstorms and Bushy Trees</article-title>
          .
          <source>In Proc. Intl. Conf. on Very Large Data Bases (VLDB)</source>
          , pages
          <fpage>1452</fpage>
          {
          <fpage>1461</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Amer-Yahia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. V. S.</given-names>
            <surname>Lakshmanan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Yu. SocialScope: Enabling Information</surname>
          </string-name>
          <article-title>Discovery on Social Content Sites</article-title>
          .
          <source>In Proc. Intl. Conf. on Innovative Database Research (CIDR)</source>
          , pages
          <fpage>525</fpage>
          {
          <fpage>528</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Perez</surname>
          </string-name>
          .
          <article-title>Querying Semantic Web Data with SPARQL</article-title>
          .
          <source>In Proc. Intl. Symp. on Principles of Database Systems (PODS)</source>
          , pages
          <fpage>305</fpage>
          {
          <fpage>316</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chebotko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Fotouhi. Semantics Preserving</surname>
          </string-name>
          SPARQL-to-
          <source>SQL Translation. Data Knowledge Engineering</source>
          ,
          <volume>68</volume>
          :
          <fpage>973</fpage>
          {
          <fpage>1000</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Consens</surname>
          </string-name>
          and
          <string-name>
            <surname>A. O. Mendelzo.</surname>
          </string-name>
          <article-title>GraphLog: A Visual Formalism for Real Life Recursion</article-title>
          .
          <source>In Proc. Intl. Symp. on Principles of Database Systems (PODS)</source>
          , pages
          <fpage>404</fpage>
          {
          <fpage>416</fpage>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G. P.</given-names>
            <surname>Copeland</surname>
          </string-name>
          and
          <string-name>
            <surname>S. N.</surname>
          </string-name>
          <article-title>Khosha an. A Decomposition Storage Model</article-title>
          .
          <source>In Proc. Intl. Conf. on Management of Data (SIGMOD)</source>
          , pages
          <fpage>268</fpage>
          {
          <fpage>279</fpage>
          ,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. O.</given-names>
            <surname>Mendelzon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. T.</given-names>
            <surname>Wood</surname>
          </string-name>
          .
          <string-name>
            <given-names>A Graphical</given-names>
            <surname>Query Language Supporting</surname>
          </string-name>
          <article-title>Recursion</article-title>
          .
          <source>In Proc. Intl. Conf. on Management of Data (SIGMOD)</source>
          , pages
          <fpage>323</fpage>
          {
          <fpage>330</fpage>
          ,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          .
          <article-title>A Relational Algebra for SPARQL</article-title>
          .
          <source>Technical report</source>
          , HP Labs, Bristol, UK,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dries</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nijssen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Raedt</surname>
          </string-name>
          .
          <article-title>A Query Language for Analyzing Networks</article-title>
          .
          <source>In Proc. Intl. Conf. on Information and Knowledge Management (CIKM)</source>
          , pages
          <fpage>484</fpage>
          {
          <fpage>494</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>W.</given-names>
            <surname>Fan</surname>
          </string-name>
          .
          <article-title>Graph Pattern Matching Revised for Social Network Analysis</article-title>
          .
          <source>In Proc. Intl. Conf. on Database Theory (ICDT)</source>
          , pages
          <fpage>8</fpage>
          {
          <fpage>21</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G.</given-names>
            <surname>Graefe.</surname>
          </string-name>
          <article-title>The Cascades Framework for Query Optimization</article-title>
          .
          <source>Data Engineering Bulletin</source>
          ,
          <volume>18</volume>
          :
          <fpage>19</fpage>
          {
          <fpage>29</fpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R. H.</given-names>
            <surname>Gu</surname>
          </string-name>
          <article-title>ting. GraphDB: Modeling and Querying Graphs in Databases</article-title>
          .
          <source>In Proc. Intl. Conf. on Very Large Data Bases (VLDB)</source>
          , pages
          <fpage>297</fpage>
          {
          <fpage>308</fpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Harris</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Shadbolt. SPARQL Query</surname>
          </string-name>
          <article-title>Processing with Conventional Relational Database Systems</article-title>
          .
          <source>In Proc. Intl. Conf. on Web Information Systems Engineering (WISE)</source>
          , pages
          <fpage>235</fpage>
          {
          <fpage>244</fpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>H.</given-names>
            <surname>He</surname>
          </string-name>
          and
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Singh</surname>
          </string-name>
          .
          <article-title>Graphs-at-a-time: Query Language and Access Methods for Graph Databases</article-title>
          .
          <source>In Proc. Intl. Conf. on Management of Data (SIGMOD)</source>
          , pages
          <fpage>405</fpage>
          {
          <fpage>418</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Y. E.</given-names>
            <surname>Ioannidis</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y. C.</given-names>
            <surname>Kang</surname>
          </string-name>
          .
          <article-title>Left-deep vs. Bushy Trees: An Analysis of Strategy Spaces and its Implications for Query Optimization</article-title>
          .
          <source>In Proc. Intl. Conf. on Management of Data (SIGMOD)</source>
          , pages
          <fpage>168</fpage>
          {
          <fpage>177</fpage>
          ,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>U.</given-names>
            <surname>Leser</surname>
          </string-name>
          .
          <article-title>A Query Language for Biological Networks</article-title>
          . Bioinformatics,
          <volume>21</volume>
          :
          <fpage>33</fpage>
          {
          <fpage>39</fpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>M. S. Mart n</surname>
            , C. Gutierrez, and
            <given-names>P. T.</given-names>
          </string-name>
          <string-name>
            <surname>Wood</surname>
          </string-name>
          .
          <article-title>SNQL: A Social Network Query and Transformation Language</article-title>
          .
          <source>In Proc. Intl. Workshop on Foundations of Data Management (AMW)</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sakr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Elnikety</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>He. G-SPARQL:</surname>
          </string-name>
          <article-title>A Hybrid Engine for Querying Large Attributed Graphs</article-title>
          .
          <source>In Proc. Intl. Conf. on Information and Knowledge Management (CIKM)</source>
          , pages
          <fpage>335</fpage>
          {
          <fpage>344</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Meier</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Lausen</surname>
          </string-name>
          .
          <article-title>Foundations of SPARQL Query Optimization</article-title>
          .
          <source>In Proc. Intl. Conf. on Database Theory (ICDT)</source>
          , pages
          <fpage>4</fpage>
          {
          <fpage>33</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>N.</given-names>
            <surname>Yakovets</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Godfrey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Gryz</surname>
          </string-name>
          . Waveguide:
          <article-title>Evaluating SPARQL Property Path Queries</article-title>
          .
          <source>In Proc. Intl. Conf. on Extending Database Technology (EDBT)</source>
          , pages
          <fpage>525</fpage>
          {
          <fpage>528</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>