<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Join Operator for Property Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giacomo Bergami</string-name>
          <email>giacomo.bergami2@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matteo Magnani</string-name>
          <email>matteo.magnani@it.uu.se</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Danilo Montesi</string-name>
          <email>danilo.montesi@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Bologna, CSE Department</institution>
          ,
          <addr-line>Bologna</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Uppsala University, Department of Information, Technology</institution>
          ,
          <addr-line>Uppsala</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In the graph database literature the term \join" does not refer to an operator combining two graphs, but involves path traversal queries over a single graph. Current languages express binary joins through the combination of path traversal queries with graph creation operations. Such solution proves to be not e cient. In this paper we introduce a binary graph join operator and a corresponding algorithm outperforming the solution proposed by query languages for either graphs (Cypher, SPARQL) and relational databases (SQL). This is achieved by using a speci c graph data structure in secondary memory showing better performance than state of the art graph libraries (Boost Graph Library, SNAP) and database systems (Sparksee).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Despite the term \join" appearing in the graph database
literature, such operator cannot be used to combine two
distinct graphs, as for table joins in the relational model.
Such joins are path joins running over a single graph [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]:
they are used for graph traversal queries [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] where vertices
and edges are considered as relational tables [
        <xref ref-type="bibr" rid="ref16 ref25">25, 16</xref>
        ]. The
result of such path joins cannot be directly used to
combine values from di erent sources (e.g. join two distinct
vertices appearing in di erent graphs alongside with their
values), and hence supplementary graph operations are
required. SPARQL allows to access multiple graph resources
through named graphs and performs graph traversals one
graph at a time through path joins [
        <xref ref-type="bibr" rid="ref12 ref27 ref3">12, 3, 27</xref>
        ]. At this point
the CONSTRUCT clause is required if we want to nally
combine the traversed paths from both graphs into a resulting
graph. Similarly, Cypher's CREATE clause has to be used
to generate new vertices and edges from graph patterns
extracted through the MATCH...WHERE clause, and
intemediate results are merged with UNION ALL. While current graph
query languages allow to express our proposed graph join
operator as a combination of the aforementioned operators,
our study shows that our specialized graph join algorithm
outperforms the evaluation of the graph join with existing
graph and relational query languages.
      </p>
      <p>
        As for relational databases, they solve common graph
queries e ciently, so graph database management systems
rely either on relational database engines [
        <xref ref-type="bibr" rid="ref1 ref11 ref21">1, 21, 11</xref>
        ] or on
column store databases [
        <xref ref-type="bibr" rid="ref25 ref7">25, 7</xref>
        ]. Moreover, relational databases
already have e cient implementations for (equi) join
algorithms [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. We want to show that graph joins over the
relational data model are not ine cient. Before all, let us
see an example of a graph join query:
      </p>
      <p>Example 1. Consider an on-line service such as
ResearchGate (Figure 1a, or Academia.edu) where researchers can
follow each others' work, and a citation graph (Figure 1b).
Now we want to \return the paper graph where a paper cites
another one i . the rst author 1Author of the rst paper
follows the 1Author of the second. (Figure 1c)". The
ResearchGate graph does not contain any edge regarding the
references, while the Reference graph does not contain any
information pertaining to the follow relations. This demands
a join between the two graphs: as a rst step we join the
vertices together as in the relational model (vertices are
considered as tuples using Name = 1Auth as a vertex equi-join
predicate, ) and then combine the edges from both graphs.
Accordingly to the query formulation, we establish an edge
between two joined vertices only if the source has a paper
citing the destination, and the user in the source follows the
user in the destination.</p>
      <p>
        Let us now examine the graph join implementation within
the relational model: vertices and edges are represented as
two relational tables ([
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], Figure 2a). In addition to the
attributes within the vertices' and the edges' tables, we
assume that each row (on both vertices and edges) has an
attribute id enumerating vertices and edges. Concerning
SQL interpretation of such graph join, we rst join the
vertices (see the records linked by lines in Figure 2a). Then
the edges are computed through the join query provided in
Figure 2b: the root and the leaves are the result of the join
between the vertices, while the edges appear as the
intermediate nodes. An adjacency list representation of a graph,
as the one proposed in the current paper, reduces the joins
within the relation solution to one (each vertex and edge is
traversed only once), thus reducing the number of required
operation to create the resulting graph. Other ine ciency
{
F
o
ll
o
w
s
}
      </p>
      <p>{User}
Name=Alice</p>
      <p>{User}
Name=Carl
{Follows}
{Follows}</p>
      <p>{User}
Name=Bob</p>
      <p>{User}
Name=Dan
{
F
o
ll
o
w
s
}
{User,Paper} {User,Paper}
Title=Graphs Title=Join
1Author=Alice 1Author=Alice</p>
      <p>Name=Alice Name=Alice
Tit{Ules=erP,Praopjeecrt}ion {Follows,Cites}
1Author=Carl
Name=Carl
{User,Paper}
Title=OWL
1Author=Bob
Name=Bob
{User,Paper}
Title=μ-calc
1Author=Dan
Name=Dan</p>
      <p>VResearchGate
id Name `v
6 Alice {User}
7 Bob {User}
8 Carl {User}
9 Dan {User}</p>
      <p>EResearchGate
id src dst `e
5 6 7 {Follows}
6 6 8 {Follows}
7 7 9 {Follows}
8 9 8 {Follows}
θ
θ
θ
θ</p>
      <p>VReference
id Title Name `v
1 Graphs Alice {Paper}
2 Join Alice {Paper}
3 OWL Bob {Paper}
4 Project Carl {Paper}
5 μ-calc Dan {Paper}
{Paper}</p>
      <p>Title=Join
1Author=Alice
}
s
e
t
Ci {Paper}
{
Title=Projection
1Author=Carl</p>
      <p>Cie}
ts
{
{Cites}
{Follows}</p>
      <sec id="sec-1-1">
        <title>VResearchGate ./ VRef erence</title>
        <p>on</p>
      </sec>
      <sec id="sec-1-2">
        <title>EResearchGate</title>
        <p>on</p>
      </sec>
      <sec id="sec-1-3">
        <title>VResearchGate ./ VRef erence</title>
      </sec>
      <sec id="sec-1-4">
        <title>VResearchGate ./ VRef erence</title>
        <p>{Paper}
Title=OWL
1Author=Bob
{Paper}
Title=μ-calc
1Author=Dan
{User,Paper}
Title=OWL
1Author=Bob
Name=Bob
{
F
o
ll
o
w
{User,Paper} }
s
Title=μ-calc
1Author=Dan
Name=Dan
on</p>
      </sec>
      <sec id="sec-1-5">
        <title>ERef erence</title>
        <p>on
(b) SQL join query plan required to create edges for
(a) Representing the operands' vertices and edges with tables. The ResearchGate1N^ame=1AuthorReference. The leaves acts as the
join for the vertices only involves tables VResearchGate and Vprojects. edges' sources while the root as their destinations.
considerations for graph query languages are provided in the
Related Work section (Section 6.1).</p>
        <p>Example 1 showed only one possible way to combine the
operands' edges, but we can even return edges pertaining
to both operands as in the following query: \For each paper
reveal both the direct and the indirect dependencies (either
there is a direct paper citation, or one of the authors follows
the other one in ResearchGate)". The resulting graph
(Figure 1d) has the same vertex set than the previous one, but
they di er on the nal edges. This implies that our graph
join de nition must be general enough to allow di erent edge
combinations: we refer to those as edge semantics, \es" for
shorthand. This paper provides two contributions:</p>
        <p>Graph join operator 1es (Section 3), combining
both vertices ( ) and edges (es). A property graph
model (Section 2) is used as a data model of choice.</p>
        <p>Graph Conjunctive Equijoin Algorithm (Section
4): vertex buckets ordered by hash value are created
and the resulting graphs' edges and vertices are
produced at the same time. Our solution outperforms the
query evaluations in SPARQL, Cypher and SQL
(Section 5.2). Since the aforementioned algorithm relies on
an ad hoc secondary memory data structure, we tested
it over di erent graph libraries (Boost, SNAP) and low
level graph databases (Sparksee). Even in this case our
solution provide better results with large graphs
(Section 5.1).</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. PRELIMINARIES</title>
      <p>
        We model the vertices' and edges' set as multisets (of
tuples) S of elements si, where si unequivocally identi es
the i-th occurrence of a tuple s in S. Each tuple associates
to each attribute a value: it is a function A 7! V [ fNULLg
mapping each attribute in A to either a value in V or NULL ("
is the empty tuple). We slightly change the property graph
de nition in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] in order to ease the join de nition between
vertices and edges as later on required by the graph join:
      </p>
      <p>Definition 1 (Property Graph). A property graph
is a tuple G = (V; E; v; e; Av; Ae; ; `v; `e) where (a) V is
a multiset of nodes, (b) E is a multiset of edges, (c) v is
a set of node labels, (d) e is a set of edge labels, (e) Av is
a set of node attributes, (f ) Ae is a set of edge attributes,
(g) : E ! V V is a function assigning node pairs to edges,
(h) `v : V ! P( v) is a function assigning a set of labels to
nodes, and (i) `e : E ! P( e) is a function assigning a set
of labels to edges.</p>
      <p>This is the baseline for our graph database:</p>
      <p>Definition 2 (Graph Database). A graph database
is a collection of n distinct property graphs fG1; : : : ; Gng
represented as a single property graph D with n distinct
connected components. From now on we refer to each
component simply as graph. Each graph is identi ed by two
functions: V : f1; : : : ; ng 7! P(V ) determining the vertices V(i)
of the i-th graph and E : f1; : : : ; ng 7! P(E) determining the
edges E(i) of the i-th graph.</p>
      <p>Example 2. Two edges ei and fj come from two
distinct graphs, respectively Ga and Gb, within the same graph
database D. Edge ei connects vertex uh to vk ( (ei) =
(uh; vk)), while fj connects u0h to vk0 ( (fj ) = (u0h; vk0)).
Such edges store only the following values:</p>
      <p>ei(Time) = 12:04; fj (Day) = M on
and have the following labels:</p>
      <p>`e(ei) = f Follow g ; `e(fj ) = f FriendOf g</p>
      <p>For the multiset -join, we need a function combining
two tuples for the relational join operator over multisets,
where ri tj is a valid multiset element (r s)i j and i j
maps each integer pair (i; j) to a single number.</p>
      <p>
        Definition 3 ( -Join). Given two (multiset) tables R
and S over a set of attributes A1 and A2, the -join R ./ S
[
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] is de ned as follows:
fri
sj j ri 2 R; sj 2 S; (ri; sj );(ri
(ri
sj )(A1) = ri;
sj )(A2) = sj g
where (t t0)(Ai) denotes the projection of the tuple t t0
over Ai. If is the always true predicate, can be omitted
and, when also A1 \ A2 = ;, we have a cartesian product.
      </p>
      <p>If we de ne as a linear function (that is for each
function H, H(ei fj ) = H(ei) H(fj )), the -join also induces
the de nition of `v, `e and for the joined tuples. As a
consequence, must be overloaded for each possible expected
output from H (see De nition 7 in Appendix).</p>
      <p>Example 3. By continuing the previous example,
suppose that the edge ei fj comes from a graph join where
edges from Ga are joined to the ones in Gb in a resulting
graph, where also vertices uh u0h and vk vk0 appear. So:
(ei
fj )(Time) = 12:04; (ei
fj )(Day) = M on
By 's linearity, we have that the labels are merged:
`e(ei
fj ) = `e(ei)
`e(fj ) = fFollowg
fFriendOfg
= fFollow; FriendOfg
And the result's vertices are updated accordingly:
(ei
fj ) = (ei)
(fj ) = (uh; vk)</p>
      <p>(u0u; vk0)
= (uh
u0h; vk
vk0)</p>
      <p>Since all the relevant informations are stored in the graph
database, we represent the graph as the set of the minimum
information required for the join operation.</p>
      <p>Definition 4 (Graph). The i-th graph of a graph
database D is a tuple Gi = (V(i); E(i); Av; Aie), where V(i) is
i
a multiset of vertices and E(i) is a multiset of edges.
Furthermore, Aiv is a set of attributes a 2 Aiv s.t. there is at
least one vertex vj 2 V(i) having vi(a) 6= NULL; Aie is a set
of attributes a0 2 Aie s.t. there is at least one edge ek 2 E(i)
having ek(a0) 6= NULL.
3.</p>
    </sec>
    <sec id="sec-3">
      <title>GRAPH JOINS</title>
      <p>
        As we discussed in the introduction, our graph join is
based on the combination of vertices and edges: Ga ./es Gb
expresses the join of graph Ga with Gb where (i) we rst use
a relational -join among the vertices, and then (ii) we
combine the edges using an appropriate user-determined edge
semantics, es. This modularity is similar to the graph
products de ned in graph theory literature [
        <xref ref-type="bibr" rid="ref14 ref17">14, 17</xref>
        ], where instead
of a join between vertices they have a cross product, and
different semantics are expressed as di erent graph products.
We now provide the graph join de nition:
      </p>
      <p>Definition 5 (Graph -Join). Given two graphs Ga =
(V; E; Av; Ae) and Gb = (V 0; E0; A0v; A0e), a graph -join is
de ned as follows:</p>
      <p>Ga ./es Gb = (V ./ V 0; Ees; Av [ A0v; Ae [ A0e)
where is a binary predicate over the vertices and ./ the
-join (De nition 3) among the vertices, and Ees is a
subset of all the possible edges linking the vertices in V ./ V 0
expressed with the es semantics.</p>
      <p>Given that graph join returns a property graph like the
graphs in input, property graphs are closed under the graph
join operator via the de nition of for the multiset -join.
3.1</p>
    </sec>
    <sec id="sec-4">
      <title>Two possible “es” edge semantics</title>
      <p>
        The result of the join between two graphs, ResearchGate
(Figure 1a) and References (Figure 1b), produces the same
set of vertices regardless of the edge semantics of choice. On
the other hand, edges among the resulting vertices change
according to the edge semantics. In the rst one (Figure
1c) we combine edges appearing in both graphs and linking
vertices that appear combined in the resulting graph. We
have a Conjunctive Join, that in graph theory is known as
Kronecker graph product [
        <xref ref-type="bibr" rid="ref14 ref26">26, 14</xref>
        ]. In this case Ees is de ned
with the \^" es semantics as an edge join E^ = E ./ ^ E0,
where the ^ predicate is the following one:
^(eh; e0k) = (eh 2 E ^ e0k 2 E0) ^ (eh
e0k) 2 (V ./ V 0)2
(1)
      </p>
      <p>We can also de ne a disjunctive semantics (Figure 1d),
having \_" as es. In this case we want edges appearing either
in the rst or in the second operand. This means that two
vertices, uh u0h and vk vk0, could have a resulting edge
ei "0j even if only (ei) = (uh; vk) appears in the rst
operand and "0 is a \fresh" empty edge ("0j ) = (u0h; vk0) not
appearing in Gb such that (ea "0b) = (uh u0h; vk vk0).
Consequently the disjunctive join can be represented as a full
outer join, where the edges either match in the conjunctive
semantics, or appear in the two distinct graph operands:
E_ = E ./
^ E0
(2)</p>
    </sec>
    <sec id="sec-5">
      <title>4. ALGORITHM AND DATA STRUCTURE</title>
      <p>
        We now outline our algorithm, GCEA, for equijoin
predicates, involving an equivalence between attributes or a
conjunction of such equivalences. This speci c predicate choice
was driven by the fact that the most performant and
implemented relational database join is the equi-join [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
Moreover we provide an implementation for conjunctive
semantics, since this task is more prone to be optimized than
the disjunctive one. Algorithm 1 for GCEA consists in
three parts: (i) vertex partitioning (bucketing) through an
hashing function (OperandPartitioning) (ii) graph
serialization on secondary memory (SerializeOperand), and
(iii) actual join algorithm over the graphs' buckets
(PartitionHashJoin). Relational partition hash-join undergo
the same phases, even if relational algorithms do not deal
with outgoing edges (lines 31-35 and Section 6). We allow
vertices with replicated values as in current graph databases
implementations (such as Titan and Neo4J). Consequently
ids enumerate the vertices within a single graph.
      </p>
      <p>As a rst step, the hashing function h is inferred from
(line 2): if (u; v) is a binary predicate between distinct
attributes from u and v, then h is de ned as a linear
combination of hash functions over the attributes of either u or
v. When no h could be inferred from , then h is a constant
function.</p>
      <p>Algorithm 1 Graph Conjunctive EquiJoin Algorithm
(GCEA)
1: procedure ConjunctiveJoin(G; G0; )
2: hashFunction = generateHash( );
3: omap1 = OperandPartitioning(G;hashFunction)
4: omap2 = OperandPartitioning(G0;hashFunction)
5: G1 = SerializeOperand(G;omap1)
6: G2 = SerializeOperand(G0;omap2)</p>
      <p>return PartitionHashJoin(G1; G2; )
7:
8: procedure SerializeOperand(G;omap):
9: File VertexIndex = Open();
10: VertexVals= Open(), HashO set= Open();
11: ulong o set = HashO set = 0;
12: for each h 2Keys(omap) do . Ordered maps have ordered keys.
13: HashO set.Write(fh,HashO setg);
14: for each id 2omap[h] do
15: v = G.V [id];
16: v.hash = h; v.o set = VertexVals;
17: VertexIndex.Write(fv.id, h, o setg);
18: ulong o setNext = VA.Write(serialize(v));
19: o set+=o setNext; HashO set+=o setNext;</p>
      <p>return (VertexIndex,VertexVals,HashO set,G.Av,G.Ae)
20:
21: procedure PartitionHashJoin(G1; G2; ):
22: 0(u; u0) := (u; v) ^ (u u0)(Av) = u ^ (u u0)(A0v) = u0;
23: 0(e; e0) := (e e0)(Ae) = e ^ (e e0)(A0e) = e0
24: HI = IntersectHashes(HashO set1,HashO set2).iterator();
25: File AdjF ile = Open();
26: while HI.hasNext() do
27: h = HI.next();
28: for each u 2 VertexVals1[h:o set1], u0 2 VertexVals2[h:o set2] do
29: if 0(u; u0) then
30: AdjFile.Write(V=fu u0g,)
31: HIout = IntersectHashes(outV (u),outV 0(u0)).iterator();
32: while HIout.hasNext() do
33: hout = HIout.next();
34: for each edge e 2 outV (u)[hout:o set1], e0 2</p>
      <p>outV 0(u0)[hout:o set2] do
35: if 0(e:outvertex; e0:outvertex) and 0(e; e0) then
36: AdjFile.Write(E=fe e0g)</p>
      <p>OperandPartitioning performs a vertex bucketing in
main memory: its outcome is an ordered map, where each
vertex v is stored in a collection omap[h(v)], where h is the
aforementioned hashing function. For each operand Gi, the
omapi construction takes at most PjV(i)j log(j) time, where</p>
      <p>j=0
jV(i)j is the multiset vertex size. Such time complexity is
bounded by jV(i)j PjjV=(0i)j log(j) &lt; jV(i)j2 where jV(i)j
1.</p>
      <p>SerializeOperand stores the operand in secondary
memory: both buckets (line 12) and vertices (line 14) are already
sorted by hash value, and hence such data structures are
accessed linearly. Figure 3c depicts a serialized representation
of the graph in Figure 3b: all the labels and the edge values
are not serialized but are still accessible through the
original graph G via id. Buckets are represented by HashO set
providing both the bucket value and the pointer to the rst
vertex of the bucket stored in VertexVals. VertexVals stores
vertices alongside with their adjacency list, where vertices
are sorted by hash value and are represented by id and hash
value. VertexIndex allows to nd the vertices stored in
VertexVals in constant time: each record is ordered by vertex
id, has a constant size and contains the pointer to where the
vertex data is stored in VertexVals. Even the outgoing edges
are stored by the destination vertex's hash value. Given ki
the size of Keys(omapi), this phase takes 3ki + jGij time,
where 2ki is the omap visit cost, ki is the omap serialization
as HashO set and jGij is the time to serialize the graph as
V Ai.</p>
      <p>The last step performs the actual conjunctive join over
the serialized graph (PartitionHashJoin): the data
structure is accessed from secondary memory through memory
mapping. Line 24 prepares the intersection: while
performing a linear scan over the buckets, the HI iterator checks
if both operands have a bucket with the same hash value
(line 26), then the common hash value is extracted (line
27) and the two buckets accessed (line 28), then the
composition u u0 between the vertices is performed (line 30).</p>
      <p>Next, di erently from the relational join, the adjacent
vertices for both operands are visited. Similarly to line 24,
the hash-sorted edges induce a bucketing (line 31), and then
we check if the destination vertices meet the join conditions
alongside with the to-be-joined edges (line 35). Please note
that, as stated out in De nition 5, edges are not ltered by</p>
      <p>predicate. Furthermore, the resulting graph is stored in a
bulk graph where only the vertices id from the two graph
operators appear as pairs. This last operation takes time
k1 + k2 + Ph2HI b1h b2h + out1h out2h where bih is the size
of the h bucket for the i-th operand, while outih is the
outgoing vertices' size for all the vertices within the h bucket
for the i-th operand.</p>
      <p>Such algorithm could be also extended to the disjunctive
semantics as follows: all the edges discarded from the
intersection in line 30 for u u0 should be considered, either if
they come from the left operand or from the right one.
Between all such edges, we consider only those e0 that have a
destination vertex which hash value appears in HI.
Moreover it has to satisfy the binary predicate jointly with
another vertex 0, coming from the opposite operand. Hence
we establish (e.g.) an edge (u u0; 0) having the same
values and attributes of e0 and the same set of labels.
v2
eo
offset
HashOffset
value offset
(a) Data structures used to implement the graph in secondary memory. Each data structure represents a di erent le.</p>
      <p>VertexIndex v00 h1 v11 h2 v22 h1 HashOffset hh11 hh22</p>
    </sec>
    <sec id="sec-6">
      <title>EXPERIMENTAL EVALUATION</title>
      <p>Through the following experiments we want to prove that
(i) both hash buckets and memory mapping for the graph
join operands provide better results for GCEA, (ii) which
outperforms the query plans for other query languages (both
graph and relational). For the rst case we have to use graph
libraries or graph databases where transactions and logging
can be disabled, while for the second we choose state of the
art graph databases implementing speci c query languages.</p>
      <p>
        In order to do so we choose the simplest graph
representation that provides better performances for all the
addressed languages: we choose a graph where only vertices
contain values and where labels are stored in both vertices
and edges. We created our data using the LiveJournal Graph
[
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] containing 4,847,571 unlabelled vertices and 68,993,773
unlabelled edges. Each vertex represents a user which is
connected to each of its friends by an edge. Since no data
values are given within the datasets, we enriched the graph
using the guidelines of the LDBC Social Network
Benchmark protocol [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], and hence associated to each user an IP
address, an Organization and the year of employment1. For
each experiment, the input data were obtained by starting
a random walk from the same vertex but using a di erent
seed for the graph traversal. New data sets were obtained
incrementally by visiting each time a number of vertices that
is a power of 10, from 10 to 106.
      </p>
      <p>We performed our tests over a MacOsX with a 2.2 GHz
Intel Core i7 processor and 16 GB of RAM at 1600 MHz,
and an SSD Secondary Storage with an HFS le system.
We evaluate the graph join using as operands two distinct
sampled subgraphs with the same vertex size (jV j), where
the predicate is the following one: (u; v) d=ef u:Y ear1 =
v:Y ear2 ^ u:Organization1 = v:Organization2. Such
predicate does not perform a perfect 1-to-1 match with the graph
vertices, thus allowing to test the algorithm with di erent
multiplicities values. We tested the algorithm with the
con1More informations regarding our proposed solutions are
available at http://smartdata.cs.unibo.it/?page id=798.
junctive semantics, having a subset of the operations of the
disjunctive one.
5.1</p>
    </sec>
    <sec id="sec-7">
      <title>Evaluating Data Structures</title>
      <p>
        We benchmark our solution with graph data models where
database transactions either do not exist or can be disabled.
We rst consider two graph libraries accessing graphs in
main memory; we tested the Boost Graph Library 1.60.0
with the most e cient con guration for graph traversals
tasks, vec [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], and Snap 3.0 [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] considering the attributes
only over the vertices (TNodeNet&lt;TAttr&gt;). Then we
consider the Sparksee graph database [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]: transactions were
disabled in the con guration le, as well as logging, rollback
and recovery facilities. Concerning the graph database
management implementation, no assumptions can be made as
it is closed source.
      </p>
      <p>We implemented our graph join algorithm for all the
aforementioned libraries. We used the standard graph library
methods to store the graph in secondary memory
(serialization or graph database storage) and extended the
PartitionHashJoin by doing a preliminary vertex bucketing
phase: buckets are not supported and vertices cannot be
sorted by hash value.</p>
      <p>Join Evaluation Time. In this case we evaluate two
aspects: (i) the join algorithm running time and (ii) the time
required to create the solution and store it in secondary
memory.</p>
      <p>Table 4a provides the cost of performing the sole join
algorithm excluding the result storing time. All the
competitors' graphs were joined through GCEA and vertices with
the same hash were put in the same bucket in main memory.
It must be emphasised that both Boost and SNAP operands
were loaded in primary memory, while our operands were
accessed in secondary memory through memory mapping. The
table shows how all the other data structures had a worse
performance due to the initial cost of the bucket creation
and sorting. We must also remark that this result justi es
the need of our data structure for the proposed algorithm.
The same table provides the time required to store the
results as an adjacent list in secondary memory using the
default graph library representation (non-labelled vertices and
edges, default serialization). In this case our solution always
outperforms the other graph libraries and databases.
Operand creation time. We consider the graph creation
time in main memory and the cost of storing it in secondary
memory per operand (Table 4b). For both Boost and SNAP
the default serialization methods are performed, while for
Sparksee we simply closed the database. In this case our
solution outperforms all the competitors.
5.2</p>
    </sec>
    <sec id="sec-8">
      <title>Join Execution Time</title>
      <p>This last experiment compares the interpretation of query
plans for both relational and graph databases with GCEA.
The opponents' query plans are discussed in Section 6.1.</p>
      <p>We used default con gurations for both Neo4J and
PostgreSQL, while we changed the cache bu ers con gurations
for Virtuoso (as suggested in the con guration le) for 16GB
of RAM. We kept the default multithreaded query
execution plan. Cypher queries were sent using the Java API
but the graph join operation was performed only in Cypher
through the execute method of an GraphDatabaseService
object. PostgreSQL queries were evaluated directly through
the psql client and benchmarked using both explain
analyze and \timing commands. Virtuoso was benchmarked
through iODBC connection evoked in C using Redland RDF
library: no HTTP connections were used and only the
librdf_model_query_execute function was involved in the graph
join operation. Neo4J graphs were ne tuned by indexing
the attributes Organization and Year involved in the query
and, since Cypher language does not allow to access to
different graphs, both graph join operands were stored within
the same graph. Virtuoso triples were not indexed, as a
default set of indices are de ned during the graph creation,
and data is automatically indexed. All the aforementioned
conditions do not degrade the query evaluations.</p>
      <p>Table 5 represents the result of such benchmarks. The
competitors' join time is made up only by the query
evaluation time, while our proposed implementation considered
the whole GCEA algorithm, and hence both the
partitioning phase, the operands' serialization and the actual join
execution were considered. As a result our solutions always
outperform the competitors' query plans within their own
language implementation.
6.</p>
    </sec>
    <sec id="sec-9">
      <title>RELATED WORK</title>
      <p>
        At the time of writing, the only eld where a binary graph
operation is discussed is Discrete Mathematics. Such
operations are de ned over either nite graphs or nite graphs
with cycles, and are named graph products [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Every graph
product produces a graph whose vertex set is de ned as a
cartesian product between the vertices' sets producing pair
of vertices, while the edge set changes accordingly to the
different graph product de nition. Consequently the Kroneker
Graph Product [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] is de ned as follows:
G
      </p>
      <p>G0 = (V</p>
      <p>V 0; ((g; h); (g0; h0)) 2 V</p>
      <p>
        V 0 (g; g0) 2 E; (h; h0) 2 E0 )
while the cartesian graph product [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] is de ned as follows:
G G0 = (V V 0; ((g; h); (g0; h0)) 2 V V 0 (g = g0; (h; h0) 2 E0) _ (h = h0; (g; g0) 2 E) )
Other graph products are lexicographic product and strong
product [
        <xref ref-type="bibr" rid="ref14 ref17">14, 17</xref>
        ]. This de nition has two issues: (i) the
resulting vertex set is not made of single vertices but of pair
of vertices, and hence (ii) those graph product de nitions are
not tailored for graphs with embedded data (i.e. property
graphs, triple stores).
      </p>
    </sec>
    <sec id="sec-10">
      <title>6.1 (Graph) Query Languages</title>
      <p>
        In this section we describe how a graph join is
implemented for property graph databases and RDF triplestores.
The reason is twofold: we want both to show that graph joins
can be represented in di erent data representations, and to
detail how our experiments in Section 5 were performed.
Property Graph and Cypher. Property graphs are, to the
best of our knowledge, the more general graph data model
representation because they consider both vertices and edges
as multi-labelled tuples. It is necessary to compare the
performances of our algorithm with query languages running on
top of property graph databases because our physical model
generalizes property graphs. Among the Property Graph
databases we do not consider SQLGraph [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] because there
is no existing implementation and, most importantly, the
Gremlin query language allows only to perform graph
traversal queries returning bag of values. We choose to perform
our tests over Neo4J using Cypher as a query language,
because Neo4J allows to extend the built-in query plans with
ad hoc solutions [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], eventually allowing an implementation
of our algorithm in a future.
      </p>
      <p>Cypher uses a pipe query evaluation model allowing to
rene queries in further steps. Regarding the implementation
of the graph conjunctive join operator in Cypher,
ValueHashJoins are performed between vertices coming from
different graph operands, and hash values are either evaluated
at run time, or depend on attributes' values indexings. This
choice supports the experimental evidence of Cypher having
a better scalability than SPARQL, where RDF graphs
cannot be indexed by values (see the next paragraph). Once
the Cypher query is transformed into a pipe-based query
plan, most of the pipes' sources appear to be
NodeByLabelScan and AllNodeScan: this means that all the graph's
vertices (with a given label) are considered in the rst steps
of computation. As a result the query plan scans more data
than it should to provide the nal result. In our algorithm
this drawback does not occur because we directly access the
data per buckets on both graph operands, avoiding to
consider any vertices' combinations that will not appear in the
nal result.</p>
      <p>
        RDF triplestores and SPARQL. Triplestore systems store
the graph informations as triplets, (source; property;
destination), where source and destination are two vertices, and
property is the edge linking them. Such property could even
appear as a source vertex whenever additional information is
provided [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] shows that property graphs can be entirely
mapped into RDF triplestore systems as follows:
      </p>
      <p>Definition 6 (Property Graph over Triplestore).
Given a property graph G = (V; E; Av; Ae), each vertex vi 2
V induces a set of triples (vi; ; ) for each 2 Av such that
vi( ) = having 6= NULL. Each edge ej 2 E induces a set
of triples (s; ej; d) such that (ej) = (s; d) and another set
of triples (ej; 0; 0) for each 0 2 Ae such that ej( 0) = 0
having 0 6= NULL. Each property graph G is stored as a
distinct named graph.</p>
      <p>
        This allows us to query each property graph with SPARQL
query language, speci cally targeted for triplestores, through
their RDF representation. We took this query language
into account for our benchmarks because it is well
consolidated: a lot of research has been carried out [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] and e cient
query plans have been implemented [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], even when multiple
graphs are took in input. These results involve the
interpretation and the execution of \optional joins" paths [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], thus
allowing to check whether the graph conjunctive join
conditions are not met for the outgoing edges. Such performances
quickly degrade due to both the sparsity of the data
representation requiring to perform more path joins than the ones
required for the property graph model, and to the CONSTRUCT
clauses are not included in the SPARQL algebra
optimizations. However, CONSTRUCT is required for produce a graph
as a nal outcome of our graph join query. Moreover RDF
triplestores as Virtuoso prefer to index triplets per patterns
and do not allow triplet indexing by values. Within our
tests, we also took into account that both input and output
met the requirements of De nition 6.
7.
      </p>
    </sec>
    <sec id="sec-11">
      <title>CONCLUSIONS</title>
      <p>This paper de nes for the rst time a graph join operator.
A graph algorithm is proposed for the conjunctive semantics,
outperforming the implementations on di erent languages.
By comparing the execution of our algorithm with our graph
data structure with other ones provided by graph libraries,
we also show that our choice of sorting the vertices per hash
value (required for the partition hash join) allows to have
better performances during the overall execution of the join
algorithm.</p>
      <p>
        Some results have been omitted for lack of space. First,
we could prove that our operator is both commutative and
associative. This result could be proved both for the vertices'
and edges' tuples, and for their labels. Such are relevant
properties when such operators are used for data integration
tasks. The bucketing approach allows even to implement the
graph join through a map-reduce approach. Last, we could
extend our algorithm to support predicates in over one
attribute having partially ordered values [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]: since our data
structure is already ordered by hash value, we could just use
a monotone hashing function h w.r.t such attribute.
8.
      </p>
    </sec>
    <sec id="sec-12">
      <title>APPENDIX</title>
      <p>Definition 7 (Concatenation). : A A 7! A is a
lazy evaluated concatenation function between two operands
of type A returning an element of the same type, A. The
concatenation function is a linear function such that, given any
function H with dom(H) = A, H(u v) = H(u) H(v).
is de ned for the following A-s:
sets: it performs the union of the two sets:
S
S0 d=ef S [ S0
to each pair of integers an unique integer: i
Pik+=j0 k + i
integers: it returns the dovetail number associating
j d=ef
functions: given a function f : A 7! B and g : C 7!
D, f g is the function returning f (x) if x 2 dom(A),
and g(x) if x 2 dom(C). NULL is returned otherwise.
Such function concatenation are used when 8x 2 A \
C:f (x) = g(x).
pairs: given two pairs (u; v) and (u0; v0), then the pair
concatenation is de ned as the pairwise concatenation
of each element, that is (u; v) (u0; v0) d=ef (u u0; v v0).
Elements belonging to multisets are represented as pairs</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Aberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Olukotun</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Re</surname>
          </string-name>
          .
          <article-title>Emptyheaded: A relational engine for graph processing</article-title>
          .
          <source>In Proceedings of the 2016 International Conference on Management of Data, SIGMOD '16</source>
          , pages
          <fpage>431</fpage>
          {
          <fpage>446</fpage>
          , New York, NY, USA,
          <year>2016</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Atre</surname>
          </string-name>
          .
          <article-title>Left Bit Right: For SPARQL Join Queries with OPTIONAL Patterns (Left-outer-joins)</article-title>
          .
          <source>In SIGMOD Conference</source>
          , pages
          <fpage>1793</fpage>
          {
          <year>1808</year>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Atre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chaoji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Hendler. Matrix</surname>
          </string-name>
          <article-title>"bit" loaded: A scalable lightweight join query processor for rdf data</article-title>
          .
          <source>In Proceedings of the 19th International Conference on World Wide Web, WWW '10</source>
          , pages
          <fpage>41</fpage>
          {
          <fpage>50</fpage>
          , New York, NY, USA,
          <year>2010</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Atzeni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ceri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Paraboschi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Torlone</surname>
          </string-name>
          .
          <source>Database Systems - Concepts</source>
          ,
          <source>Languages and Architectures. McGraw-Hill</source>
          ,
          <volume>1</volume>
          <fpage>edition</fpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Atzeni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ceri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Paraboschi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Torlone</surname>
          </string-name>
          .
          <article-title>Basi di dati. Modelli e linguaggi di interrogazione</article-title>
          .
          <source>McGraw-Hill, Milan</source>
          ,
          <volume>3</volume>
          <fpage>edition</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bergami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Magnani</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Montesi</surname>
          </string-name>
          .
          <article-title>On joining graphs</article-title>
          .
          <source>CoRR, abs/1608.05594</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Bornea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dolby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kementsietsidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dantressangle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Udrea</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Bhattacharjee</surname>
          </string-name>
          .
          <article-title>Building an e cient rdf store over a relational database</article-title>
          .
          <source>In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD '13</source>
          , pages
          <fpage>121</fpage>
          {
          <fpage>132</fpage>
          , New York, NY, USA,
          <year>2013</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Srinivasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. I.</given-names>
            <surname>Chong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          .
          <article-title>A tale of two graphs: Property graphs as RDF in oracle</article-title>
          .
          <source>In Proceedings of the 17th International Conference on Extending Database Technology, EDBT</source>
          <year>2014</year>
          , Athens, Greece, March
          <volume>24</volume>
          -28,
          <year>2014</year>
          ., pages
          <volume>762</volume>
          {
          <fpage>773</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Dominguez-Sal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Urbon-Bayes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gimenez-Van</surname>
          </string-name>
          ~o,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gomez-Villamor</surname>
          </string-name>
          ,
          <string-name>
            <surname>N.</surname>
          </string-name>
          <article-title>Mart nez-Bazan, and</article-title>
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Larriba-Pey</surname>
          </string-name>
          .
          <article-title>Survey of graph database performance on the hpc scalable graph analysis benchmark</article-title>
          .
          <source>In Proceedings of the 2010 International Conference on Web-age Information Management, WAIM'10</source>
          , pages
          <fpage>37</fpage>
          {
          <fpage>48</fpage>
          , Berlin, Heidelberg,
          <year>2010</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>O.</given-names>
            <surname>Erling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Averbuch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Larriba-Pey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Cha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gubichev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Prat</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.-D. Pham</surname>
            , and
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Boncz</surname>
          </string-name>
          .
          <article-title>The ldbc social network benchmark: Interactive workload</article-title>
          .
          <source>In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15</source>
          , pages
          <fpage>619</fpage>
          {
          <fpage>630</fpage>
          , New York, NY, USA,
          <year>2015</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>O.</given-names>
            <surname>Erling</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Mikhailov</surname>
          </string-name>
          . Virtuoso:
          <article-title>Rdf support in a native rdbms</article-title>
          . In R. D.
          <string-name>
            <surname>Virgilio</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Giunchiglia</surname>
          </string-name>
          , and L. Tanca, editors,
          <source>Semantic Web Information Management</source>
          , pages
          <volume>501</volume>
          {
          <fpage>519</fpage>
          . Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G. H.</given-names>
            <surname>Fletcher</surname>
          </string-name>
          and
          <string-name>
            <given-names>P. W.</given-names>
            <surname>Beck</surname>
          </string-name>
          .
          <article-title>Scalable indexing of rdf graphs for e cient join processing</article-title>
          .
          <source>In Proceedings of the 18th ACM Conference on Information and Knowledge Management</source>
          ,
          <source>CIKM '09</source>
          , pages
          <fpage>1513</fpage>
          {
          <fpage>1516</fpage>
          , New York, NY, USA,
          <year>2009</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <article-title>Holistic top-k simple shortest path join in graphs</article-title>
          .
          <source>IEEE Trans. on Knowl. and Data Eng</source>
          .,
          <volume>24</volume>
          (
          <issue>4</issue>
          ):
          <volume>665</volume>
          {
          <fpage>677</fpage>
          ,
          <string-name>
            <surname>Apr</surname>
          </string-name>
          .
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>R.</given-names>
            <surname>Hammack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Imrich</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Klavzar</surname>
          </string-name>
          .
          <article-title>Handbook of Product Graphs, Second Edition</article-title>
          . CRC Press, Inc.,
          <string-name>
            <surname>Boca</surname>
            <given-names>Raton</given-names>
          </string-name>
          , FL, USA, 2nd edition,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Abadi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Ren</surname>
          </string-name>
          .
          <article-title>Scalable sparql querying of large rdf graphs</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>4</volume>
          (
          <issue>11</issue>
          ):
          <volume>1123</volume>
          {
          <fpage>1134</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ho</surname>
          </string-name>
          <article-title>lsch and M. Grossniklaus. An algebra and equivalences to transform graph patterns in neo4j</article-title>
          .
          <source>Fifth International Workshop on Querying Graph Structured Data</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>W.</given-names>
            <surname>Imrich</surname>
          </string-name>
          and
          <string-name>
            <given-names>S. Klavzar. Product</given-names>
            <surname>Graphs</surname>
          </string-name>
          .
          <article-title>Structure and Recognition</article-title>
          . John Wiley &amp; Sons, Inc., New York, NY, USA, 2nd edition,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>W.</given-names>
            <surname>Imrich</surname>
          </string-name>
          and
          <string-name>
            <surname>I. Peterin.</surname>
          </string-name>
          <article-title>Recognizing cartesian products in linear time</article-title>
          .
          <source>Discrete Mathematics</source>
          ,
          <volume>307</volume>
          (
          <issue>3-5</issue>
          ):
          <volume>472</volume>
          {
          <fpage>483</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dasgupta</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Mahoney</surname>
          </string-name>
          .
          <article-title>Community structure in large networks: Natural cluster sizes and the absence of large well-de ned clusters</article-title>
          .
          <source>Internet Mathematics</source>
          ,
          <volume>6</volume>
          (
          <issue>1</issue>
          ):
          <volume>29</volume>
          {
          <fpage>123</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Sosic</surname>
          </string-name>
          .
          <article-title>Snap: A general-purpose network analysis and graph-mining library</article-title>
          .
          <source>ACM Transactions on Intelligent Systems and Technology (TIST)</source>
          ,
          <volume>8</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Paradies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lehner</surname>
          </string-name>
          , and
          <string-name>
            <surname>C.</surname>
          </string-name>
          <article-title>Bornhovd. Graphite: An extensible graph traversal framework for relational database management systems</article-title>
          .
          <source>In Proceedings of the 27th International Conference on Scienti c and Statistical Database Management, SSDBM '15</source>
          , pages
          <issue>29:1</issue>
          {
          <fpage>29</fpage>
          :
          <fpage>12</fpage>
          , New York, NY, USA,
          <year>2015</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          .
          <article-title>Semantics and complexity of sparql</article-title>
          .
          <source>ACM Trans. Database Syst</source>
          .,
          <volume>34</volume>
          (
          <issue>3</issue>
          ):
          <volume>16</volume>
          :1{
          <fpage>16</fpage>
          :
          <fpage>45</fpage>
          ,
          <string-name>
            <surname>Sept</surname>
          </string-name>
          .
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Schuh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dittrich</surname>
          </string-name>
          .
          <article-title>An experimental comparison of thirteen relational equi-joins in main memory</article-title>
          .
          <source>In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference</source>
          <year>2016</year>
          , San Francisco, CA, USA, June 26 - July 01,
          <year>2016</year>
          , pages
          <year>1961</year>
          {
          <year>1976</year>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>J.</given-names>
            <surname>Siek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-Q.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Lumsdaine</surname>
          </string-name>
          .
          <article-title>The Boost Graph Library: User Guide</article-title>
          and
          <string-name>
            <given-names>Reference</given-names>
            <surname>Manual</surname>
          </string-name>
          .
          <string-name>
            <surname>Addison-Wesley Longman</surname>
          </string-name>
          Publishing Co., Inc., Boston, MA, USA,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>W.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fokoue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kementsietsidis</surname>
          </string-name>
          , G. Hu, and
          <string-name>
            <given-names>G.</given-names>
            <surname>Xie. Sqlgraph</surname>
          </string-name>
          :
          <article-title>An e cient relational-based property graph store</article-title>
          .
          <source>In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15</source>
          , pages
          <year>1887</year>
          {
          <year>1901</year>
          , New York, NY, USA,
          <year>2015</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Weichsel</surname>
          </string-name>
          .
          <article-title>The kronecker product of graphs</article-title>
          .
          <source>Proceedings of the American Mathematical Society</source>
          ,
          <volume>13</volume>
          (
          <issue>1</issue>
          ):
          <volume>47</volume>
          {
          <fpage>52</fpage>
          ,
          <year>1962</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>P.</given-names>
            <surname>Yuan</surname>
          </string-name>
          , P. Liu,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <article-title>Triplebit: A fast and compact system for large scale rdf data</article-title>
          .
          <source>Proc. VLDB Endow</source>
          .,
          <volume>6</volume>
          (
          <issue>7</issue>
          ):
          <volume>517</volume>
          {
          <fpage>528</fpage>
          , May
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>