<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An approach to efficiently storing property graphs in relational databases</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matthias Schmid</string-name>
          <email>Matthias.Schmid@uni-passau.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Passau, Germany</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>Graph-structured data can be found in an increasing number of use-cases. While there exists a considerable number of solutions to store graphs in NoSQL databases, the combined storage of relationally stored data with graph-structured data within the same system is not well researched. We present a relational approach to storing and processing huge property graphs, which is optimized for read-only queries. This way, all the advantages of full-fledged relational database systems can be used and the seamless integration of classical relational data with graph-structured data is possible.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>The increasing appearance of data graphs reinforces the need for adequate graph data management. To meet this requirement of property graph storage, several graph database engines have been developed. The most popular examples of databases with graph storage capability include Neo4j, Microsoft Azure Cosmos DB and OrientDB. While these graph databases offer good performance as long as the data fits into main memory, one cannot always assume that this requirement can be fulfilled. In addition, the seamless integration of those solutions with relational-based information systems remains a problem.</p>
      <p>Motivated by our own use-case, in which we need to store huge graphs that represent buildings, RDF data that semantically describes those buildings and relational data in a single information system, we were looking for a relational-based solution that offers efficient read-focused performance.</p>
      <p>(Popularity according to https://db-engines.com/de/ranking/graph+dbms; if not explicitly stated, all websites were accessed in February 2019. Product websites: https://neo4j.com/, https://azure.microsoft.com/de-de/services/cosmos-db/, https://orientdb.com/)</p>
      <p>
        In SQLGraph: An Efficient Relational-Based Property Graph Store [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] Sun et al. introduce an efficient approach to store property graph data in relational databases, named SQLGraph. While the original approach seems to offer good performance and promises to also perform well on datasets that do not fit into main memory, we were able to increase its performance for read-only queries by adapting the adjacency list concept.
      </p>
      <p>
        The contributions of this paper are:
1. We propose a new adaptation of the relational schema presented in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ],
2. we show that the adapted schema performs better on read-only queries,
3. and we show that the new schema requires less disk space.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. RELATED WORK</title>
      <p>
        In this work we focus on the relational-based storage of property graph data. Bornea et al. first proposed the approach of shredding graph edges into adjacency tables in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Sun et al. base their work in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] on the previously mentioned work of Bornea et al. and outline a novel schema layout for the storage of property graph data in relational databases, which is generalized from the approach to store RDF data in relational databases. They combine the shredding of edges into adjacency tables with the use of JSON-based attribute storage to overcome the limitations of fixed columns. To make the retrieval of data more convenient, they propose a translation mechanism that converts Gremlin [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] queries to SQL queries, which can be run on the proposed relational schema.
      </p>
      <p>
        In the field of RDF there exist numerous benchmarking efforts, but to the best of our knowledge the benchmarks provided by the Linked Data Benchmark Council (LDBC) are the only benchmarks that directly address the problem of benchmarking property graph stores. We do not consider graph processing frameworks like Apache Giraph (http://giraph.apache.org/) in our work, since we focus on the storage and retrieval of graph data, not its parallel processing. The LDBC is an EU project with the goal to develop benchmarks for graph-structured data. They strive to reach the acceptance that benchmarks like the TPC [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] have achieved. The Linked Data Benchmark Council - Social Network Benchmark (LDBC-SNB) is one approach being developed by the LDBC that uses a generated social network graph as its data set and represents the data as a property graph. The benchmark suite also provides a data set generator that can create test data with different scale factors. An overview of different scale factors and the corresponding numbers of vertices and edges is depicted in Table 1. Work on the Social Network Benchmark is not finished yet, but the Interactive Workload has been released in draft stage [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. SQLGRAPH SCHEMA ADAPTATION</title>
      <p>For this paper we use the following definition of a property graph.</p>
      <p>Definition 1. A property graph consists of a set of vertices and directed edges. Every edge has a label assigned to it. Each vertex or edge can additionally have multiple key-value pairs assigned that serve as attributes.</p>
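<p>As a minimal illustration of Definition 1 (plain Python; the class and field names are our own and not part of any schema):</p>

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    vid: int
    attributes: dict = field(default_factory=dict)  # key-value pairs

@dataclass
class Edge:
    eid: int
    source: int   # vid of the source vertex
    target: int   # vid of the target vertex (edges are directed)
    label: str    # every edge carries exactly one label
    attributes: dict = field(default_factory=dict)

# a tiny property graph, using the names from the vertex table example
v1 = Vertex(1, {"firstName": "David", "lastName": "Mueller"})
v2 = Vertex(2, {"firstName": "Jae-Jin", "lastName": "Choi"})
e4 = Edge(4, source=1, target=2, label="knows")
```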
    </sec>
    <sec id="sec-4">
      <title>Proposed Schema</title>
      <p>
        Our approach utilizes the combined approach of relational and JSON-based storage developed in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. We adapted the schema of adjacency lists and as a result, our proposed schema reduces the number of required tables from six (as in the original SQLGraph schema) to four.
      </p>
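<p>A minimal sketch of such a four-table layout (SQLite standing in for PostgreSQL, with JSON text standing in for jsonb and array columns; all table and column names here are hypothetical illustrations, not the exact schema of our prototype):</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- vertex id plus a JSON attribute document
CREATE TABLE vertex (vid INTEGER PRIMARY KEY, attributes TEXT);
-- one row per vertex (and hash bucket): column triples of
-- edge-id array, label and target-vertex array (JSON text here)
CREATE TABLE outgoing_adjacency (vid INTEGER, eids1 TEXT, label1 TEXT, targets1 TEXT);
CREATE TABLE incoming_adjacency (vid INTEGER, eids1 TEXT, label1 TEXT, sources1 TEXT);
-- redundant edge table: source, target, label and JSON attributes
CREATE TABLE edge (eid INTEGER PRIMARY KEY, source INTEGER,
                   target INTEGER, label TEXT, attributes TEXT);
""")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
```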
      <p>The vertex table stores the internal vertex id and the attributes for each vertex. In our prototypical implementation the attributes are stored using the jsonb data type provided by PostgreSQL.</p>
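<p>The round trip through the attribute column can be sketched as follows (SQLite with plain JSON text instead of PostgreSQL's jsonb, purely for illustration):</p>

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vertex (vid INTEGER PRIMARY KEY, attributes TEXT)")

# attribute sets of arbitrary shape are serialized into the JSON column
rows = [
    (1, json.dumps({"lastName": "Mueller", "firstName": "David"})),
    (2, json.dumps({"lastName": "Choi", "firstName": "Jae-Jin"})),
]
conn.executemany("INSERT INTO vertex VALUES (?, ?)", rows)

# retrieval: load the tuple and deserialize the attribute document
attrs = json.loads(conn.execute(
    "SELECT attributes FROM vertex WHERE vid = 1").fetchone()[0])
```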
      <p>VID | Attributes
1 | {"lastName": "Mueller", "firstName": "David"}
2 | {"lastName": "Choi", "firstName": "Jae-Jin"}</p>
      <sec id="sec-4-1">
        <title>Adjacency Tables</title>
        <p>
          Our approach differs from the SQLGraph schema with respect to the definition of the adjacency tables. In their work Sun et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] show that shredding vertex adjacency lists into a relational schema provides a significant advantage over other mechanisms, for example mechanisms that store all edge information in a single table. To this end a hash function has to be defined that matches edge labels to a respective column triple of the adjacency table. We directly apply the approach to compute a hash function through the use of coloring heuristics as described in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Note that, since the LDBC-SNB data set includes a data model, it is possible to compute a conflict-free hash function for this specific data set. In the original approach in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] the edges are split into two tables: If only one edge of a label exists for the vertex, its edge id, label and target node id are stored in the outgoing primary adjacency (OPA) and incoming primary adjacency (IPA) tables respectively. By studying available data sets and use cases (e.g. the LDBC-SNB) one can see that it is usually not the case that only one edge of a specific label exists for a single node. Therefore, if the vertex has multiple outgoing edges of the same label, the edge ids and target vertices are stored in the outgoing secondary adjacency (OSA) and incoming secondary adjacency (ISA) tables respectively, while the target vertex id of the OPA or IPA is set to an artificial value that serves as the join partner for the OSA and ISA table. A query to retrieve all outgoing or incoming neighbours can be written with the use of an outer join. If there exists only a single edge, the outer join will not find a corresponding partner in the secondary table and therefore uses the data in the primary table. If more edges do exist, the join partners will be the resulting edges. This query works independently of the number of edges per label and the hash function. We will present an example of this query structure for our schema adaptation, which omits the outer join, later in this section.
        </p>
        <p>Nevertheless, this means every edge hop of a complex query requires an (outer) join operation between the primary and secondary adjacency tables. This is the case even if only a single edge of the required type is present, since the construction of the query should be independent of the hash function. Because the retrieval of the direct neighbours of a vertex is one of the most important operations in any graph application, this join poses a major bottleneck for most queries.</p>
        <p>In our approach we eliminate the need for a join operation between OPA and OSA by storing all edges of one type in the corresponding columns using arrays. Table 3 shows an example of our adjacency tables: In the graph there are two knows-edges that point from the vertex with id 1 to vertices 2 and 3. Since in this example the knows and the likes label are hashed to the same column triple, a second row for vertex 1 has to be inserted. Note that this can be avoided with a better hash function or by providing more columns. An evaluation of the optimal number of columns, for the case that no hash function with a low number of resulting conflicts for a given number of columns can be found, has not been part of our research yet and will have to be addressed in the future.</p>
        <p>Since we can provide a conflict-free hash function for the LDBC-SNB data set, an outgoing neighbourhood query for a specific vertex can be answered by loading a single tuple from the database. By eliminating the need of an outer join for any edge hop, a major drawback of the original schema is overcome.</p>
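<p>The mechanism just described can be sketched in a few lines (plain Python dictionaries stand in for the relational adjacency table; the label-to-column mapping is a hypothetical stand-in for the coloring-based hash of [1], and target ids other than 2 and 3 are invented for illustration):</p>

```python
# hypothetical label -> column assignment; a conflict-free hash would map
# every label of a vertex to a distinct column triple ("knows"/"likes" collide)
def column_of(label: str) -> int:
    return {"knows": 0, "likes": 0, "creator_of": 1}[label]

# outgoing adjacency rows of vertex 1: each row holds k = 2 column triples
# of (edge-id array, label, target-vertex array), or None for an empty slot
outgoing = {
    1: [
        [([4, 5], "knows", [2, 3]), ([11], "creator_of", [7])],
        [([12], "likes", [9]), None],  # extra row forced by the collision
    ],
}

def out_neighbours(vid: int, label: str) -> list[int]:
    # a neighbourhood query scans the (few) rows of one vertex; with a
    # conflict-free hash a single tuple answers it, no outer join needed
    col = column_of(label)
    hits = []
    for row in outgoing.get(vid, []):
        slot = row[col]
        if slot is not None and slot[1] == label:
            hits.extend(slot[2])
    return hits
```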
        <p>EID1 | Label1 | EIDk | Labelk
[4, 5] | knows | [11] | creator of
[12] | likes | null | null
[6] | likes | null | null</p>
      </sec>
      <sec id="sec-4-4">
        <title>Query Structure and Evaluation</title>
        <p>An example of the general query structure implemented in the PostgreSQL dialect is depicted in Listing 1. The first common table expression, unshred edges, converts the list of tuples belonging to one vertex, of which every row contains k edge types, into a list of rows that contain one edge type per row. The second common table expression, gather edges, then splits those tuples into a list of edges that represents a well-known edge structure.</p>
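<p>Listing 1 itself is not reproduced in this extract; the following sketch mimics in Python what the two common table expressions do under the assumptions above: the first step flattens the k column triples of each row into one edge type per row, the second unnests the arrays into single edges:</p>

```python
# one shredded adjacency row: (vid, [k slots of (eids, label, targets) or None])
rows = [
    (1, [([4, 5], "knows", [2, 3]), ([11], "creator_of", [7])]),
    (1, [([12], "likes", [9]), None]),
]

def unshred_edges(rows):
    # one (vid, eids, label, targets) tuple per non-empty column triple
    for vid, slots in rows:
        for slot in slots:
            if slot is not None:
                eids, label, targets = slot
                yield vid, eids, label, targets

def gather_edges(unshredded):
    # unnest the arrays into a conventional (source, eid, label, target) list
    for vid, eids, label, targets in unshredded:
        for eid, target in zip(eids, targets):
            yield (vid, eid, label, target)

edges = list(gather_edges(unshred_edges(rows)))
```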
        <p>
          Analogous to the storage of vertex attributes, we store edge attributes. As in the original schema, we do not only store attributes in this table, but also additional information about the edges: namely, the source vertex, the target vertex and the label of the edge are recorded. In [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] it is claimed that queries that need to trace along a variable number of edges perform much more efficiently when the edge table is used instead of the adjacency tables. Our preliminary evaluations support this claim.
        </p>
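<p>The role of the edge table for variable-length traces can be illustrated with a sketch (an iterative Python equivalent of a recursive SQL query over a hypothetical edge table):</p>

```python
# edge table rows: (eid, source, target, label) -- plus an attribute column
edge_table = [
    (4, 1, 2, "knows"),
    (5, 2, 3, "knows"),
    (6, 3, 4, "knows"),
]

def reachable(start: int, label: str, max_hops: int) -> set[int]:
    # iterative equivalent of a recursive CTE: expand the frontier hop by hop
    frontier, seen = {start}, set()
    for _ in range(max_hops):
        frontier = {t for (_, s, t, l) in edge_table
                    if s in frontier and l == label} - seen
        seen |= frontier
    return seen
```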
        <p>
          Our goal was to compare the read-only query performance of our adaptation with the original SQLGraph schema. To this end we used part of the Interactive Workload of the LDBC-SNB [
          <xref ref-type="bibr" rid="ref10 ref2 ref4">2, 4, 10</xref>
          ].
        </p>
        <p>
          We generated different data set sizes using the generator and performed tests with data sets that fit into the main memory of the system as well as data set sizes that are too huge to fit into main memory. To confirm the requirement of redundant representation of edges, we defined path queries with different fixed lengths and also path queries of variable lengths. After we confirmed the necessity of the edge tables, we also chose several queries provided by the Interactive Workload of the LDBC-SNB to evaluate the performance of our adjacency implementation against the original schema. We chose test queries applying the following criteria: The main pattern of the query uses paths of fixed length and edges of known label. This criterion is desired, because preliminary evaluations (see [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]) have shown that in other cases the use of the attribute table performs better than the adjacency tables.
        </p>
        <p>The set of queries requires the use of incoming, outgoing and undirected edges to cover all combinations of the adjacency tables.</p>
        <p>
          Different lengths of paths are contained in the set of queries to evaluate whether the difference in path length makes one of the two schema versions preferable. Queries with big intermediate result sets are part of the query set, since [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] states this type of query as a potential bottleneck of the original approach.
        </p>
        <p>The parameters required by the queries were randomly chosen from the generated data set files and generated parameters. The same parameters were used for both approaches.</p>
        <p>To make the performance of the two schemas as comparable as possible, we first implemented all queries for the adapted schema. Then we replaced only the necessary subqueries that concern the differences between the schemas, changing as little of the query as possible. We confirmed the correctness of our query implementations by comparing the results with results returned by a reference implementation provided for Neo4j (https://github.com/PlatformLab/ldbc-snbimpls/tree/master/snb-interactive-neo4j), whose correctness has been validated.</p>
        <p>
          We evaluated the previously chosen queries on the same hardware using the same data set. The evaluation was conducted on a dedicated server with two Intel Xeon 2.4GHz CPUs (16 cores in total), 64GB memory and a single 240GB SSD running 64-bit Linux. We used PostgreSQL 10 on the aforementioned server, while we ran the client program of the benchmark on a standard laptop that connected to the database over LAN. All queries that were performed for performance evaluation purposes were preceded by several warm-up queries, as in most state-of-the-art benchmarks and as advised in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <p>All examples previously shown in this paper are a simplified version of the test data provided by the LDBC-SNB data generator.</p>
        <p>
          Our findings support the need for redundant storage of edge data in the edge attributes table as well as the adjacency table, as claimed by [
          <xref ref-type="bibr" rid="ref1 ref9">1, 9</xref>
          ]. We find a general tendency for queries to perform significantly better with the use of adjacency tables if the queried path is of fixed length. While this is true for paths of fixed length, the opposite holds for paths of variable length. This type of query requires recursive SQL queries. Recursive queries with the use of the edge attributes table outperform any recursive query that uses adjacency tables. In her Master's thesis Kornev [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] conducted an extensive evaluation that shows when to use the edge attributes table in favor of adjacency lists.
        </p>
        <p>[Figure 2: required disk space in GB.]</p>
        <sec id="sec-4-4-1">
          <title>Required Disk Space</title>
          <p>Since we reduced the number of tables, we expected a reduction in required storage capacity. To this end we imported different sizes of generated data sets provided by the LDBC-SNB data set generator. The generator creates data sets according to an input scale factor (1, 3, 10, 30, ...). We then sampled the required storage capacity using PostgreSQL functions. The numbers shown in Figure 2 represent the complete graph schema, including all indexes and constraints. Due to hardware constraints we could not yet import data sets with a scale factor greater than 10. We will address higher scale factors in future work. Nevertheless our findings show that our adaptation of the schema reduces the required storage space by over 10%. This is mainly due to the reduced overhead from removing one table and the associated index structure that is needed to efficiently join the OPA/IPA and OSA/ISA tables.</p>
        </sec>
        <sec id="sec-4-4-2">
          <title>Query Performance</title>
          <p>
            We expected a reduction in query execution time for retrieval queries that use the adjacency lists. To confirm this we evaluated our approach with the use of queries defined by the Interactive Workload. First we chose four queries of the short query set. This type of query usually requires the system to evaluate a relatively low amount of vertices and edges to compute the answer, typically the neighbours of one entity of the data set [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ]. The results shown here were obtained against the generated data set with scale factor 10. This data set size does not fit in the main memory of the server.
          </p>
          <p>[Figure 3: query execution time in ms.]</p>
          <p>Figure 3 shows the results of the conducted tests in a boxplot. The upper bound of the box shows the 75th percentile, the line within the box shows the 50th percentile and the lower bound of the box shows the 25th percentile, while the whiskers show the maximal and minimal execution times respectively. The results of the simpler queries confirm our expectation regarding execution time. The schema adaptation improved query performance across the board for these queries. As expected, the more edge hops a query performs, the bigger the difference in execution time we can observe. Note that short query 3 (SQ3) uses an undirected edge. Since our schema does not directly support undirected edges, this query results in a SQL query that uses both the incoming adjacency and the outgoing adjacency tables and is therefore similar to a 2-edge-hop.</p>
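<p>The handling of the undirected edge in SQ3 can be sketched as follows (hypothetical minimal lookup structures, not our actual SQL):</p>

```python
# minimal stand-ins for the outgoing and incoming adjacency lookups
outgoing = {1: {"knows": [2, 3]}}
incoming = {1: {"knows": [5]}}

def undirected_neighbours(vid: int, label: str) -> set[int]:
    # one lookup per direction -- roughly the cost of a 2-edge-hop
    return set(outgoing.get(vid, {}).get(label, [])) | \
           set(incoming.get(vid, {}).get(label, []))
```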
          <p>
            We then additionally conducted experiments on four complex read-only queries of the Interactive Workload. Complex queries touch a significant amount of data and often include aggregation [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ]. The chosen queries differ in the number of required edge-hops, with CQ 12 containing the highest number of hops.
          </p>
          <p>
            Once again our findings confirm an increase in read performance of our schema adaptation. Considering complex queries, the difference in execution time becomes even more apparent, which confirms the finding in [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] that big intermediate join results, which are a point of focus for the set of complex queries, can be a bottleneck for their approach. Our approach significantly increases query performance for these types of queries.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. CONCLUSION</title>
      <p>In this paper we have described our approach to store property graph data in a relational database. Our approach reduces the number of joins that are required for queries that contain paths of fixed length compared to earlier approaches. We have conducted a preliminary evaluation of our approach using part of a standardized benchmark for property graphs, namely the LDBC-SNB. The evaluation results show that the adaptation is a more efficient version of the database schema in regards to read-only queries. Due to the reduction in the number of required tables and therefore also index structures, we were also able to reduce the required disk space.</p>
      <p>[Figure: execution times for the complex queries CQ2, CQ5, CQ8 and CQ12, LDBC SNB queries at scale factor 10.]</p>
      <p>
        Therefore we have found an approach that enables us to efficiently store the property graph data from our use-case in a relational database and link it with the already existing relational data. In addition, [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] shows an approach to store RDF data in a property graph model. Thus, we can efficiently integrate all data of our use-case in a single relational database.
      </p>
    </sec>
    <sec id="sec-6">
      <title>5. FUTURE WORK</title>
      <p>Future evaluations will have to show the validity of the approach compared to native graph database systems like Neo4j. Since most NoSQL and graph database systems focus on main memory computation, we expect those to perform comparably or better than our approach on data sets that can be handled in main memory. On the other hand, we expect our approach to outperform non-relational systems on data sets that require regular disk accesses due to their size. These points will be addressed in future evaluations.</p>
      <p>Additionally, we will evaluate the approach more extensively using the complete LDBC-SNB Interactive Workload on bigger data sets and more realistic server hardware. The workload also contains queries that insert data. We expect updates to also perform efficiently.</p>
      <p>
        One major drawback of this approach is the complexity of the queries required to retrieve the data. To that end, systems like Neo4j offer a very abstract and convenient way to query data with graph query languages like Cypher [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. We propose a translation mechanism that converts Cypher queries to SQL queries that can be evaluated by our implementation on PostgreSQL. Kornev [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] has already shown that a translation of Cypher queries to SQL is possible. Unfortunately the translated queries do not in all cases achieve the efficiency that can be achieved with queries written by an expert. Therefore more optimization possibilities for the translation mechanism need to be explored. This will be part of our future research.
      </p>
    </sec>
    <sec id="sec-7">
      <title>6. ACKNOWLEDGMENTS</title>
      <p>This research was partially supported by the "Bayrische Staatsministerium für Wirtschaft, Energie und Technologie" in the context of "BaBeDo" (IUK568/001), which we conduct in partnership with VEIT Energie GmbH.</p>
    </sec>
    <sec id="sec-8">
      <title>APPENDIX A. EVALUATION QUERIES: INTERACTIVE WORKLOAD</title>
      <p>Cypher representations of the evaluation queries:</p>
      <p>SQ 1:
MATCH (n:Person {id: {id}})-[:IS_LOCATED_IN]-&gt;(p:Place)
RETURN
  n.firstName AS firstName, n.lastName AS lastName,
  n.birthday AS birthday, n.locationIP AS locationIp,
  n.browserUsed AS browserUsed, n.gender AS gender,
  n.creationDate AS creationDate, p.id AS cityId</p>
      <p>SQ 3:
MATCH (n:Person {id: {id}})-[r:KNOWS]-(friend)
RETURN
  friend.id AS personId, friend.firstName AS firstName,
  friend.lastName AS lastName, r.creationDate AS friendshipCreationDate
ORDER BY friendshipCreationDate DESC, toInt(personId) ASC;</p>
      <p>SQ 5:
MATCH (m:Message {id: {id}})-[:HAS_CREATOR]-&gt;(p:Person)
RETURN
  p.id AS personId, p.firstName AS firstName,
  p.lastName AS lastName;</p>
      <p>SQ 7:
MATCH (m:Message {id: {id}})&lt;-[:REPLY_OF]-(c:Comment)-[:HAS_CREATOR]-&gt;(p:Person)
OPTIONAL MATCH (m)-[:HAS_CREATOR]-&gt;(a:Person)-[r:KNOWS]-(p)
RETURN
  c.id AS commentId, c.content AS commentContent,
  c.creationDate AS commentCreationDate, p.id AS replyAuthorId,
  p.firstName AS replyAuthorFirstName, p.lastName AS replyAuthorLastName,
  CASE r
    WHEN null THEN false
    ELSE true
  END AS replyAuthorKnowsOriginalMessageAuthor
ORDER BY commentCreationDate DESC, toInt(replyAuthorId) ASC;</p>
      <p>CQ 2:
MATCH (:Person {id: {1}})-[:KNOWS]-(friend:Person)&lt;-[:HAS_CREATOR]-(message)
WHERE message.creationDate &lt;= {2} AND (message:Post OR message:Comment)
RETURN
  friend.id AS personId, friend.firstName AS personFirstName,
  friend.lastName AS personLastName, message.id AS messageId,
  CASE exists(message.content)
    WHEN true THEN message.content
    ELSE message.imageFile
  END AS messageContent,
  message.creationDate AS messageDate
ORDER BY messageDate DESC, toInt(messageId) ASC
LIMIT {3};</p>
      <p>CQ 5:
MATCH (person:Person {id: {1}})-[:KNOWS*1..2]-(friend:Person)&lt;-[membership:HAS_MEMBER]-(forum:Forum)
WHERE membership.joinDate &gt; {2} AND not(person=friend)
WITH DISTINCT friend, forum
OPTIONAL MATCH (friend)&lt;-[:HAS_CREATOR]-(post:Post)&lt;-[:CONTAINER_OF]-(forum)
WITH forum, count(post) AS postCount
RETURN
  forum.title AS forumName, postCount
ORDER BY postCount DESC, toInt(forum.id) ASC
LIMIT {3};</p>
      <p>CQ 8:
MATCH (start:Person {id: {1}})&lt;-[:HAS_CREATOR]-()&lt;-[:REPLY_OF]-(comment:Comment)-[:HAS_CREATOR]-&gt;(person:Person)
RETURN
  person.id AS personId, person.firstName AS personFirstName,
  person.lastName AS personLastName, comment.creationDate AS commentCreationDate,
  comment.id AS commentId, comment.content AS commentContent
ORDER BY commentCreationDate DESC, toInt(commentId) ASC
LIMIT {2};</p>
      <p>CQ 12:
MATCH (:Person {id: {1}})-[:KNOWS]-(friend:Person)
OPTIONAL MATCH (friend)&lt;-[:HAS_CREATOR]-(comment:Comment)-[:REPLY_OF]-&gt;(:Post)-[:HAS_TAG]-&gt;(tag:Tag),
  (tag)-[:HAS_TYPE]-&gt;(tagClass:TagClass)-[:IS_SUBCLASS_OF*0..]-&gt;(baseTagClass:TagClass)
WHERE tagClass.name = {2} OR baseTagClass.name = {2}
RETURN
  friend.id AS friendId, friend.firstName AS friendFirstName,
  friend.lastName AS friendLastName, collect(DISTINCT tag.name),
  count(DISTINCT comment) AS count
ORDER BY count DESC, toInt(friendId) ASC
LIMIT {3};</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Bornea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dolby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kementsietsidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dantressangle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Udrea</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Bhattacharjee</surname>
          </string-name>
          .
          <article-title>Building an efficient RDF store over a relational database</article-title>
          .
          <source>pages 121–132</source>
          , 06
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Boncz</surname>
          </string-name>
          .
          <article-title>LDBC: benchmarks for graph and RDF data management</article-title>
          .
          <source>In 17th International Database Engineering &amp; Applications Symposium</source>
          , IDEAS '13,
          Barcelona
          , Spain - October 09 -
          <issue>11</issue>
          ,
          <year>2013</year>
          , pages
          1–2
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Dominguez-Sal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Martinez-Bazan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Muntes-Mulero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Baleta</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Larriba-Pey</surname>
          </string-name>
          .
          <article-title>A discussion on the design of graph database benchmarks</article-title>
          .
          <source>In Technology Conference on Performance Evaluation and Benchmarking</source>
          , pages
          <volume>25</volume>
          –
          <fpage>40</fpage>
          . Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>O.</given-names>
            <surname>Erling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Averbuch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Larriba-Pey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Cha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gubichev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Prat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-D.</given-names>
            <surname>Pham</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Boncz</surname>
          </string-name>
          .
          <article-title>The LDBC social network benchmark: Interactive workload</article-title>
          .
          <source>In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15</source>
          , pages
          <fpage>619</fpage>
          -
          <lpage>630</lpage>
          , New York, NY, USA,
          <year>2015</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Francis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Green</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Guagliardo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Libkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lindaaker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Marsault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Plantikow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rydberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Selmer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Taylor</surname>
          </string-name>
          .
          <article-title>Cypher: An evolving query language for property graphs</article-title>
          .
          <source>In Proceedings of the 2018 International Conference on Management of Data, SIGMOD '18</source>
          , pages
          <fpage>1433</fpage>
          -
          <lpage>1445</lpage>
          , New York, NY, USA,
          <year>2018</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L. S.</given-names>
            <surname>Kornev</surname>
          </string-name>
          . Cypher fur sqlgraph.
          <source>Master's thesis</source>
          , University of Passau,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Poess</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Floyd</surname>
          </string-name>
          .
          <article-title>New TPC benchmarks for decision support and web commerce</article-title>
          .
          <source>ACM SIGMOD Record</source>
          ,
          <volume>29</volume>
          (
          <issue>4</issue>
          ):
          <fpage>64</fpage>
          -
          <lpage>71</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          .
          <article-title>The Gremlin graph traversal machine and language (invited talk)</article-title>
          .
          <source>In Proceedings of the 15th Symposium on Database Programming Languages, DBPL 2015</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          , New York, NY, USA,
          <year>2015</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>W.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fokoue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kementsietsidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Xie</surname>
          </string-name>
          .
          <article-title>SQLGraph: An efficient relational-based property graph store</article-title>
          .
          <source>In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data</source>
          , pages
          <fpage>1887</fpage>
          -
          <lpage>1901</lpage>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G.</given-names>
            <surname>Szarnyas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Prat-Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Averbuch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Marton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Paradies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaufmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Erling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Boncz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Haprian</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Antal</surname>
          </string-name>
          .
          <article-title>An early look at the LDBC social network benchmark's business intelligence workload</article-title>
          .
          <source>In Proceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data Management Experiences &amp; Systems (GRADES) and Network Data Analytics (NDA)</source>
          , Houston, TX, USA, June 10, 2018, pages
          <fpage>9:1</fpage>
          -
          <lpage>9:11</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>