<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Singleton Property Graph: Adding A Semantic Web Abstraction Layer to Graph Databases</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vinh Nguyen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hong Yung Yip</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harsh Thakkar</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Qingliang Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Evan Bolton</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olivier Bodenreider</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Library of Medicine, National Institutes of Health</institution>
          ,
          <addr-line>Maryland</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Bonn</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of South Carolina</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Property graph databases provide efficient implementations of graph traversal operations, while Semantic Web technologies provide expressive symbolic representation, querying, and reasoning. Despite the differences between the goals of the two data models, they share similar graph characteristics. In this paper, we combine the benefits of each model into a single graph abstraction layer called the Singleton Property Graph (SPG). The SPG layer sits on top of RDF and simulates the property graph model. We describe the SPG model and its Semantic Web-compliant queries, which can be executed inside property graph databases such as Apache TinkerPop. We implemented a prototype and evaluated it on two datasets, BKR and PubChem.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Although property graphs and RDF are the most popular graph models
supported by graph databases, no single database engine yet implements both
graph models and their query languages. Graph databases
such as AllegroGraph [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], OrientDB [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and GraphDB [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] implement RDF graphs
with the SPARQL query language. Graph databases such as Neo4J [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Apache
TinkerPop [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and JanusGraph [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] support property graphs with their own
native query languages, e.g., Apache TinkerPop Gremlin [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], PGQL [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], or
Cypher. Graph databases such as Amazon Neptune [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] support both graph
models, but only one model can be active in a given database. In practice, no
single data model natively supports both query languages.
      </p>
      <p>Because of the similarity in graph characteristics between the property graph
and the RDF graph, a common graph model simulating both graph models is
feasible, and it can combine the advantages of both worlds: graph databases and
the Semantic Web. Such a simulation enables RDF datasets and their SPARQL
queries to be loaded and executed in a property graph. This common graph
model would make it possible to run Semantic Web tasks on top of a property
graph database and hence bridge the two worlds.</p>
      <p>In this paper, we propose such a common graph model. We use the
example in Figure 1 as the motivating example for demonstrating our graph
model throughout the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Motivating Example</title>
      <p>
        A Property Graph (PG) is a directed labeled graph with a set of nodes and
a set of edges in which every edge is unique and connects an ordered pair of
nodes. A node represents an entity, and an edge represents a relationship
between two entities. Each node or edge has properties associated with it in the
form of key-value pairs. Figure 1 shows an example of a property graph taken
from the Apache TinkerPop Gremlin documentation [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This graph contains six
nodes numbered 1-6 and six edges numbered 7-12. In addition, every node or edge
has an identifier with the key id and a label with the key label, also in the form of
key-value pairs. For example, node 1 has id: 1 and label: person.
      </p>
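      <p>The definition above can be made concrete with a small data structure. The following Python sketch is illustrative only: the class names Element and Edge are hypothetical, and the name, age, and lang values are assumed from TinkerPop's "modern" toy graph rather than read from Figure 1. Only edge 9 (person 1 created software 3, with weight 0.4) is taken from the text.</p>

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Element:
    """A PG node: a unique id, a label, and key-value properties."""
    id: int
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge(Element):
    """A PG edge is an element that also connects an ordered pair of nodes."""
    out_v: int = 0  # id of the tail node
    in_v: int = 0   # id of the head node


# Fragment of Figure 1; name/age/lang values are assumed, only edge 9 comes from the text.
nodes = {
    1: Element(1, "person", {"name": "marko", "age": 29}),
    3: Element(3, "software", {"name": "lop", "lang": "java"}),
}
edges = {9: Edge(9, "created", {"weight": 0.4}, out_v=1, in_v=3)}

e = edges[9]
print(nodes[e.out_v].label, e.label, nodes[e.in_v].label, e.properties["weight"])
# prints: person created software 0.4
```

      <p>Note that the edge carries both an identity (id 9) and its own key-value properties, which are the two characteristics the SPG model has to simulate.</p>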
      <p>Next, we present our approach to representing a property graph model
and its graph characteristics using RDF.</p>
    </sec>
    <sec id="sec-3">
      <title>Our approach</title>
      <p>Compared to the RDF graph model, the distinct characteristics of the property
graph model described above are that 1) edges have their own properties, just
like nodes, and 2) every edge or node has a unique identifier. In the
running example, the relationship created has a property key weight showing the
contribution of each person to the creation of the software. The nodes have
identifiers 1-6 and the edges have identifiers 7-12.</p>
      <p>
        We observed that this property graph model shares distinct characteristics with
the singleton property (SP) model [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Specifically, while the PG model has a
unique identifier for each edge, the SP model has a unique identifier for
each singleton property. Furthermore, while the PG model can have key-value
properties for each edge, each singleton property can be associated with
additional metadata triples. These similarities between the singleton
properties of the SP model and the edges of the PG model may therefore provide the
foundation for a common data model between them.
      </p>
      <p>Fig. 1: Each blue node represents an entity of type person or software. A person node has two property keys: name and age. A software node has two property keys: name and lang. Each edge represents either one unique relationship knows between two person entities or one unique relationship created between one person entity and one software entity.</p>
      <p>Here we show how edge 9 in Figure 1 can be represented in the SP model, with URIs
created by concatenating the label and the id of each node or edge, as follows:</p>
      <p>T1 : person#1 created#9 software#3 .</p>
      <p>T2 : created#9 singletonPropertyOf created .</p>
      <p>T3 : created#9 weight 0.4 .</p>
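      <p>The construction of T1, T2, and T3 can be sketched as a small function. This is a minimal Python illustration, not the authors' converter: pg_edge_to_sp and uri are hypothetical helpers, and the resulting names are plain strings rather than full URIs.</p>

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, str]


def uri(label: str, ident: int) -> str:
    """Mint a name by concatenating a label and an id, e.g. person#1."""
    return f"{label}#{ident}"


def pg_edge_to_sp(out_label: str, out_id: int, edge_label: str, edge_id: int,
                  in_label: str, in_id: int, props: Dict[str, Any]) -> List[Triple]:
    """Encode one PG edge and its key-value properties as singleton property triples."""
    sp = uri(edge_label, edge_id)  # the singleton property, e.g. created#9
    triples = [
        (uri(out_label, out_id), sp, uri(in_label, in_id)),  # T1: subject, SP, object
        (sp, "singletonPropertyOf", edge_label),             # T2: link to the generic property
    ]
    for key, value in props.items():                         # T3...: edge properties as metadata
        triples.append((sp, key, str(value)))
    return triples


for t in pg_edge_to_sp("person", 1, "created", 9, "software", 3, {"weight": 0.4}):
    print(" ".join(t), ".")
# person#1 created#9 software#3 .
# created#9 singletonPropertyOf created .
# created#9 weight 0.4 .
```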
      <p>
        Although the SP model represents the PG edges intuitively, as shown
above, its SPARQL query patterns ?sub ?sp ?obj . (TP1) and
?sp singletonPropertyOf ?p . (TP2) cannot be used to efficiently traverse
the PG model. Because the singleton properties are unknown in most cases, they
appear as variables in these SP query patterns. If such patterns are used to query
the edges of a PG, the performance of the PG traversal algorithm may suffer
severely, because the all-variable triple pattern ?sub ?sp ?obj . matches every
edge in the graph [
        <xref ref-type="bibr" rid="ref15 ref16">15,16</xref>
        ].
      </p>
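      <p>To see why TP1 is costly, consider a toy triple store that indexes triples by predicate. In this hypothetical Python sketch (TripleStore is not the API of any real engine, and the triples are illustrative), a pattern with a bound predicate reads only the matching index bucket, while the all-variable pattern TP1 has no bound term to exploit and must inspect every triple.</p>

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Toy store with a predicate index; reports how many triples a match inspects."""

    def __init__(self, triples: List[Triple]):
        self.triples = triples
        self.by_pred: Dict[str, List[Triple]] = defaultdict(list)
        for t in triples:
            self.by_pred[t[1]].append(t)

    def match(self, s: Optional[str], p: Optional[str], o: Optional[str]):
        """None plays the role of a SPARQL variable; returns (matches, triples inspected)."""
        candidates = self.by_pred.get(p, []) if p is not None else self.triples
        hits = [t for t in candidates
                if (s is None or t[0] == s) and (o is None or t[2] == o)]
        return hits, len(candidates)


store = TripleStore([
    ("person#1", "created#9", "software#3"),
    ("created#9", "singletonPropertyOf", "created"),
    ("created#9", "weight", "0.4"),
    ("person#1", "knows#7", "person#2"),
    ("knows#7", "singletonPropertyOf", "knows"),
    ("knows#7", "weight", "0.5"),
])
_, inspected_tp1 = store.match(None, None, None)                   # TP1: ?sub ?sp ?obj
_, inspected_tp2 = store.match(None, "singletonPropertyOf", None)  # TP2 with bound predicate
print(inspected_tp1, inspected_tp2)  # 6 2
```

      <p>The gap grows with the graph: TP1 always touches every triple, while a bound predicate touches only its own bucket.</p>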
      <p>Furthermore, a singleton property can be associated with a metadata value
that is itself another entity or node. For example, in the SP patterns
?sp derives_from PMID_1 . (TP3) and PMID_1 type Article . (TP4),
the singleton property ?sp is associated with the metadata value PMID_1 (in
TP3), and this metadata value is also an entity of type Article (in TP4). This feature
makes the SP model more expressive, but unfortunately it is not supported in
the PG model. A PG edge property can only take a literal value of some data type;
it cannot refer to another entity node such as PMID_1. As a result, the PG model
cannot join an edge's property values with the nodes, which would be needed to
simulate the join between the singleton property's metadata value PMID_1 (in
TP3) and the subject PMID_1 (in TP4).</p>
      <p>Therefore, to develop a common graph model for both the RDF and PG models
and their query languages, we identify three requirements: (R1) consider the
intrinsic similarities between the singleton properties and the PG edges, (R2)
resolve the potential performance degradation caused by applying the SP all-variable query
pattern (TP1) to PG traversals, which can require whole-graph traversals, and (R3) support
the singleton property's additional metadata values as entity nodes
(as in TP3 and TP4).</p>
    </sec>
    <sec id="sec-4">
      <title>Our contribution</title>
      <p>In this paper, we propose the SPG, a common graph model that meets the three
requirements analyzed above. Our contributions for the SPG model include:</p>
      <list list-type="bullet">
        <list-item><p>a graph model serving as an abstraction layer on top of the RDF singleton property model that can simulate the two distinct characteristics of the PG model,</p></list-item>
        <list-item><p>a graph query pattern that can express PG traversals over the key-value properties of nodes and edges, together with a SPARQL-compliant querying mechanism that can be executed in PG databases, and</p></list-item>
        <list-item><p>an implementation of the SPG model for two use cases, BKR and PubChem.</p></list-item>
      </list>
      <p>The SPG models generated from the BKR and PubChem inputs, together with their
sets of SPG queries, are loaded into and evaluated in PG databases.</p>
      <p>The rest of the paper is organized as follows. Section 2 describes our SPG
model. Section 3 describes the SPG queries and the SPARQL-compliant querying
mechanism with two use cases from the BKR and PubChem datasets. Section 4
demonstrates the feasibility of our implementation for representing and querying
the SPG model in PG databases such as Apache TinkerPop and Neo4j. We
review related work in Section 5 and conclude in Section 6.</p>
      <sec id="sec-4-1">
        <title>Singleton Property Graph Model</title>
        <p>Here we explain how the SPG model can be constructed to be compatible with
both the RDF and PG models and to meet the three requirements analyzed in
Section 1.2.</p>
        <p>In the motivating example of Figure 1, the SP
triples T1, T2, and T3 annotate the semantics of the edge and its property using the SP
model. As this annotation is straightforward, Requirement R1 is met
by adopting the SP model as the foundation for the new common model,
SPG.</p>
        <p>Here we address the Requirements R2 and R3 for the new SPG model.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Mapping PG Edges and Singleton Properties to SPG Property Nodes</title>
      <p>We observe that the two issues addressed by Requirements R2 and R3
arise only when the PG edges and the singleton properties are mapped to the
edges of a basic graph. In other words, mapping the SPs and the PG edges to
the edges of a graph is the cause of both issues.</p>
      <p>
        If we do not map the PG edges and SPs to the edges of a graph, the remaining
choice is to map them to the nodes of that graph. We
have explored this choice in our prior work [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. This choice is unconventional because
properties are usually thought of as edges or links connecting
the nodes. We therefore need to justify it from the nature of the PG edges and
SPs, and to verify whether mapping them to nodes resolves the two issues.
      </p>
      <p>First, comparing the edges and the nodes in a PG, we observe that both
can carry their own properties. However, edges have one extra characteristic,
connectivity, that nodes do not. Similarly, in an SP triple, the subject, the object,
and the singleton property share the characteristic that each of them can be asserted
in any triple, and the singleton property also carries the unique connection between the
subject and the object. From this point of view, we believe that the
PG edges and SPs carry the characteristics of both the nodes and the edges of a
graph, and it is reasonable to map them to a special type of node, which we
refer to as a property node.</p>
      <p>Second, if the mapping is to nodes, then each statement yields three disconnected
nodes: the subject, the property node, and the object. Requirement R3 is satisfied
because PG nodes can be connected to other nodes via edges by the design of the
PG model. Here we show how Requirement R2, concerning the all-variable SP query
pattern, can be addressed indirectly.</p>
      <p>For the three disconnected nodes, we create the first edge with id: e1 and
label: in connecting the first and the second nodes, and the second edge with
id: e2 and label: out connecting the second and the third nodes, as shown
in Figure 2. The second node is the property node. If it is mapped from a PG
edge, it carries the properties of the original PG edge; if it is mapped from a
singleton property, its id holds the UUID of the SP and its label holds the value of
the generic property. In either case, the property node and the two edges e1
and e2 always have a label. When a query for the SP pattern is formed, the
predicates are the fixed labels in and out, so no variable is needed for the predicate,
which resolves the all-variable query pattern issue of Requirement R2. Section 3
discusses this in more detail. Therefore, mapping the PG edges and SPs to property
nodes satisfies the two remaining Requirements, R2 and R3.</p>
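      <p>The mapping to property nodes can be sketched as follows. This is a minimal Python illustration under stated assumptions: SPGraph and add_sp_triple are hypothetical names, node ids are plain strings, and the SP's UUID is stood in for by the edge id "9" from Figure 1.</p>

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SPGraph:
    """Minimal SPG: nodes carry a label plus properties; edges are (src, label, dst)."""
    nodes: Dict[str, Dict[str, object]] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, node_id: str, label: str, **props) -> None:
        node = self.nodes.setdefault(node_id, {"label": label})
        node.update(props)

    def add_sp_triple(self, subj: str, subj_label: str, sp_id: str, generic_prop: str,
                      obj: str, obj_label: str, **meta) -> None:
        """Map one SP statement to: subject -in-> property node -out-> object."""
        self.add_node(subj, subj_label)
        self.add_node(obj, obj_label)
        # The property node keeps the SP's unique id; its label is the generic property,
        # and the SP metadata (or PG edge properties) become its key-value pairs.
        self.add_node(sp_id, generic_prop, **meta)
        self.edges.append((subj, "in", sp_id))
        self.edges.append((sp_id, "out", obj))


g = SPGraph()
g.add_sp_triple("1", "person", "9", "created", "3", "software", weight=0.4)
print(g.edges)       # [('1', 'in', '9'), ('9', 'out', '3')]
print(g.nodes["9"])  # {'label': 'created', 'weight': 0.4}
```

      <p>Because the connecting edges always carry the fixed labels in and out, a query over this structure never needs a variable in predicate position.</p>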
      <p>As a consequence, the resulting graph meets the three requirements for a
common graph model. This resulting graph is called the SPG.</p>
    </sec>
    <sec id="sec-5-b">
      <title>Loading and Querying SPG Model in Property Graphs</title>
      <p>The SPG model described previously is compliant with the RDF representation,
and the SPG queries can be expressed in SPARQL. Here, however, we focus
on the implementation of the SPG model and the execution of SPG queries in
property graph databases.</p>
      <p>We start this section by showing how the SPG model is implemented in the
two datasets, PubChem and BKR. We then explain how the SPG queries are
constructed and executed.</p>
    </sec>
    <sec id="sec-6">
      <title>Similarity Scores in PubChem</title>
      <p>
        We collected the data generated by the PubChem 3-D similarity algorithm (footnote 4),
which measures the similarity of two compounds using 3-D shape and color Tanimoto scores [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
This repository contains 16,995 zipped files with a total size of 798 GB. We
generated a small subset of these PubChem 3-D similarity scores by filtering the files
for rows in which both the ST (shape) and CT (color) scores are greater than or
equal to 90.
      </p>
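      <p>The score filter can be sketched as a streaming pass over the rows. The Python sketch below assumes a hypothetical comma-separated layout (compound_1, compound_2, ST, CT) with illustrative compound names; the actual layout of the PubChem files may differ, and decompression of the zipped files is omitted.</p>

```python
import csv
import io
from typing import Iterable, Iterator, Tuple

THRESHOLD = 90.0  # keep rows whose ST and CT scores are both >= 90


def filter_similar(lines: Iterable[str]) -> Iterator[Tuple[str, str, float, float]]:
    """Yield (compound_1, compound_2, ST, CT) for rows passing both score thresholds.

    Assumes a hypothetical row layout: compound_1, compound_2, ST, CT.
    """
    for cmpd1, cmpd2, st, ct in csv.reader(lines):
        st_score, ct_score = float(st), float(ct)
        if st_score >= THRESHOLD and ct_score >= THRESHOLD:
            yield cmpd1, cmpd2, st_score, ct_score


sample = io.StringIO("cmpd_1,cmpd_2,95,91\ncmpd_1,cmpd_3,97,60\ncmpd_2,cmpd_4,90,90\n")
kept = list(filter_similar(sample))
print(kept)  # [('cmpd_1', 'cmpd_2', 95.0, 91.0), ('cmpd_2', 'cmpd_4', 90.0, 90.0)]
```

      <p>Using a generator keeps memory usage flat, which matters when the input spans hundreds of gigabytes.</p>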
      <p>Given a pair of compounds CUI_1 and CUI_2, we represent the similarity
scores between them as has_ST_score and has_CT_score. We created a
singleton property has_sim_score between the two compounds and associated
the two scores with it as metadata. We loaded the PubChem 3-D similarity scores into
two datasets, PubChem-M0 and PubChem-M1. The difference between them is
that PubChem-M0 maps the SPs to edges, while PubChem-M1 maps the SPs
to property nodes, as shown in Figure 3.</p>
      <p>We include PubChem-M0 to show the limitation of SPARQL queries
without our SPG model: a SPARQL query over this model cannot access
the key-value properties of the edges, as we pointed out in Requirement R2.</p>
      <p>Footnote 4: ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound_3D/similar_conformers/</p>
    </sec>
    <sec id="sec-7">
      <title>Triple Provenance in the BKR</title>
      <p>
        BKR is a biomedical knowledge repository containing over 30 million semantic
predications extracted from PubMed abstracts and the Unified Medical
Language System (UMLS) [
        <xref ref-type="bibr" rid="ref11 ref14">11,14</xref>
        ]. We collected the original BKR dataset from [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
It represents the semantic predications using the SP model in N-Triples format.
      </p>
      <p>Given a semantic predication (C0007028, PART_OF, C0026969) extracted
from the PubMed abstract PUBMED_99992, we represent it using a
singleton property as follows:</p>
      <p>C0007028 PART_OF#1 C0026969 .</p>
      <p>PART_OF#1 singletonPropertyOf PART_OF .</p>
      <p>PART_OF#1 derives_from PUBMED_99992 .</p>
      <p>We transformed this SP dataset into the SPG representation using two
models, BKR-M1 and BKR-M2, as shown in Figure 4. The difference between the
two models is that in BKR-M1, we map the singleton properties to a set of
property nodes, and the source of each semantic predication is represented as a
key-value pair of the property node. In BKR-M2, by contrast, we map the
source of the semantic predication to a separate node and provide additional
information about that node, such as the publication date. The BKR-M2 model
demonstrates the support for Requirement R3 from Section 1.2.</p>
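      <p>The difference between BKR-M1 and BKR-M2 can be sketched for the single predication above. In this hypothetical Python illustration, the Concept and Article labels, the pub_date key, the extra property node PART_OF#1_src used to express derives_from as an SPG node triple in BKR-M2, and the date value are all assumptions rather than details taken from Figure 4.</p>

```python
from typing import Dict, List, Tuple

Node = Dict[str, object]
Edge = Tuple[str, str, str]


def bkr_m1(subj: str, pred: str, sp_id: str, obj: str, source: str):
    """M1: the provenance source is a key-value property of the property node."""
    nodes: Dict[str, Node] = {
        subj: {"label": "Concept"},
        obj: {"label": "Concept"},
        sp_id: {"label": pred, "derives_from": source},
    }
    edges: List[Edge] = [(subj, "in", sp_id), (sp_id, "out", obj)]
    return nodes, edges


def bkr_m2(subj: str, pred: str, sp_id: str, obj: str, source: str, pub_date: str):
    """M2: the source is its own node that can carry further metadata (e.g. a date)."""
    src_node = f"{sp_id}_src"  # hypothetical property node for the derives_from statement
    nodes: Dict[str, Node] = {
        subj: {"label": "Concept"},
        obj: {"label": "Concept"},
        sp_id: {"label": pred},
        src_node: {"label": "derives_from"},
        source: {"label": "Article", "pub_date": pub_date},
    }
    edges: List[Edge] = [
        (subj, "in", sp_id), (sp_id, "out", obj),           # the predication itself
        (sp_id, "in", src_node), (src_node, "out", source),  # provenance as a node triple
    ]
    return nodes, edges


n1, e1 = bkr_m1("C0007028", "PART_OF", "PART_OF#1", "C0026969", "PUBMED_99992")
n2, e2 = bkr_m2("C0007028", "PART_OF", "PART_OF#1", "C0026969", "PUBMED_99992",
                "hypothetical-date")
print(len(n1), len(e1), len(n2), len(e2))  # 3 2 5 4
```

      <p>For this single predication, BKR-M2 needs more nodes and edges than BKR-M1, which is the price paid for making the source a first-class node that can be joined and annotated.</p>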
    </sec>
    <sec id="sec-8">
      <title>Querying SPG Model in Property Graphs</title>
      <p>
        We loaded the PubChem and BKR datasets into the Neo4J database using the
SPG models (M1 and M2) shown in Figure 3 (PubChem) and Figure 4 (BKR). These
models can be queried using the SPARQL-compliant SPG queries associated with
each model. The SPG queries are executed using the Sparql-gremlin plugin
[
        <xref ref-type="bibr" rid="ref15 ref16">15,16</xref>
        ] to translate a SPARQL 1.0 query into a Gremlin query that is supported
by property graph databases such as TinkerPop or Neo4J. This plugin predefines
a set of SPARQL 1.0 query patterns for traversing the PG and accessing the
key-value properties of a node. The predicates in these query patterns have two
parts: a prefix, e: or v:, followed by a key. The prefix e: traverses the
edges whose label matches the key, and the prefix v: retrieves the value of
the key from the current node.
      </p>
    </sec>
    <sec id="sec-9">
      <title>SPARQL-compliant SPG query</title>
      <p>For every SPG node triple $t = (v_i, v_e, v_j)$, the mapping $f_{SPG}(v_i, v_e, v_j) = (e_i, e_o)$ connects the node triple by a pair of (in, out) edges: the subject node $v_i$ is connected to the property node $v_e$ by the edge $e_i$ with label: in, and the property node $v_e$ is connected to the object node $v_j$ by the edge $e_o$ with label: out. Therefore, the common SPG pattern for accessing any SPG node triple has the following form:</p>
      <p>?sub1 e:in ?pred1 . ?pred1 e:out ?obj1 . (P1)</p>
      <p>To access the value of the key key_m of any node in an SPG node
triple, we use the following pattern:</p>
      <p>?sub1 v:key_m ?val . (P2)</p>
      <p>These SPG query patterns P1 and P2 can be used in conjunction with each
other to traverse and retrieve the key-value pairs of any node in the SPG model.
All SPG queries from Figure 3 and Figure 4 use these two patterns.</p>
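      <p>Evaluating P1 and P2 amounts to a join on the property node followed by a property lookup. The Python sketch below evaluates both patterns over an in-memory SPG; match_p1 and match_p2 are hypothetical helpers that mirror the semantics described above, not the sparql-gremlin implementation, and the commented SPARQL query only illustrates the shape of a P1 plus P2 combination.</p>

```python
from typing import Dict, Iterator, List, Optional, Tuple

Edge = Tuple[str, str, str]

# Shape of the query being evaluated (illustrative):
# SELECT ?sub1 ?pred1 ?obj1 ?src WHERE {
#   ?sub1 e:in ?pred1 . ?pred1 e:out ?obj1 .   # P1
#   ?pred1 v:derives_from ?src .                # P2
# }


def match_p1(edges: List[Edge]) -> Iterator[Tuple[str, str, str]]:
    """P1: join 'in' edges and 'out' edges on the shared property node."""
    outs: Dict[str, List[str]] = {}
    for src, label, dst in edges:
        if label == "out":
            outs.setdefault(src, []).append(dst)
    for src, label, dst in edges:
        if label == "in":
            for obj in outs.get(dst, []):
                yield src, dst, obj


def match_p2(props: Dict[str, Dict[str, object]], node: str, key: str) -> Optional[object]:
    """P2: read one key-value property of an already bound node."""
    return props.get(node, {}).get(key)


edges = [("C0007028", "in", "PART_OF#1"), ("PART_OF#1", "out", "C0026969")]
props = {"PART_OF#1": {"label": "PART_OF", "derives_from": "PUBMED_99992"}}
for sub, pred, obj in match_p1(edges):
    print(sub, pred, obj, match_p2(props, pred, "derives_from"))
# C0007028 PART_OF#1 C0026969 PUBMED_99992
```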
      <p>For example, the queries BKR-M2-1 and BKR-M2-2 are constructed from two node
triple patterns P1 and one key-value pattern P2, whereas the queries BKR-M1-1 and
BKR-M1-2 share the same combination of one node triple pattern P1 and one
key-value pattern P2.</p>
      <p>For PubChem-M1, the query PubChem-M1-1 uses only one node triple
pattern P1, and the query PubChem-M1-2 uses one node triple pattern P1 and
one key-value pattern P2. PubChem-M0, in contrast, is not an SPG model and
does not support access to the key-value properties of its edges.</p>
      <p>Next, we use the data models generated here in the experimental evaluation.</p>
      <sec id="sec-9-1">
        <title>Experiments</title>
        <p>In this section we report the experiments that demonstrate the proof-of-concept
implementation of SPG models serving as a Semantic Web abstraction layer
on property graphs with queryable Semantic Web-compliant SPARQL queries.
The experiments fall into three main categories: (i) importing the
SPG models into a property graph database, (ii) comparing the property graph
loading and reading times, and (iii) measuring the query execution time and
evaluating the query results. In these experiments, we used the Biomedical Knowledge
Repository (BKR) and PubChem datasets described in Section 3.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>Experimental Setup</title>
      <p>The experiments were performed on a server running CentOS 7 with 126 GB of RAM and 3.84 TB of Samsung PM983 NVMe storage. We used Neo4J version 3.2.3 as the property graph database and Apache TinkerPop Gremlin version 3.4.0 as the graph query engine, installed with two plugins: Neo4J-Gremlin version 3.4.1 and SPARQL-Gremlin version 3.4.1. The Neo4J-Gremlin plugin provides the ability to query and traverse a Neo4J graph using Gremlin, whereas SPARQL-Gremlin is a compiler (also known as Gremlinator) that transforms SPARQL queries into Gremlin traversals.</p>
      <p>Fig. 5: Experiment flowchart (steps include: transform .NTriple (SP) to .SPG; transform .CSV to .SPG; create the property graph models BKR-M1, BKR-M2 (with metadata), and PubChem-M1; generate nodes and relationships files; import into Neo4j; execute SPARQL queries with the SPARQL-Gremlin plugin on Apache TinkerPop).</p>
      <p>SPARQL-Gremlin uses the Apache Jena SPARQL processor ARQ, which provides access to
the syntax tree of a SPARQL query. Together, these components provide the
interoperability interface between the Semantic Web (SPARQL) and property graphs
(Neo4J). Next, we describe the experimental process (Figure 5).</p>
    </sec>
    <sec id="sec-11">
      <title>Importing SPG Models into Neo4J</title>
      <p>
        The BKR SP dataset [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] consists of 33M N-Triples with a file size of 17.6 GB.
This dataset was first parsed into the SPG representation (.SPG). Two instances
of property graph models (BKR-M1 and BKR-M2) were then created from the
SPG file. Similarly, the PubChem-M1 model was generated from its SPG
file, which was parsed from the initial CSV files. A set of node and relationship files was
generated for each of the three models to facilitate the batch insert process into
Neo4J using the Neo4J-import tool. The two main factors that determine the
insert performance are the size of the available heap memory and the page cache.
A large heap space is beneficial for sustaining concurrent operations, whereas
a large page cache ensures that most of the graph data from disk is cached in memory,
avoiding costly disk access during import. The Neo4J server is configured
with a maximum heap size and a page cache size of 32 GB each, which are more
than adequate given the total number of nodes and relationships of our largest
model, PubChem-M1. With these configurations, we timed the insert speed
with and without creating indices. Table 1 shows the corresponding tasks with
results for BKR-M1, BKR-M2, and PubChem-M1.
      </p>
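      <p>The node and relationship files can be sketched as CSV text. The Python sketch below writes headers in the style used by Neo4j's bulk import tool (id:ID, :LABEL, :START_ID, :END_ID, :TYPE); the function name and the fixed property schema passed in keys are assumptions, and property columns are left untyped, so they would be imported as strings unless a typed header such as weight:float were used.</p>

```python
import csv
import io
from typing import Dict, List, Tuple


def write_import_files(nodes: Dict[str, Dict[str, object]],
                       edges: List[Tuple[str, str, str]],
                       keys: List[str]) -> Tuple[str, str]:
    """Render SPG nodes and edges as node and relationship CSV text for bulk import.

    The property columns listed in `keys` form a hypothetical, fixed schema.
    """
    node_buf, rel_buf = io.StringIO(), io.StringIO()
    node_writer = csv.writer(node_buf, lineterminator="\n")
    node_writer.writerow(["id:ID", ":LABEL"] + keys)
    for node_id, props in nodes.items():
        # Missing properties are written as empty fields.
        node_writer.writerow([node_id, props["label"]] + [props.get(k, "") for k in keys])
    rel_writer = csv.writer(rel_buf, lineterminator="\n")
    rel_writer.writerow([":START_ID", ":END_ID", ":TYPE"])
    for src, label, dst in edges:
        rel_writer.writerow([src, dst, label])  # the SPG edge label becomes the type
    return node_buf.getvalue(), rel_buf.getvalue()


nodes_csv, rels_csv = write_import_files(
    {"1": {"label": "person"}, "9": {"label": "created", "weight": 0.4},
     "3": {"label": "software"}},
    [("1", "in", "9"), ("9", "out", "3")],
    keys=["weight"],
)
print(nodes_csv)
print(rels_csv)
```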
      <p>While the SPG representation preserves the same number of triples, it reduces
the file size to 6.6 GB, a 62.5% reduction in storage space compared with
the SP model. The BKR-M1 implementation has a total of 36M nodes, 67M
relationships, and 69M properties, whereas BKR-M2 has a total of 73M nodes,
134M relationships, and 110M properties. The PubChem-M1 implementation
has a total of 368M nodes, 682M relationships, and 1.05B properties (Table 1).</p>
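      <p>As a quick arithmetic check using the figures above (17.6 GB for the SP N-Triples file and 6.6 GB for the SPG file), the storage reduction and the BKR-M2 to BKR-M1 size ratios are:</p>

```latex
\begin{gather*}
\frac{17.6\,\mathrm{GB} - 6.6\,\mathrm{GB}}{17.6\,\mathrm{GB}} = \frac{11.0}{17.6} = 0.625 = 62.5\,\% \\
\frac{73\,\mathrm{M\ nodes}}{36\,\mathrm{M\ nodes}} \approx 2.03, \qquad
\frac{134\,\mathrm{M\ relationships}}{67\,\mathrm{M\ relationships}} = 2.0
\end{gather*}
```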
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption><p>Import tasks and results for BKR-M1, BKR-M2, and PubChem-M1.</p></caption>
        <table>
          <thead>
            <tr><th>Task</th><th>BKR-M1</th><th>BKR-M2</th><th>PubChem-M1</th></tr>
          </thead>
          <tbody>
            <tr><td>Generate nodes and relationships files</td><td>3 min 52 sec</td><td>7 min 16 sec</td><td>32 min 20 sec</td></tr>
            <tr><td>Insert into Neo4J (with indices)</td><td>2 min 11 sec</td><td>3 min 54 sec</td><td>19 min 17 sec</td></tr>
            <tr><td>Insert into Neo4J (without indices)</td><td/><td/><td/></tr>
            <tr><td>Final database size</td><td/><td/><td/></tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-11-4">
        <p>Discussion. Given that the final BKR-M2 database is twice the size of
BKR-M1, the difference between their insert performances is relatively marginal. Two
plausible reasons are the NVMe drive setup, which reads at 3 GB/s and writes at
1 GB/s, and the optimizations (heap memory and page cache) configured on the
Neo4J server.</p>
      </sec>
    </sec>
    <sec id="sec-13">
      <title>Loading and Traversing Neo4J Property Graph on Apache TinkerPop Gremlin</title>
      <p>Apache TinkerPop Gremlin is used in conjunction with the Neo4J-Gremlin
and SPARQL-Gremlin plugins to run SPARQL queries over a property graph
database, since Neo4J does not natively support the SPARQL query language.
The Neo4J-Gremlin plugin provides API-level access to the BKR-M1, BKR-M2,
and PubChem-M1 databases created in Section 4.2. The plugin is configured with
the same settings as the Neo4J server to ensure consistency. The times taken to
read and load the graphs into Apache TinkerPop were 4.35, 9.46, and 9.83 seconds
for BKR-M1, BKR-M2, and PubChem-M1, respectively.</p>
      <p>Discussion. Using the Neo4J-Gremlin plugin eliminates the additional
overhead of exporting the Neo4J graph in GraphML format and subsequently loading it
into Apache TinkerPop. In our experiment, loading BKR-M1 in GraphML
format into Apache TinkerPop took hours, plausibly because the nodes and
relationships, as well as their properties, had to be reconstructed from scratch. In contrast,
Neo4J-Gremlin provided acceptable reading and loading times, even for
PubChem-M1, which has a considerably higher number of nodes and relationships
than BKR-M1 and BKR-M2.</p>
      <p>Fig. 6: BKR-M1 vs BKR-M2 vs PubChem-M1 query performance: average execution time (ms) of queries SetA Q1-Q4 and SetB Q1-Q2.</p>
      <p>
We created sets of SPARQL-compliant queries (set A and set B) derived from
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] that are supported by the current SPARQL-Gremlin version 3.4.1, and
ran the queries on both BKR-M1 and BKR-M2. The queries consist
of basic graph patterns and simple functions such as COUNT, FILTER, GROUP BY,
and LIMIT. The SPARQL queries were executed using the
SPARQL-Gremlin plugin loaded on Apache TinkerPop Gremlin. Every query
was run for 10 repetitions, each starting with a cold cache (by restarting the
Gremlin instance), to provide a fair comparison between short and long queries without
the influence of a warm cache from prior queries. The evaluation was quantified
by the average execution time per query, measured with the native Gremlin
clock() function, and by the returned results (Figure 6).
      </p>
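      <p>The measurement protocol can be sketched as a small harness. The authors timed queries with Gremlin's clock() function; the Python sketch below only mirrors the protocol (a cold start before every repetition, then the average over 10 runs), and run_query and reset are hypothetical callables that would wrap a query submission and a Gremlin restart.</p>

```python
import statistics
import time
from typing import Callable, List


def time_query(run_query: Callable[[], object], reset: Callable[[], None],
               repetitions: int = 10) -> float:
    """Average wall-clock time in ms of `run_query`, with a cold start before each run.

    `reset` stands in for restarting the Gremlin instance so that no warm cache
    from an earlier repetition influences the next measurement.
    """
    samples: List[float] = []
    for _ in range(repetitions):
        reset()  # cold start: excluded from the timed region
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


resets: List[int] = []
avg_ms = time_query(lambda: sum(range(10000)), lambda: resets.append(1))
print(len(resets), avg_ms >= 0.0)  # 10 True
```

      <p>Keeping the reset outside the timed region ensures that only query execution is measured, not the restart cost.</p>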
      <p>Discussion. Although BKR-M2 has about twice as many nodes and relationships
as BKR-M1, the query performance on the two models was comparable. This
suggests that BKR-M2 is equally efficient while providing more information
(metadata). SetA Q4 was not applicable to BKR-M1 and PubChem-M1 because it
involves a metadata query, and BKR-M1 lacks the required metadata.</p>
    </sec>
    <sec id="sec-14">
      <title>Overall Discussion</title>
      <p>Our experiments show that the SPG approach gives decent performance in
terms of the number of triples, query size, and query execution time. The results
support our proof of concept that the SPG queries are indeed SPARQL-compliant
and that SPG can be used as a Semantic Web abstraction layer on top of graph databases.
Such a layer supports the expressiveness and logic of Semantic Web
technologies while providing an efficient implementation of graph traversal
operations.</p>
      <sec id="sec-14-1">
        <title>Related Work</title>
        <p>
          In this paper, we use the singleton property model proposed by Nguyen et al.
[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] as the foundational model for representing our SPG model. However, as the
SP all-variable query pattern may cause entire-graph traversals when applied in
a graph database, we develop a new querying mechanism for our model. In other
words, our work enhances the SP model in that our new querying mechanism
provides an alternative implementation for the SP queries.
        </p>
        <p>
          We also use the sparql-gremlin package [
          <xref ref-type="bibr" rid="ref15 ref16">15,16</xref>
          ] for translating the SPARQL
queries to the Gremlin language supported by property graph databases.
However, this package accepts only SPARQL 1.0 queries built from its
predefined patterns for traversing the PG and accessing the key-value properties.
It does not support all-variable queries, and it cannot retrieve the properties
of edges. Our work differs from this package in that we define a new data model
and use the structures defined by this package to enable the execution of the
new queries for our data model. Furthermore, our model can help sparql-gremlin
overcome limitations such as all-variable queries (in the case of SP queries)
and the retrieval of edge properties.
        </p>
        <p>
          For the RDF and PG models, several approaches have been proposed for
formalizing the PG model and transforming it to other data models such as
RDF and RDF*. Hartig et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] formalize the PG and RDF* data models
and define the transformations between them. Our work differs in that we
propose a new graph model that is compatible with both the PG and RDF models,
so no transformation is needed. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] proposes YARS as a Cypher-based
RDF serialization that is compatible with PG databases supporting Cypher.
Our work instead uses Gremlin to translate and execute
the SPARQL-compliant SPG queries in PG databases. Das et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] simulate
the property graph model using RDF named graphs and sub-properties for
the annotation of triple metadata. Our work uses the SP model for this
simulation.
        </p>
      </sec>
      <sec id="sec-14-2">
        <title>Conclusion</title>
        <p>We have presented the SPG model and its implementation, showing that this
graph model can serve as a common graph model for both the RDF and PG models.
Our model and its implementation can also be reused for other datasets and
applications. The model is compatible with Semantic Web standards: it is
represented as RDF triples, and its queries are expressed in SPARQL.</p>
        <p>Acknowledgement. This research was supported in part by the Intramural
Research Program of the National Institutes of Health (NIH), National Library of
Medicine (NLM). This research was also supported in part by an appointment to
the National Library of Medicine Research Participation Program. This program
is administered by the Oak Ridge Institute for Science and Education through an
inter-agency agreement between the U.S. Department of Energy and the National
Library of Medicine. We are also thankful for the help from Usha Lokala.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. AllegroGraph. https://franz.com/agraph/allegrograph/. Accessed: 2019-04-10.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Amazon Neptune. https://aws.amazon.com/neptune/. Accessed: 2019-04-10.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. GraphDB. http://graphdb.ontotext.com/. Accessed: 2019-04-10.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. JanusGraph. https://janusgraph.org/. Accessed: 2019-04-10.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Neo4j. https://www.neo4j.com/. Accessed: 2019-04-10.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. OrientDB. https://orientdb.com/. Accessed: 2019-04-10.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. TinkerPop. http://tinkerpop.apache.org/. Accessed: 2019-04-10.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>S.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Srinivasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. I.</given-names>
            <surname>Chong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          .
          <article-title>A tale of two graphs: Property graphs as RDF in Oracle</article-title>
          .
          <source>In EDBT</source>
          , pages
          <fpage>762</fpage>
          –
          <lpage>773</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>G.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Batchelor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hastings</surname>
          </string-name>
          , E. Willighagen, and
          <string-name>
            <given-names>E.</given-names>
            <surname>Bolton</surname>
          </string-name>
          .
          <article-title>PubChemRDF: towards the semantic annotation of PubChem compound and substance databases</article-title>
          .
          <source>Journal of Cheminformatics</source>
          ,
          <volume>7</volume>
          (
          <issue>1</issue>
          ):
          <fpage>34</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          .
          <article-title>Reconciliation of RDF* and property graphs</article-title>
          .
          <source>arXiv preprint arXiv:1409.3288</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>V.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bodenreider</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Sheth</surname>
          </string-name>
          .
          <article-title>Don't like RDF reification? Making statements about statements using singleton property</article-title>
          .
          <source>In Proceedings of the 23rd International Conference on World Wide Web, WWW '14</source>
          , pages
          <fpage>759</fpage>
          –
          <lpage>770</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>V.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leeka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bodenreider</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Sheth</surname>
          </string-name>
          .
          <article-title>A formal graph model for RDF and its implementation</article-title>
          .
          <source>arXiv preprint arXiv:1606.00480</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          .
          <article-title>The Gremlin graph traversal machine and language (invited talk)</article-title>
          .
          <source>In Proceedings of the 15th Symposium on Database Programming Languages</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>10</lpage>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Sahoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bodenreider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Parikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Minning</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Sheth</surname>
          </string-name>
          .
          <article-title>A unified framework for managing provenance information in translational research</article-title>
          .
          <source>BMC Bioinformatics</source>
          ,
          <volume>12</volume>
          (
          <issue>1</issue>
          ):
          <fpage>461</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>H.</given-names>
            <surname>Thakkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Punjani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Keswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          .
          <article-title>A stitch in time saves nine – SPARQL querying of property graphs using Gremlin traversals</article-title>
          .
          <source>arXiv preprint arXiv:1801.02911</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>H.</given-names>
            <surname>Thakkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Punjani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          .
          <article-title>Two for one: querying property graph databases using SPARQL via Gremlinator</article-title>
          .
          <source>In Proceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data Management Experiences &amp; Systems (GRADES) and Network Data Analytics (NDA)</source>
          , page 12
          . ACM,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>D.</given-names>
            <surname>Tomaszuk</surname>
          </string-name>
          .
          <article-title>RDF data in property graph model</article-title>
          .
          <source>In Research Conference on Metadata and Semantics Research</source>
          , pages
          <fpage>104</fpage>
          –
          <lpage>115</lpage>
          . Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>O.</given-names>
            <surname>van Rest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Meng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Chafi</surname>
          </string-name>
          .
          <article-title>PGQL: a property graph query language</article-title>
          .
          <source>In Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems</source>
          , page 7. ACM
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>