<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Foundations to Query Labeled Property Graphs using SPARQL?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Olaf Hartig</string-name>
          <email>olaf.hartig@liu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Computer and Information Science (IDA), Linkoping University</institution>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The RDF?/SPARQL? approach extends RDF and SPARQL with means to capture and to query annotations of RDF triples, which is a feature that is natively available in graph databases modeled as Labeled Property Graphs (LPGs). Hence, the approach presents a step towards making the di erent graph database models interoperable. This paper takes this step further by providing a solid theoretical foundation for converting LPGs into RDF? data and for querying LPGs using the query language SPARQL?. Regarding the latter, the contributions in this paper consider approaches that materialize the RDF? representation of the LPGs into an RDF?-enabled triplestore as well as approaches in which the queried LPGs may reside in an LPG-speci c database system.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In addition to extending RDF and SPARQL, we envision that RDF? and
SPARQL? may also serve as the foundation of a conceptual mediator layer for
integrating RDF data and LPGs, and for using SPARQL? as a common query
language. In this context, this paper focuses on the following research question:</p>
      <p>How can the query language SPARQL? be used to query LPGs?</p>
      <p>We address this question by providing a solid theoretical foundation for
converting LPGs into RDF? data and for querying LPGs using SPARQL?. Regarding
the latter, the contributions in this paper consider approaches that materialize
the RDF? representation of the LPGs in an RDF?-enabled triplestore1 as well as
approaches in which the queried LPGs may reside in an LPG system.</p>
      <p>
        As a basis of this work we introduce a customizable mapping from the LPG
model to the RDF? data model. The idea of the mapping is simple: Every
edge (including its label) in a given LPG is represented as an ordinary RDF
triple in the resulting RDF? data; the same holds for the label of every node, as
well as for every node property. Every edge property is represented as a nested
triple that contains as its subject the triple representing the corresponding edge.
Example 1.1. An RDF? representation (given in Turtle? format [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]) that is the
result of applying the proposed mapping to the LPG in Figure 1 is given as follows:
_:b1 rdfs:label "Kubrick" .
_:b1 p:name "Stanley Kubrick" .
_:b1 p:birthyear 1928 .
_:b2 rdfs:label "Welles" .
_:b2 p:name "Orson Welles" .
_:b2 r:mentioned _:b1 .
      </p>
      <p>&lt;&lt;_:b1 r:influencedBy _:b2&gt;&gt; p:significance 0.8 .</p>
      <p>
        We emphasize that, while in this example the two nodes of the LPG have
been mapped to RDF blank nodes (as represented by the symbols _:b1 and _:b2
in the given Turtle? serialization), our proposed mapping also allows users to
specify that LPG nodes are mapped to IRIs. Similarly, the mapping gives users
the freedom to de ne patterns for generating IRIs that denote property names
and edge labels, respectively. These IRIs become the predicates of triples in the
1 Based on RDF?-to-RDF and SPARQL?-to-SPARQL mappings in our earlier work [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
it is even possible to use triplestores that do not yet support RDF? and SPARQL?.
resulting RDF? data (e.g., such as http://example.org/property/birthyear in
the example). Furthermore, node labels can be de ned to be mapped either to
literals (as in the example|e.g., "Kubrick") or to IRIs.
      </p>
      <p>Example 1.2. Given the concepts in this paper, the LPG in Figure 1 may also
be queried using SPARQL? queries such as the following (pre x decl. omitted).</p>
      <p>SELECT ?n ?s WHERE {</p>
      <p>?k p:name "Stanley Kubrick" .
&lt;&lt;?k r:influencedBy ?x&gt;&gt; p:significance ?s .</p>
      <p>?x p:name ?n . }
Contributions. We make the following concrete contributions in this paper:
1. We introduce a user-con gurable direct mapping from LPGs to the RDF?
data model, and we provide a formal de nition of this mapping. (Section 3)
2. We show that, for edge-unique LPGs (cf. De nition 4.3), our mapping is
information preserving, which is the desirable property of being able to
convert any resulting RDF? data back into an LPG that is equivalent to the
original LPG used as input to the mapping. (Section 4)
3. We de ne a user-con gurable formal semantics for evaluating SPARQL?
queries directly over any LPG, including LPGs that are not edge-unique.
Hence, based on this semantics, LPGs can be queried using SPARQL?
without converting them into RDF? data rst. (Section 5)
4. We show that our new LPG-speci c query semantics coincides with the</p>
      <p>
        RDF?-based semantics de ned in our earlier work [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. (also Section 5)
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>Before focusing on the aforementioned contributions, we need to de ne the
relevant notions of RDF?/SPARQL? and the notion of LPGs that we use in this
paper. We begin with the structural part of the RDF? data model. Thereafter, we
de ne the SPARQL? query language by introducing a syntax and a formal query
semantics over the RDF? data model. In the end of this section, we de ne LPGs.
2.1</p>
      <p>
        RDF?
The RDF? data model is an extension of the RDF data model [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Consequently,
we assume three pairwise disjoint, countably in nite sets: I (IRIs), B (blank
nodes), and L (literals). As usual, a tuple (s; p; o) 2 (I [ B) I (I [ B [ L) is
an RDF triple and a set of RDF triples is called an RDF graph [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        RDF? extends the notion of such triples by permitting a form of nesting; i.e.,
a triple may have another triple in its rst or its third position. Such nesting may
be arbitrarily deep. Formally, an RDF? triple is de ned recursively as follows [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
De nition 2.1 (RDF? triple). An RDF? triple is a 3-tuple such that:
1. Every RDF triple is an RDF? triple.
2. Let t and t0 be RDF? triples; for every s 2 (I [ B), p 2 I and o 2 (I [ B [ L),
the tuples (t; p; o), (s; p; t) and (t; p; t0) are RDF? triples.
      </p>
      <p>To denote the (in nite) set of all RDF? triples we write T ?. Moreover, for
any RDF? triple t = (s; p; o) we write Elmts+(t) to denote the set of all the
RDF terms (IRIs, blank nodes, and literals) and all the RDF? triples that occur
within t; i.e., Elmts+(t) = fs; p; og [ x 2 Elmts+(t0) t0 2 fs; og \ T ? .
Example 2.1. Consider the two RDF? triples t1 = (ex:Bob; ex:knows; ex:Alice) and
t2 = (t1; ex:certainty; 0.8). Notice that t1 is an ordinary RDF triple (i.e., a
nonnested RDF? triple) whereas t2 is a nested RDF? triple. We have that
Elmts+(t1) = fex:Bob; ex:knows; ex:Aliceg; and</p>
      <p>Elmts+(t2) = fex:Bob; ex:knows; ex:Alice; t1; ex:certainty; 0.8g:</p>
      <p>Similar to the notion of an RDF graph, we refer to a set of RDF? triples as an
RDF? graph. For each such graph G we write T+(G) to denote the set of all RDF?
triples that recursively occur in G, i.e., T+(G) = G [ St2G Elmts+(t) \ T ? .
Example 2.2. Recall the triples t1 and t2 of Example 2.1, and consider an RDF?
graph G that only contains t2; i.e., G = ft2g. Then, it holds that T+(G) = ft1; t2g.
2.2</p>
      <p>
        SPARQL?
SPARQL? is an RDF?-aware extension of the RDF query language SPARQL. The
basic building block of SPARQL queries is a basic graph pattern (BGP ); that
is, a nite set of triple patterns, where a triple pattern is a tuple of the form
(s; p; o) 2 V [ I [ L V [ I V [ I [ L with V being a set of query variables
that is disjoint from I, B, and L, respectively. On top of BGPs, SPARQL provides
many other query operators which are de ned based on an algebra [
        <xref ref-type="bibr" rid="ref3 ref6">3,6</xref>
        ].
      </p>
      <p>
        SPARQL? extends these concepts by adding the possibility to nest triple
patterns. The resulting notion of a triple? pattern is de ned recursively as follows [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
De nition 2.2 (triple? pattern). A triple? pattern is a 3-tuple such that:
1. Any triple pattern is a triple? pattern.
2. Let tp and tp0 be triple? patterns; for every s 2 (V [ I [ L), p 2 (V [ I) and
o 2 (V [ I [ L), tuples (tp; p; o), (s; p; tp) and (tp; p; tp0) are triple? patterns.
      </p>
      <p>
        Similar to the notion of a BGP, we call any nite set of triple? patterns a
BGP?. Note that, by this de nition, every ordinary BGP is also a BGP?. While
our technical report de nes more expressive types of SPARQL? queries [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
(including a complete extension of the W3C speci cation of SPARQL), in this
paper we focus on the BGP? fragment of SPARQL?. However, based on our earlier
work [
        <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
        ], the results in this paper trivially extend to full SPARQL?.
      </p>
      <p>
        To de ne a formal semantics of SPARQL? queries, we recall that the semantics
of SPARQL queries is de ned based on the notion of solution mappings [
        <xref ref-type="bibr" rid="ref3 ref6">3,6</xref>
        ].
For SPARQL? we extend this notion to so-called solution? mappings that may
bind variables not only to IRIs, blank nodes, or literals, but also to RDF? triples.
Hence, a solution? mapping is a partial mapping : V !7 T ? [ I [ B [ L .
      </p>
      <p>Given a solution? mapping and a triple? pattern tp, we write [tp] to denote
the triple? pattern obtained from tp by replacing the variables in tp according
to (variables that are not in dom( ) are not replaced). By extension, for any
solution? mapping and BGP? B, we de ne [B] = Stp2B [tp].</p>
      <p>Now we are ready to de ne an evaluation function that formalizes the
semantics of SPARQL? queries over RDF? graphs.</p>
      <p>De nition 2.3 (evaluation). Let B be a BGP? and G be an RDF? graph. The
evaluation of B over G, denoted by [[B]]G, is a set of solution? mappings:
[[B]]G = f j dom( ) = Stp2Bvars(tp) and [B]
T+(G) g;
where vars(tp) is the set of all variables in a triple? pattern tp = (s; p; o), i.e.,
vars(tp) = (fs; p; og \ V) [ fx 2 vars(tp0) j tp0 2 fs; og and tp0 is a triple? patterng.
Example 2.3. Consider the three triple? patterns tp1 = (ex:Bob; ex:knows; ?x),
tp2 = (tp1; ex:certainty; ?c), and tp3 = (?t; ex:certainty; ?c). Then, given the RDF?
graph G = ft2g of Example 2.2, we obtain the following query results.
[[ftp1g]]G = f g where
[[ftp2g]]G = f g where
[[ftp3g]]G = f g where
= f?x ! ex:Aliceg;
= f?x ! ex:Alice; ?c ! 0.8g;
= f?t ! t1; ?c ! 0.8g:
2.3</p>
      <p>
        Labeled Property Graphs
There exist several proposals to de ne the notion of a Labeled (or unlabeled)
Property Graph formally. For our work in this paper we adopt the following de
nition of Angles et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which assumes three in nite countable sets: Labels
(labels), Props (property names), and Values (property values).
      </p>
      <p>De nition 2.4 (LPG). An LPG is a tuple (V; E; ; ; ) where:
{ V is a nite set of vertices (or nodes);
{ E is a nite set of edges such that V \ E = ;;
{ : E ! (V V ) is a total function;
{ : (V [ E) ! Labels is a total function;
{ : (V [ E) Props !7 Values is a partial function.</p>
      <p>Example 2.4. Figure 1 illustrates the LPG g = (V; E; ; ; ) such that
V = fnK; nWg;</p>
      <p>E = fem; eig;</p>
      <p>= fem ! (nW; nK); ei ! (nK; nW)g;
= fnK ! "Kubrick"; nW ! "Welles";</p>
      <p>em ! "mentioned"; ei ! "influencedBy"g; and
= f(nK; "name") ! "Stanley Kubrick"; (nK; "birthyear") ! 1928;
(nW; "name") ! "Orson Welles"; (ei; "significance") ! 0.8g:</p>
      <p>While the given de nition of an LPG does not restrict the types of property
values, in this paper we assume that all values can be converted to distinct RDF
literals. More precisely, we assume a bijective function vm : Values ! LValues such
that LValues is a set of literals; i.e., LValues L and jLValuesj = jValues j. Note that
this assumption rules out LPGs with property values that are of some form of
collection (e.g., lists). Hereafter, function vm is called the property value mapping
and its inverse is denoted by vm-1 (i.e., we have that vm-1 : LValues ! Values).
3</p>
      <sec id="sec-2-1">
        <title>Property Graph to RDF? Direct Mapping</title>
        <p>This section formalizes our proposed mapping from LPGs to RDF? graphs. We
refer to such type of a mapping as an LPG-to-RDF? mapping which, formally,
is a function that maps from the set of all LPGs to the set of all RDF? graphs.</p>
        <p>As illustrated in Example 1.1, the idea of our particular LPG-to-RDF?
mapping is to represent each node of a given LPG either by a blank node or by an
IRI; each edge is represented as an ordinary (non-nested) triple whose subject
and object are the blank node or IRI of the adjacent LPG nodes and whose
predicate is an IRI that denotes the label of the edge; moreover, node labels and node
properties are also represented as ordinary triples, and edge properties are
represented as nested triples whose subject is the triple for the corresponding edge.</p>
        <p>To formally capture the option for users to customize the mapping by
specifying patterns for generating IRIs that denote LPG nodes, labels, and property
names we introduce the notion of an LPG-to-RDF? con guration.
De nition 3.1 (LPG-to-RDF? con guration). An LPG-to-RDF? con
guration is a tuple (nm; nlm; elm; pm; ulabel) where:
{ nm is a function, called node mapping, that maps every pair (g; n) consisting
of an LPG g = (V; E; ; ; ) and a node n 2 V to a blank node or an IRI
such that for every LPG g = (V; E; ; ; ) and every pair of nodes n 2 V
and n0 2 V we have that nm(g; n) = nm(g; n0) if and only if n = n0,
{ nlm is an injective function nlm : Labels ! I [ L called node label mapping,
{ elm is an injective function elm : Labels ! I called edge label mapping,
{ pm is an injective function pm : Props ! I called property name mapping,
{ ulabel is an IRI.</p>
        <p>Example 3.1. The property name mapping assumed in Example 1.1 is a function
that, for any pn 2 Props, returns http://example.org/property/urlenc( pn) where
urlenc( pn) is the URL-encoding of the property name pn. Similarly, the edge
label mapping returns http://example.org/relationship/urlenc( `) for all ` 2 Labels.
Furthermore, the node mapping in the example returns blank nodes, the node
label mapping maps each label to a string literal, and the IRI ulabel is rdfs:label.</p>
        <p>Now we have what we need to de ne the proposed LPG-to-RDF? mapping.
De nition 3.2 (direct LPG-to-RDF? mapping). Let c = (nm; nlm; elm;
pm; ulabel) be an LPG-to-RDF? con guration. The c-speci c direct LPG-to-RDF?
mapping, denoted by dmcL2R?, is an LPG-to-RDF? mapping that, for every LPG
g = (V; E; ; ; ), maps g to an RDF? graph dmcL2R?(g) = Gnl [ Ge [ Gnp [ Gep
that consists of the following four sets of RDF? triples:</p>
        <p>Gnl =</p>
        <p>nlc;g(n) j n 2 V ;
Ge = ec;g(e) j e 2 E such that (e; pn) 2= dom( ) for all pn 2 Props ;
Gnp =</p>
        <p>npc;g(n; pn) j (n; pn) 2 dom( ) such that n 2 V ; and</p>
        <p>Gep = epc;g(e; pn) j (e; pn) 2 dom( ) such that e 2 E ;
where
{ nlc;g maps every node n 2 V to an RDF? triple
nlc;g(n) =</p>
        <p>nm(g; n); ulabel; nlm( (n)) ;
{ ec;g maps every edge e 2 E with (e) = (n; n0) to an RDF? triple
ec;g(e) =</p>
        <p>nm(g; n); elm( (e)); nm(g; n0) ;
{ npc;g maps every (n; pn) 2 dom( ) for which n 2 V to an RDF? triple
npc;g(n; pn) = nm(g; n); pm(pn); vm( (n; pn)) ;
{ epc;g maps every (e; pn) 2 dom( ) for which e 2 E to an RDF? triple
epc;g(e; pn) =</p>
        <p>ec;g(e); pm(pn); vm( (e; pn)) :
Example 3.2. As speci ed by De nition 3.2, the RDF? graphs resulting from the
proposed mapping consist of four subsets, Gnl, Ge, Gnp, and Gep. For the RDF?
graph in Example 1.1, these subsets are the following (with b1 2 B and b2 2 B):
Gnl = (b1; rdfs:label; "Kubrick"); (b2; rdfs:label; "Welles") ;
Ge = (b2; r:mentioned; b1) ;
Gnp = (b1; p:name; "Stanley Kubrick"); (b1; p:birthyear; 1928);</p>
        <p>(b2; p:name; "Orson Welles") ; and</p>
        <p>Gep = ((b1; r:in uencedBy; b2); p:signi cance; 0.8) :
4</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Fundamental Properties of the Mapping</title>
      <p>In this section we discuss several properties of the proposed mapping.</p>
      <p>First, note that our notion of an LPG-to-RDF? con guration distinguishes
between node label mappings and edge label mappings (cf. De nition 3.1). While
the IRIs returned by an edge label mapping are used as predicates of triples
(see function ec;g in De nition 3.2 and, for instance, set Ge in Example 3.2),
a node label mapping speci es the objects to use in triples that capture node
labels (see function nlc;g in De nition 3.2 and set Gnl in Example 3.2). Note
furthermore that, by the de nition of node label mappings, such objects in triples
for node labels do not need to be literals|as in the examples above|but they
may also be IRIs instead. A speci c use case for the latter is to carry over type
information when mapping LPGs whose node labels represent types and to make
the semantics of this typing explicit in the resulting RDF? data.
Example 4.1. Consider the LPG in Figure 2 in which the node labels capture the
types of the respective nodes. We may use an LPG-to-RDF? con guration with
the same nm, elm, and pm as before (cf. Example 3.1). However, as node label
mapping nlm, we may use a function that, for every label ` 2 Labels, returns
the IRI http://example.org/type/urlenc( `), and as ulabel we may use the IRI rdf:type.
Then, the result of applying the proposed mapping to the LPG in Figure 2 is
the following RDF? graph (with b1 2 B and b2 2 B):
G = f(b1; rdf:type; http://example.org/type/Film); (b1; p:name; "The Shining");
(b2; rdf:type; http://example.org/type/Director); (b2; p:name; "Stanley Kubrick");
(b2; r:directed; b1); (b2; r:produced; b1)g:
Given this resulting data and an accompanying ontology about relationships
between the types and predicates used, it is now even possible to automatically infer
additional information using standard RDFS or OWL reasoning. For instance,
we may infer that Kubrick was not only a lm director but also a lm producer.</p>
      <p>Another noteworthy (and desirable) property of the proposed mapping is
that the mapping is linear. More precisely, based on De nition 3.2 (see also
Properties 3a{3d in Note 1 below), it is easy to verify that, for every LPG g, the
size of the resulting RDF? graph dmcL2R?(g) is linear in the size of g.</p>
      <p>As a basis for discussing another property, consider the following example.
Example 4.2. Figure 3 illustrates a variation of the initial example LPG (cf.
Figure 1) in which we have two edges with the same label, connecting the same
nodes, but with di erent edge properties. The Gep subset of the RDF? graph that
the proposed mapping would return in this case contains two nested triples:</p>
      <p>Gep = (t; p:src; "http://.../Kubr.html"); (t; p:signi cance; 0.8) ;
where t denotes the triple (b1; r:in uencedBy; b2). Based on these two nested triples
it is not possible anymore to notice that this data captures two separate edges.
The mapping has collapsed the two edges into a single triple, namely, t.</p>
      <p>The limitation revealed by the example indicates that there are cases in
which applying the proposed mapping results in a loss of some information.</p>
      <p>This observation brings us to the following question: In which cases does the
mapping guarantee to preserve all the information captured in the given LPG?</p>
      <p>
        To answer this question formally, we need to de ne the information
preservation property [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] for LPG-to-RDF? mappings. Informally, this property requires
the existence of another mapping based on which it is possible to convert any
resulting RDF? graph back into an LPG that is equivalent to the original LPG.
Since, by de nition of the LPG model, the identity of nodes and edges is local to
any given LPG, we de ne equivalence of LPGs based on a structure-preserving
bijective mapping from nodes to nodes and from edges to edges.
De nition 4.1 (equivalence of LPGs). Two LPGs g = (V; E; ; ; ) and
g0 = (V 0; E0; 0; 0; 0) are equivalent, denoted by g g0, if there exists a bijection
b : (V [ E) ! (V 0 [ E0) such that the following six properties holds:
1. for every n 2 V it holds that b(n) 2 V 0;
2. for every e 2 E it holds that b(e) 2 E0;
3. for every e 2 E it holds that 0(b(e)) = (b(n); b(n0)) such that (e) = (n; n0);
4. for every x 2 (V [ E) it holds that (x) = 0(b(x));
5. for every (x; pn) 2 dom( ) it holds that (x; pn) = 0(b(x); pn);
6. for every (x; pn) 2= dom( ) it holds that (b(x); pn) 2= dom( 0).
Note that this notion of equivalence is symmetric; i.e, g0
g if and only if g
g0.
      </p>
      <p>Based on this equivalence relationship, we now can de ne the information
preservation property. The following formal de nition captures a notion of this
property that shall allow us to indicate that speci c LPG-to-RDF? mappings
have this property only for designated subsets of all possible LPGs.
De nition 4.2 (information preservation of LPG-to-RDF? mappings).
Let P be a set of LPGs. An LPG-to-RDF? mapping m is information preserving
for P if there exits a mapping m0 from RDF? graphs to LPGs such that i) m0 is
computable and ii) for every g 2 P it holds that m0(m(g)) g.</p>
      <p>We shall show that our proposed LPG-to-RDF? mapping is information
preserving for LPGs that do not contain multiple distinct edges with the same head,
the same tail, and the same label. Hereafter, such LPGs are called edge-unique.
De nition 4.3 (edge unique). An LPG (V; E; ; ; ) is edge-unique if for
every pair of edges e; e0 2 E with (e) = (e0), it holds that (e) 6= (e0) or e 6= e0.
Example 4.3. While the LPGs illustrated in Figures 1 and 2 are edge-unique,
respectively, the LPG in Figure 3 is not edge-unique.</p>
      <p>Now we have provided all the preliminaries for our rst main technical result:
Proposition 1. Let c = (nm; nlm; elm; pm; ulabel) be an LPG-to-RDF? con
guration such that ulabel 2= img(pm), where img(pm) denotes the image of the
property name mapping pm. The c-specific direct LPG-to-RDF? mapping dmcL2R? is
information preserving for the set of all LPGs that are edge-unique.
Proof (sketch). The idea to prove Proposition 1 is to construct an inverse of the
mapping dmcL2R? as required by De nition 4.2. The crux is to use the inverse
vm-1 of the property value mapping vm (cf. Section 2.3) as well as the inverses of
the injective functions nlm, elm and pm. Then, it is not di cult to construct the
required inverse mapping by taking into account the properties in Note 1 below.
Note 1. For every LPG g = (V; E; ; ; ), the following properties of the
resulting RDF? graph dmcL2R?(g) follow trivially from De nition 3.2.
Property 1. The four subsets Gnl, Ge, Gnp, and Gep are pairwise disjoint.
Property 2. If ulabel 2= img(pm), then it is possible to decide for every RDF?
triple t = (s; p; o) in the RDF? graph dmcL2R?(g) whether t 2 Gnl, t 2 Ge,
t 2 Gnp, or t 2 Gep. That is,
{ if p = ulabel, then t 2 Gnl;
{ if s 2 T ?, then t 2 Gep;
{ if p 6= ulabel, s 2= T ?, and o 2 L, then t 2 Gnp;
{ if p 6= ulabel, s 2= T ?, and o 2= L, then t 2 Ge.</p>
      <p>Property 3a. The number of triples in Gnl is equivalent to the size of V. Hence,
for every node n 2 V, there exists a unique triple in Gnl.</p>
      <p>Property 3b. If g is edge-unique, then the number of triples in Ge is equivalent
to the number of edges e 2 E for which there does not exist a pn 2 Props
such that (e; pn) 2= dom( ).</p>
      <p>Property 3c. The number of triples in Gnp is equivalent to the number of pairs
(n; pn) 2 dom( ) such that n 2 V .</p>
      <p>Property 3d. If g is edge-unique, then the number of triples in Gep is equivalent
to the number of pairs (e; pn) 2 dom( ) such that e 2 E.
tu</p>
      <p>As a nal remark we emphasize that, in practice, the limitation of edge
uniqueness may not be as limiting as it may seem. Information represented
by introducing multiple edges between nodes can also be modeled by using an
alternative, more explicit approach. For instance, the relationship captured by
each of the multiple edges may be modeled as a separate node.
5</p>
      <sec id="sec-3-1">
        <title>Property Graph-based Query Semantics for SPARQL?</title>
        <p>Our proposed LPG-to-RDF? mapping is su cient as a formal foundation for
users who aim to query LPGs using SPARQL? after rst converting the LPGs into
RDF? graphs. To also support use cases in which a conversion into RDF? graphs
is not desired or not practical, in this section we de ne a formal semantics for
evaluating SPARQL? queries directly over LPGs. Hence, this semantics provides
the foundation for implementations in which users express SPARQL? queries in
terms of a virtual RDF? view of an LPG that is stored in an LPG system.</p>
        <p>
          Our approach to formalize the new LPG-specific query semantics in this
section is to rst de ne an evaluation function for triple? patterns only. Thereafter,
we de ne the LPG-specific evaluation function for BGP? queries, which then
may easily be extended to full SPARQL? using exactly the same formalization as
used in our technical report [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] for the RDF?-based semantics (cf. Section 2.2).
        </p>
        <p>The idea of evaluating a triple? pattern tp over an LPG is to identify RDF?
triples that match tp and that exist in the virtual RDF? view of the LPG, where
this view is established by the same functions nlc;g, ec;g, npc;g and epc;g that we use
to de ne our LPG-to-RDF? mapping (De nition 3.2). The following de nition of
an LPG-based evaluation function for triple? patterns captures this idea formally.
De nition 5.1 (evaluation of triple? patterns over LPGs). Let c = (nm;
nlm; elm; pm; ulabel) be an LPG-to-RDF? con guration, tp = (s; p; o) be a triple?
pattern, and g = (V; E; ; ; ) be an LPG. The c-specific evaluation of tp over g,
denoted by [[tp]]cg, is a set of solution? mappings that is de ned as
nl = f j dom( ) = vars(tp) and there exists n 2 V s.t. [tp] = nlc;g(n)g;
e = f j dom( ) = vars(tp) and there exists e 2 E s.t. [tp] = ec;g(e)g;
np = f j dom( ) = vars(tp) and</p>
        <p>there exists (n; pn) 2 dom( ) s.t. n 2 V and [tp] = npc;g(n; pn)g;
ep = f j dom( ) = vars(tp) and</p>
        <p>there exists (e; pn) 2 dom( ) s.t. e 2 E and [tp] = epc;g(e; pn)g:
Example 5.1. Let g be the LPG in Figure 1 (as captured formally in
Example 2.4). If we use the LPG-to-RDF? con guration of Example 3.1 to evaluate
the three triple? patterns tp1 = (?x; p:name; ?n), tp2 = (?y; r:in uencedBy; ?z), and
tp3 = (?t; p:signi cance; ?s) over g, we get the following results (with b1; b2 2 B):
[[tp1]]cg = f ; 0g where</p>
        <p>= f?x ! b1; ?n ! "Stanley Kubrick"g
and 0 = f?x ! b2; ?n ! "Oscar Welles"g;
[[tp2]]cg = f g where
[[tp3]]cg = f g where
= f?y ! b1; ?z ! b2g;
= f?t ! (b1; r:in uencedBy; b2); ?s ! 0.8g:</p>
        <p>We now de ne the LPG-specific evaluation function for BGP? queries in terms
of joins of the results obtained from the triple? patterns in the given BGP?.
De nition 5.2 (LPG-specific evaluation). Let c = (nm; nlm; elm; pm; ulabel)
be an LPG-to-RDF? con guration, B = ftp1; tp2; : : : ; tpmg be a BGP?, and
g = (V; E; ; ; ) be an LPG. The c-specific evaluation of B over g, denoted
by [[B]]cg, is a set of solution? mappings that is de ned as
[[B]]cg = [[tp1]]cg on [[tp2]]cg on : : : on [[tpm]]g;
c
where the join of two sets
and</p>
        <p>
          0 of solution? mappings is de ned as usual [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]:
on
If we use the LPG-to-RDF? con guration of Example 3.1 to evaluate B over
the LPG g in Figure 1, we obtain [[B]]cg = f g where = f?p1 ! b1; ?p2 ! b2;
?n1 ! "Stanley Kubrick"; ?n2 ! "Oscar Welles"; ?s ! 0.8g with b1 2 B and b2 2 B.
        </p>
        <p>A natural question at this point is whether it makes a di erence in terms of
query results to use our new LPG-specific query semantics directly over an LPG
as opposed to converting the LPG into an RDF? graph and then using the query
semantics de ned for the RDF? data model (as introduced in Section 2.2).</p>
        <p>Our second main technical result shows that it makes no di erence. More
precisely, the following proposition shows that query results under the new
LPGspecific query semantics coincide with the query results of the RDF?-specific
semantics when applying our LPG-to-RDF? mapping in the context of the latter.
Proposition 2. For every LPG-to-RDF? con guration c, every BGP? B, and
every LPG g, it holds that [[B]]cg = [[B]]G where dmcL2R?(g) = G.</p>
        <p>Proof (sketch). To prove Proposition 2 it is su cient to show that the semantics
coincide for triple? patterns. Then, the fact that the semantics coincide for every
BGP? B = ftp1; tp2; : : : ; tpmg follows from the following equivalence that holds
for every RDF? graph G:</p>
        <p>[[B]]G = [[tp1]]G on [[tp2]]G on : : : on [[tpm]]G:
Note that this equivalence can easily be shown based on De nition 2.3 and the
de nition of the join operation for sets of solution mappings (cf. De nition 5.2).</p>
        <p>To show that the semantics coincide for every triple? pattern tp, let c be
an arbitrary LPG-to-RDF? con guration, g be an arbitrary LPG, and G be an
RDF? graph such that dmcL2R?(g) = G. We have to show that [[ftpg]]cg [[ftpg]]G
and [[ftpg]]cg [[ftpg]]G. The crux of showing both of these subset relationships
is that both the four subsets Gnl, Ge, Gnp and Gep in De nition 3.2 and the four
subsets nl, e, np and ep in De nition 5.1 are de ned in terms of the same
functions nlc;g, ec;g, npc;g and epc;g.
(1)
tu
6</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Closing Remarks</title>
      <p>Establishing a precise understanding of formal mappings and query semantics
as done in this paper is a necessary foundation to systematically develop solid,
practical approaches to make the di erent graph database models interoperable.
As a concrete next step, enabled by the work in this paper, we aim to develop
e cient algorithms that apply the proposed mapping to convert LPGs to RDF?
data and to use SPARQL? as a common query language.</p>
      <p>Acknowledgments. The author's work on this paper has been funded by the
CENIIT program at Linkoping University (project no. 17.05).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>R.</given-names>
            <surname>Angles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Barcelo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Reutter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrgoc</surname>
          </string-name>
          .
          <article-title>Foundations of Modern Query Languages for Graph Databases</article-title>
          . ACM Comp. Surv.,
          <volume>50</volume>
          (
          <issue>5</issue>
          ),
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lanthaler</surname>
          </string-name>
          , G. Klyne,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Carroll</surname>
          </string-name>
          , and B.
          <source>McBride. RDF 1.1 Concepts</source>
          and
          <string-name>
            <given-names>Abstract</given-names>
            <surname>Syntax</surname>
          </string-name>
          .
          <source>W3C Recommendation</source>
          , Feb.
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>S.</given-names>
            <surname>Harris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seaborne</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Prud</surname>
          </string-name>
          <article-title>'hommeaux. SPARQL 1.1 Query Language</article-title>
          .
          <source>W3C Recommendation</source>
          , Mar.
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          .
          <article-title>Foundations of RDF? and SPARQL? { An Alternative Approach to Statement-Level Metadata in RDF</article-title>
          .
          <source>In Proc. of the 11th AMW Workshop</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Thompson</surname>
          </string-name>
          .
          <article-title>Foundations of an Alternative Approach to Rei cation in RDF</article-title>
          . CoRR, abs/1406.3399,
          <year>2014</year>
          . Online: http://arxiv.org/abs/1406.3399.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>J.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          .
          <source>Semantics and Complexity of SPARQL. ACM Transactions on Database Systems</source>
          ,
          <volume>34</volume>
          (
          <issue>3</issue>
          ),
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>I.</given-names>
            <surname>Robinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Webber</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Eifrem</surname>
          </string-name>
          . Graph Databases:
          <article-title>New Opportunities for Connected Data</article-title>
          .
          <string-name>
            <surname>O'Reilly Media</surname>
          </string-name>
          , Inc.,
          <source>2nd edition</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>J.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Miranker</surname>
          </string-name>
          .
          <article-title>On Directly Mapping Relational Databases to RDF and OWL</article-title>
          .
          <source>In Proc. of the 21st World Wide Web Conf.</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>