Introduction

Foundations to Query Labeled Property Graphs using SPARQL?

Olaf Hartig

olaf.hartig@liu.se 0 0 Dept. of Computer and Information Science (IDA), Linkoping University , Sweden

The RDF?/SPARQL? approach extends RDF and SPARQL with means to capture and to query annotations of RDF triples, which is a feature that is natively available in graph databases modeled as Labeled Property Graphs (LPGs). Hence, the approach presents a step towards making the di erent graph database models interoperable. This paper takes this step further by providing a solid theoretical foundation for converting LPGs into RDF? data and for querying LPGs using the query language SPARQL?. Regarding the latter, the contributions in this paper consider approaches that materialize the RDF? representation of the LPGs into an RDF?-enabled triplestore as well as approaches in which the queried LPGs may reside in an LPG-speci c database system.

Introduction

In addition to extending RDF and SPARQL, we envision that RDF? and SPARQL? may also serve as the foundation of a conceptual mediator layer for integrating RDF data and LPGs, and for using SPARQL? as a common query language. In this context, this paper focuses on the following research question:

How can the query language SPARQL? be used to query LPGs?

We address this question by providing a solid theoretical foundation for converting LPGs into RDF? data and for querying LPGs using SPARQL?. Regarding the latter, the contributions in this paper consider approaches that materialize the RDF? representation of the LPGs in an RDF?-enabled triplestore1 as well as approaches in which the queried LPGs may reside in an LPG system.

As a basis of this work we introduce a customizable mapping from the LPG model to the RDF? data model. The idea of the mapping is simple: Every edge (including its label) in a given LPG is represented as an ordinary RDF triple in the resulting RDF? data; the same holds for the label of every node, as well as for every node property. Every edge property is represented as a nested triple that contains as its subject the triple representing the corresponding edge. Example 1.1. An RDF? representation (given in Turtle? format [ 5 ]) that is the result of applying the proposed mapping to the LPG in Figure 1 is given as follows: _:b1 rdfs:label "Kubrick" . _:b1 p:name "Stanley Kubrick" . _:b1 p:birthyear 1928 . _:b2 rdfs:label "Welles" . _:b2 p:name "Orson Welles" . _:b2 r:mentioned _:b1 .

<<_:b1 r:influencedBy _:b2>> p:significance 0.8 .

We emphasize that, while in this example the two nodes of the LPG have been mapped to RDF blank nodes (as represented by the symbols _:b1 and _:b2 in the given Turtle? serialization), our proposed mapping also allows users to specify that LPG nodes are mapped to IRIs. Similarly, the mapping gives users the freedom to de ne patterns for generating IRIs that denote property names and edge labels, respectively. These IRIs become the predicates of triples in the 1 Based on RDF?-to-RDF and SPARQL?-to-SPARQL mappings in our earlier work [ 4 ], it is even possible to use triplestores that do not yet support RDF? and SPARQL?. resulting RDF? data (e.g., such as http://example.org/property/birthyear in the example). Furthermore, node labels can be de ned to be mapped either to literals (as in the example|e.g., "Kubrick") or to IRIs.

Example 1.2. Given the concepts in this paper, the LPG in Figure 1 may also be queried using SPARQL? queries such as the following (pre x decl. omitted).

SELECT ?n ?s WHERE {

?k p:name "Stanley Kubrick" . <<?k r:influencedBy ?x>> p:significance ?s .

?x p:name ?n . } Contributions. We make the following concrete contributions in this paper: 1. We introduce a user-con gurable direct mapping from LPGs to the RDF? data model, and we provide a formal de nition of this mapping. (Section 3) 2. We show that, for edge-unique LPGs (cf. De nition 4.3), our mapping is information preserving, which is the desirable property of being able to convert any resulting RDF? data back into an LPG that is equivalent to the original LPG used as input to the mapping. (Section 4) 3. We de ne a user-con gurable formal semantics for evaluating SPARQL? queries directly over any LPG, including LPGs that are not edge-unique. Hence, based on this semantics, LPGs can be queried using SPARQL? without converting them into RDF? data rst. (Section 5) 4. We show that our new LPG-speci c query semantics coincides with the

RDF?-based semantics de ned in our earlier work [ 5 ]. (also Section 5) 2

Preliminaries

Before focusing on the aforementioned contributions, we need to de ne the relevant notions of RDF?/SPARQL? and the notion of LPGs that we use in this paper. We begin with the structural part of the RDF? data model. Thereafter, we de ne the SPARQL? query language by introducing a syntax and a formal query semantics over the RDF? data model. In the end of this section, we de ne LPGs. 2.1

RDF? The RDF? data model is an extension of the RDF data model [ 2 ]. Consequently, we assume three pairwise disjoint, countably in nite sets: I (IRIs), B (blank nodes), and L (literals). As usual, a tuple (s; p; o) 2 (I [ B) I (I [ B [ L) is an RDF triple and a set of RDF triples is called an RDF graph [ 2 ].

RDF? extends the notion of such triples by permitting a form of nesting; i.e., a triple may have another triple in its rst or its third position. Such nesting may be arbitrarily deep. Formally, an RDF? triple is de ned recursively as follows [ 5 ]. De nition 2.1 (RDF? triple). An RDF? triple is a 3-tuple such that: 1. Every RDF triple is an RDF? triple. 2. Let t and t0 be RDF? triples; for every s 2 (I [ B), p 2 I and o 2 (I [ B [ L), the tuples (t; p; o), (s; p; t) and (t; p; t0) are RDF? triples.

To denote the (in nite) set of all RDF? triples we write T ?. Moreover, for any RDF? triple t = (s; p; o) we write Elmts+(t) to denote the set of all the RDF terms (IRIs, blank nodes, and literals) and all the RDF? triples that occur within t; i.e., Elmts+(t) = fs; p; og [ x 2 Elmts+(t0) t0 2 fs; og \ T ? . Example 2.1. Consider the two RDF? triples t1 = (ex:Bob; ex:knows; ex:Alice) and t2 = (t1; ex:certainty; 0.8). Notice that t1 is an ordinary RDF triple (i.e., a nonnested RDF? triple) whereas t2 is a nested RDF? triple. We have that Elmts+(t1) = fex:Bob; ex:knows; ex:Aliceg; and

Elmts+(t2) = fex:Bob; ex:knows; ex:Alice; t1; ex:certainty; 0.8g:

Similar to the notion of an RDF graph, we refer to a set of RDF? triples as an RDF? graph. For each such graph G we write T+(G) to denote the set of all RDF? triples that recursively occur in G, i.e., T+(G) = G [ St2G Elmts+(t) \ T ? . Example 2.2. Recall the triples t1 and t2 of Example 2.1, and consider an RDF? graph G that only contains t2; i.e., G = ft2g. Then, it holds that T+(G) = ft1; t2g. 2.2

SPARQL? SPARQL? is an RDF?-aware extension of the RDF query language SPARQL. The basic building block of SPARQL queries is a basic graph pattern (BGP ); that is, a nite set of triple patterns, where a triple pattern is a tuple of the form (s; p; o) 2 V [ I [ L V [ I V [ I [ L with V being a set of query variables that is disjoint from I, B, and L, respectively. On top of BGPs, SPARQL provides many other query operators which are de ned based on an algebra [ 3,6 ].

SPARQL? extends these concepts by adding the possibility to nest triple patterns. The resulting notion of a triple? pattern is de ned recursively as follows [ 5 ]. De nition 2.2 (triple? pattern). A triple? pattern is a 3-tuple such that: 1. Any triple pattern is a triple? pattern. 2. Let tp and tp0 be triple? patterns; for every s 2 (V [ I [ L), p 2 (V [ I) and o 2 (V [ I [ L), tuples (tp; p; o), (s; p; tp) and (tp; p; tp0) are triple? patterns.

Similar to the notion of a BGP, we call any nite set of triple? patterns a BGP?. Note that, by this de nition, every ordinary BGP is also a BGP?. While our technical report de nes more expressive types of SPARQL? queries [ 5 ] (including a complete extension of the W3C speci cation of SPARQL), in this paper we focus on the BGP? fragment of SPARQL?. However, based on our earlier work [ 4,5 ], the results in this paper trivially extend to full SPARQL?.

To de ne a formal semantics of SPARQL? queries, we recall that the semantics of SPARQL queries is de ned based on the notion of solution mappings [ 3,6 ]. For SPARQL? we extend this notion to so-called solution? mappings that may bind variables not only to IRIs, blank nodes, or literals, but also to RDF? triples. Hence, a solution? mapping is a partial mapping : V !7 T ? [ I [ B [ L .

Given a solution? mapping and a triple? pattern tp, we write [tp] to denote the triple? pattern obtained from tp by replacing the variables in tp according to (variables that are not in dom( ) are not replaced). By extension, for any solution? mapping and BGP? B, we de ne [B] = Stp2B [tp].

Now we are ready to de ne an evaluation function that formalizes the semantics of SPARQL? queries over RDF? graphs.

De nition 2.3 (evaluation). Let B be a BGP? and G be an RDF? graph. The evaluation of B over G, denoted by [[B]]G, is a set of solution? mappings: [[B]]G = f j dom( ) = Stp2Bvars(tp) and [B] T+(G) g; where vars(tp) is the set of all variables in a triple? pattern tp = (s; p; o), i.e., vars(tp) = (fs; p; og \ V) [ fx 2 vars(tp0) j tp0 2 fs; og and tp0 is a triple? patterng. Example 2.3. Consider the three triple? patterns tp1 = (ex:Bob; ex:knows; ?x), tp2 = (tp1; ex:certainty; ?c), and tp3 = (?t; ex:certainty; ?c). Then, given the RDF? graph G = ft2g of Example 2.2, we obtain the following query results. [[ftp1g]]G = f g where [[ftp2g]]G = f g where [[ftp3g]]G = f g where = f?x ! ex:Aliceg; = f?x ! ex:Alice; ?c ! 0.8g; = f?t ! t1; ?c ! 0.8g: 2.3

Labeled Property Graphs There exist several proposals to de ne the notion of a Labeled (or unlabeled) Property Graph formally. For our work in this paper we adopt the following de nition of Angles et al. [ 1 ], which assumes three in nite countable sets: Labels (labels), Props (property names), and Values (property values).

De nition 2.4 (LPG). An LPG is a tuple (V; E; ; ; ) where: { V is a nite set of vertices (or nodes); { E is a nite set of edges such that V \ E = ;; { : E ! (V V ) is a total function; { : (V [ E) ! Labels is a total function; { : (V [ E) Props !7 Values is a partial function.

Example 2.4. Figure 1 illustrates the LPG g = (V; E; ; ; ) such that V = fnK; nWg;

E = fem; eig;

= fem ! (nW; nK); ei ! (nK; nW)g; = fnK ! "Kubrick"; nW ! "Welles";

em ! "mentioned"; ei ! "influencedBy"g; and = f(nK; "name") ! "Stanley Kubrick"; (nK; "birthyear") ! 1928; (nW; "name") ! "Orson Welles"; (ei; "significance") ! 0.8g:

While the given de nition of an LPG does not restrict the types of property values, in this paper we assume that all values can be converted to distinct RDF literals. More precisely, we assume a bijective function vm : Values ! LValues such that LValues is a set of literals; i.e., LValues L and jLValuesj = jValues j. Note that this assumption rules out LPGs with property values that are of some form of collection (e.g., lists). Hereafter, function vm is called the property value mapping and its inverse is denoted by vm-1 (i.e., we have that vm-1 : LValues ! Values). 3

Property Graph to RDF? Direct Mapping

This section formalizes our proposed mapping from LPGs to RDF? graphs. We refer to such type of a mapping as an LPG-to-RDF? mapping which, formally, is a function that maps from the set of all LPGs to the set of all RDF? graphs.

As illustrated in Example 1.1, the idea of our particular LPG-to-RDF? mapping is to represent each node of a given LPG either by a blank node or by an IRI; each edge is represented as an ordinary (non-nested) triple whose subject and object are the blank node or IRI of the adjacent LPG nodes and whose predicate is an IRI that denotes the label of the edge; moreover, node labels and node properties are also represented as ordinary triples, and edge properties are represented as nested triples whose subject is the triple for the corresponding edge.

To formally capture the option for users to customize the mapping by specifying patterns for generating IRIs that denote LPG nodes, labels, and property names we introduce the notion of an LPG-to-RDF? con guration. De nition 3.1 (LPG-to-RDF? con guration). An LPG-to-RDF? con guration is a tuple (nm; nlm; elm; pm; ulabel) where: { nm is a function, called node mapping, that maps every pair (g; n) consisting of an LPG g = (V; E; ; ; ) and a node n 2 V to a blank node or an IRI such that for every LPG g = (V; E; ; ; ) and every pair of nodes n 2 V and n0 2 V we have that nm(g; n) = nm(g; n0) if and only if n = n0, { nlm is an injective function nlm : Labels ! I [ L called node label mapping, { elm is an injective function elm : Labels ! I called edge label mapping, { pm is an injective function pm : Props ! I called property name mapping, { ulabel is an IRI.

Example 3.1. The property name mapping assumed in Example 1.1 is a function that, for any pn 2 Props, returns http://example.org/property/urlenc( pn) where urlenc( pn) is the URL-encoding of the property name pn. Similarly, the edge label mapping returns http://example.org/relationship/urlenc( `) for all ` 2 Labels. Furthermore, the node mapping in the example returns blank nodes, the node label mapping maps each label to a string literal, and the IRI ulabel is rdfs:label.

Now we have what we need to de ne the proposed LPG-to-RDF? mapping. De nition 3.2 (direct LPG-to-RDF? mapping). Let c = (nm; nlm; elm; pm; ulabel) be an LPG-to-RDF? con guration. The c-speci c direct LPG-to-RDF? mapping, denoted by dmcL2R?, is an LPG-to-RDF? mapping that, for every LPG g = (V; E; ; ; ), maps g to an RDF? graph dmcL2R?(g) = Gnl [ Ge [ Gnp [ Gep that consists of the following four sets of RDF? triples:

Gnl =

nlc;g(n) j n 2 V ; Ge = ec;g(e) j e 2 E such that (e; pn) 2= dom( ) for all pn 2 Props ; Gnp =

npc;g(n; pn) j (n; pn) 2 dom( ) such that n 2 V ; and

Gep = epc;g(e; pn) j (e; pn) 2 dom( ) such that e 2 E ; where { nlc;g maps every node n 2 V to an RDF? triple nlc;g(n) =

nm(g; n); ulabel; nlm( (n)) ; { ec;g maps every edge e 2 E with (e) = (n; n0) to an RDF? triple ec;g(e) =

nm(g; n); elm( (e)); nm(g; n0) ; { npc;g maps every (n; pn) 2 dom( ) for which n 2 V to an RDF? triple npc;g(n; pn) = nm(g; n); pm(pn); vm( (n; pn)) ; { epc;g maps every (e; pn) 2 dom( ) for which e 2 E to an RDF? triple epc;g(e; pn) =

ec;g(e); pm(pn); vm( (e; pn)) : Example 3.2. As speci ed by De nition 3.2, the RDF? graphs resulting from the proposed mapping consist of four subsets, Gnl, Ge, Gnp, and Gep. For the RDF? graph in Example 1.1, these subsets are the following (with b1 2 B and b2 2 B): Gnl = (b1; rdfs:label; "Kubrick"); (b2; rdfs:label; "Welles") ; Ge = (b2; r:mentioned; b1) ; Gnp = (b1; p:name; "Stanley Kubrick"); (b1; p:birthyear; 1928);

(b2; p:name; "Orson Welles") ; and

Gep = ((b1; r:in uencedBy; b2); p:signi cance; 0.8) : 4

Fundamental Properties of the Mapping

In this section we discuss several properties of the proposed mapping.

First, note that our notion of an LPG-to-RDF? con guration distinguishes between node label mappings and edge label mappings (cf. De nition 3.1). While the IRIs returned by an edge label mapping are used as predicates of triples (see function ec;g in De nition 3.2 and, for instance, set Ge in Example 3.2), a node label mapping speci es the objects to use in triples that capture node labels (see function nlc;g in De nition 3.2 and set Gnl in Example 3.2). Note furthermore that, by the de nition of node label mappings, such objects in triples for node labels do not need to be literals|as in the examples above|but they may also be IRIs instead. A speci c use case for the latter is to carry over type information when mapping LPGs whose node labels represent types and to make the semantics of this typing explicit in the resulting RDF? data. Example 4.1. Consider the LPG in Figure 2 in which the node labels capture the types of the respective nodes. We may use an LPG-to-RDF? con guration with the same nm, elm, and pm as before (cf. Example 3.1). However, as node label mapping nlm, we may use a function that, for every label ` 2 Labels, returns the IRI http://example.org/type/urlenc( `), and as ulabel we may use the IRI rdf:type. Then, the result of applying the proposed mapping to the LPG in Figure 2 is the following RDF? graph (with b1 2 B and b2 2 B): G = f(b1; rdf:type; http://example.org/type/Film); (b1; p:name; "The Shining"); (b2; rdf:type; http://example.org/type/Director); (b2; p:name; "Stanley Kubrick"); (b2; r:directed; b1); (b2; r:produced; b1)g: Given this resulting data and an accompanying ontology about relationships between the types and predicates used, it is now even possible to automatically infer additional information using standard RDFS or OWL reasoning. For instance, we may infer that Kubrick was not only a lm director but also a lm producer.

Another noteworthy (and desirable) property of the proposed mapping is that the mapping is linear. More precisely, based on De nition 3.2 (see also Properties 3a{3d in Note 1 below), it is easy to verify that, for every LPG g, the size of the resulting RDF? graph dmcL2R?(g) is linear in the size of g.

As a basis for discussing another property, consider the following example. Example 4.2. Figure 3 illustrates a variation of the initial example LPG (cf. Figure 1) in which we have two edges with the same label, connecting the same nodes, but with di erent edge properties. The Gep subset of the RDF? graph that the proposed mapping would return in this case contains two nested triples:

Gep = (t; p:src; "http://.../Kubr.html"); (t; p:signi cance; 0.8) ; where t denotes the triple (b1; r:in uencedBy; b2). Based on these two nested triples it is not possible anymore to notice that this data captures two separate edges. The mapping has collapsed the two edges into a single triple, namely, t.

The limitation revealed by the example indicates that there are cases in which applying the proposed mapping results in a loss of some information.

This observation brings us to the following question: In which cases does the mapping guarantee to preserve all the information captured in the given LPG?

To answer this question formally, we need to de ne the information preservation property [ 8 ] for LPG-to-RDF? mappings. Informally, this property requires the existence of another mapping based on which it is possible to convert any resulting RDF? graph back into an LPG that is equivalent to the original LPG. Since, by de nition of the LPG model, the identity of nodes and edges is local to any given LPG, we de ne equivalence of LPGs based on a structure-preserving bijective mapping from nodes to nodes and from edges to edges. De nition 4.1 (equivalence of LPGs). Two LPGs g = (V; E; ; ; ) and g0 = (V 0; E0; 0; 0; 0) are equivalent, denoted by g g0, if there exists a bijection b : (V [ E) ! (V 0 [ E0) such that the following six properties holds: 1. for every n 2 V it holds that b(n) 2 V 0; 2. for every e 2 E it holds that b(e) 2 E0; 3. for every e 2 E it holds that 0(b(e)) = (b(n); b(n0)) such that (e) = (n; n0); 4. for every x 2 (V [ E) it holds that (x) = 0(b(x)); 5. for every (x; pn) 2 dom( ) it holds that (x; pn) = 0(b(x); pn); 6. for every (x; pn) 2= dom( ) it holds that (b(x); pn) 2= dom( 0). Note that this notion of equivalence is symmetric; i.e, g0 g if and only if g g0.

Based on this equivalence relationship, we now can de ne the information preservation property. The following formal de nition captures a notion of this property that shall allow us to indicate that speci c LPG-to-RDF? mappings have this property only for designated subsets of all possible LPGs. De nition 4.2 (information preservation of LPG-to-RDF? mappings). Let P be a set of LPGs. An LPG-to-RDF? mapping m is information preserving for P if there exits a mapping m0 from RDF? graphs to LPGs such that i) m0 is computable and ii) for every g 2 P it holds that m0(m(g)) g.

We shall show that our proposed LPG-to-RDF? mapping is information preserving for LPGs that do not contain multiple distinct edges with the same head, the same tail, and the same label. Hereafter, such LPGs are called edge-unique. De nition 4.3 (edge unique). An LPG (V; E; ; ; ) is edge-unique if for every pair of edges e; e0 2 E with (e) = (e0), it holds that (e) 6= (e0) or e 6= e0. Example 4.3. While the LPGs illustrated in Figures 1 and 2 are edge-unique, respectively, the LPG in Figure 3 is not edge-unique.

Now we have provided all the preliminaries for our rst main technical result: Proposition 1. Let c = (nm; nlm; elm; pm; ulabel) be an LPG-to-RDF? con guration such that ulabel 2= img(pm), where img(pm) denotes the image of the property name mapping pm. The c-specific direct LPG-to-RDF? mapping dmcL2R? is information preserving for the set of all LPGs that are edge-unique. Proof (sketch). The idea to prove Proposition 1 is to construct an inverse of the mapping dmcL2R? as required by De nition 4.2. The crux is to use the inverse vm-1 of the property value mapping vm (cf. Section 2.3) as well as the inverses of the injective functions nlm, elm and pm. Then, it is not di cult to construct the required inverse mapping by taking into account the properties in Note 1 below. Note 1. For every LPG g = (V; E; ; ; ), the following properties of the resulting RDF? graph dmcL2R?(g) follow trivially from De nition 3.2. Property 1. The four subsets Gnl, Ge, Gnp, and Gep are pairwise disjoint. Property 2. If ulabel 2= img(pm), then it is possible to decide for every RDF? triple t = (s; p; o) in the RDF? graph dmcL2R?(g) whether t 2 Gnl, t 2 Ge, t 2 Gnp, or t 2 Gep. That is, { if p = ulabel, then t 2 Gnl; { if s 2 T ?, then t 2 Gep; { if p 6= ulabel, s 2= T ?, and o 2 L, then t 2 Gnp; { if p 6= ulabel, s 2= T ?, and o 2= L, then t 2 Ge.

Property 3a. The number of triples in Gnl is equivalent to the size of V. Hence, for every node n 2 V, there exists a unique triple in Gnl.

Property 3b. If g is edge-unique, then the number of triples in Ge is equivalent to the number of edges e 2 E for which there does not exist a pn 2 Props such that (e; pn) 2= dom( ).

Property 3c. The number of triples in Gnp is equivalent to the number of pairs (n; pn) 2 dom( ) such that n 2 V .

Property 3d. If g is edge-unique, then the number of triples in Gep is equivalent to the number of pairs (e; pn) 2 dom( ) such that e 2 E. tu

As a nal remark we emphasize that, in practice, the limitation of edge uniqueness may not be as limiting as it may seem. Information represented by introducing multiple edges between nodes can also be modeled by using an alternative, more explicit approach. For instance, the relationship captured by each of the multiple edges may be modeled as a separate node. 5

Property Graph-based Query Semantics for SPARQL?

Our proposed LPG-to-RDF? mapping is su cient as a formal foundation for users who aim to query LPGs using SPARQL? after rst converting the LPGs into RDF? graphs. To also support use cases in which a conversion into RDF? graphs is not desired or not practical, in this section we de ne a formal semantics for evaluating SPARQL? queries directly over LPGs. Hence, this semantics provides the foundation for implementations in which users express SPARQL? queries in terms of a virtual RDF? view of an LPG that is stored in an LPG system.

Our approach to formalize the new LPG-specific query semantics in this section is to rst de ne an evaluation function for triple? patterns only. Thereafter, we de ne the LPG-specific evaluation function for BGP? queries, which then may easily be extended to full SPARQL? using exactly the same formalization as used in our technical report [ 5 ] for the RDF?-based semantics (cf. Section 2.2).

The idea of evaluating a triple? pattern tp over an LPG is to identify RDF? triples that match tp and that exist in the virtual RDF? view of the LPG, where this view is established by the same functions nlc;g, ec;g, npc;g and epc;g that we use to de ne our LPG-to-RDF? mapping (De nition 3.2). The following de nition of an LPG-based evaluation function for triple? patterns captures this idea formally. De nition 5.1 (evaluation of triple? patterns over LPGs). Let c = (nm; nlm; elm; pm; ulabel) be an LPG-to-RDF? con guration, tp = (s; p; o) be a triple? pattern, and g = (V; E; ; ; ) be an LPG. The c-specific evaluation of tp over g, denoted by [[tp]]cg, is a set of solution? mappings that is de ned as nl = f j dom( ) = vars(tp) and there exists n 2 V s.t. [tp] = nlc;g(n)g; e = f j dom( ) = vars(tp) and there exists e 2 E s.t. [tp] = ec;g(e)g; np = f j dom( ) = vars(tp) and

there exists (n; pn) 2 dom( ) s.t. n 2 V and [tp] = npc;g(n; pn)g; ep = f j dom( ) = vars(tp) and

there exists (e; pn) 2 dom( ) s.t. e 2 E and [tp] = epc;g(e; pn)g: Example 5.1. Let g be the LPG in Figure 1 (as captured formally in Example 2.4). If we use the LPG-to-RDF? con guration of Example 3.1 to evaluate the three triple? patterns tp1 = (?x; p:name; ?n), tp2 = (?y; r:in uencedBy; ?z), and tp3 = (?t; p:signi cance; ?s) over g, we get the following results (with b1; b2 2 B): [[tp1]]cg = f ; 0g where

= f?x ! b1; ?n ! "Stanley Kubrick"g and 0 = f?x ! b2; ?n ! "Oscar Welles"g; [[tp2]]cg = f g where [[tp3]]cg = f g where = f?y ! b1; ?z ! b2g; = f?t ! (b1; r:in uencedBy; b2); ?s ! 0.8g:

We now de ne the LPG-specific evaluation function for BGP? queries in terms of joins of the results obtained from the triple? patterns in the given BGP?. De nition 5.2 (LPG-specific evaluation). Let c = (nm; nlm; elm; pm; ulabel) be an LPG-to-RDF? con guration, B = ftp1; tp2; : : : ; tpmg be a BGP?, and g = (V; E; ; ; ) be an LPG. The c-specific evaluation of B over g, denoted by [[B]]cg, is a set of solution? mappings that is de ned as [[B]]cg = [[tp1]]cg on [[tp2]]cg on : : : on [[tpm]]g; c where the join of two sets and

0 of solution? mappings is de ned as usual [ 5 ]: on If we use the LPG-to-RDF? con guration of Example 3.1 to evaluate B over the LPG g in Figure 1, we obtain [[B]]cg = f g where = f?p1 ! b1; ?p2 ! b2; ?n1 ! "Stanley Kubrick"; ?n2 ! "Oscar Welles"; ?s ! 0.8g with b1 2 B and b2 2 B.

A natural question at this point is whether it makes a di erence in terms of query results to use our new LPG-specific query semantics directly over an LPG as opposed to converting the LPG into an RDF? graph and then using the query semantics de ned for the RDF? data model (as introduced in Section 2.2).

Our second main technical result shows that it makes no di erence. More precisely, the following proposition shows that query results under the new LPGspecific query semantics coincide with the query results of the RDF?-specific semantics when applying our LPG-to-RDF? mapping in the context of the latter. Proposition 2. For every LPG-to-RDF? con guration c, every BGP? B, and every LPG g, it holds that [[B]]cg = [[B]]G where dmcL2R?(g) = G.

Proof (sketch). To prove Proposition 2 it is su cient to show that the semantics coincide for triple? patterns. Then, the fact that the semantics coincide for every BGP? B = ftp1; tp2; : : : ; tpmg follows from the following equivalence that holds for every RDF? graph G:

[[B]]G = [[tp1]]G on [[tp2]]G on : : : on [[tpm]]G: Note that this equivalence can easily be shown based on De nition 2.3 and the de nition of the join operation for sets of solution mappings (cf. De nition 5.2).

To show that the semantics coincide for every triple? pattern tp, let c be an arbitrary LPG-to-RDF? con guration, g be an arbitrary LPG, and G be an RDF? graph such that dmcL2R?(g) = G. We have to show that [[ftpg]]cg [[ftpg]]G and [[ftpg]]cg [[ftpg]]G. The crux of showing both of these subset relationships is that both the four subsets Gnl, Ge, Gnp and Gep in De nition 3.2 and the four subsets nl, e, np and ep in De nition 5.1 are de ned in terms of the same functions nlc;g, ec;g, npc;g and epc;g. (1) tu 6

Closing Remarks

Establishing a precise understanding of formal mappings and query semantics as done in this paper is a necessary foundation to systematically develop solid, practical approaches to make the di erent graph database models interoperable. As a concrete next step, enabled by the work in this paper, we aim to develop e cient algorithms that apply the proposed mapping to convert LPGs to RDF? data and to use SPARQL? as a common query language.

Acknowledgments. The author's work on this paper has been funded by the CENIIT program at Linkoping University (project no. 17.05).

Angles ,

Arenas ,

Barcelo ,

Hogan ,

Reutter , and

Vrgoc . Foundations of Modern Query Languages for Graph Databases . ACM Comp. Surv., 50 ( 5 ), 2017 .

Cyganiak ,

Wood ,

Lanthaler , G. Klyne,

J. J.

Carroll , and B. McBride. RDF 1.1 Concepts and

Abstract

Syntax . W3C Recommendation , Feb. 2014 .

Harris ,

Seaborne , and

Prud 'hommeaux. SPARQL 1.1 Query Language . W3C Recommendation , Mar. 2013 .

Hartig . Foundations of RDF? and SPARQL? { An Alternative Approach to Statement-Level Metadata in RDF . In Proc. of the 11th AMW Workshop , 2017 .

Hartig and

Thompson . Foundations of an Alternative Approach to Rei cation in RDF . CoRR, abs/1406.3399, 2014 . Online: http://arxiv.org/abs/1406.3399.

Perez ,

Arenas , and

Gutierrez . Semantics and Complexity of SPARQL. ACM Transactions on Database Systems , 34 ( 3 ), 2009 .

Robinson ,

Webber , and

Eifrem . Graph Databases: New Opportunities for Connected Data . O'Reilly Media , Inc., 2nd edition , 2015 .

Sequeda ,

Arenas , and

D. P.

Miranker . On Directly Mapping Relational Databases to RDF and OWL . In Proc. of the 21st World Wide Web Conf. , 2012 .