<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LDQL: A Language for Linked Data Queries</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Olaf Hartig</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>In this paper, we propose LDQL, that is, a language to query Linked Data on the Web. The novelty of LDQL is that it enables a user to express separately (i) patterns that describe the expected query result, and (ii) Web navigation paths that select the data sources to be used for computing the result. As a downside of this expressiveness, we find that for some LDQL queries, a complete execution is not possible in practice. To address this issue, we show a syntactic property based on which systems can identify queries that do not have this limitation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>http://olafhartig.de</p>
    </sec>
    <sec id="sec-2">
      <title>1 Introduction</title>
      <p>
        In recent years an increasing amount of structured data has been published and
interlinked on the World Wide Web (WWW) in adherence to the Linked Data
principles [
        <xref ref-type="bibr" rid="ref2 ref3">2,3</xref>
        ]. These principles are based on standard Web technologies. In particular, (i) the
Hypertext Transfer Protocol (HTTP) is used to access data, (ii) HTTP-based Uniform
Resource Identifiers (URIs) are used as identifiers for entities described in the data,
and (iii) the Resource Description Framework (RDF) is used as data model. Then, any
HTTP URI in an RDF triple presents a data link that enables software clients to retrieve
more data by looking up the URI with an HTTP request. The adoption of these
principles has lead to the creation of a globally distributed dataspace: the Web of Linked Data.
      </p>
      <p>
        From a graph database perspective the Web of Linked Data can be conceived of as
two graphs of different types: First, the (virtual) union of all RDF triples in the Web of
Linked Data represents an RDF graph [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]; this graph is distributed over the documents
in the Web of Linked Data. Second, these documents constitute the nodes of a link graph
whose edges represent the aforementioned data links that connect the documents.
      </p>
      <p>
        The emergence of the Web of Linked Data makes possible an online execution of
declarative queries over up-to-date data from a virtually unbounded set of data sources,
each of which is readily accessible without any need for implementing source-specific
APIs or wrappers. This possibility has spawned research interest in approaches to query
Linked Data on the WWW as if it was a single (distributed) database. For an overview
on query execution techniques proposed in this context refer to [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        While there does not exist a standard language for expressing such queries, a few
options have been proposed. In particular, a first strand of research focuses on extending
the scope of SPARQL—which is the standard query language for centralized collections
of RDF data [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]—such that an evaluation of SPARQL queries over Linked Data on
the WWW has a well-defined semantics [
        <xref ref-type="bibr" rid="ref10 ref11 ref14 ref16 ref4">4,10,11,14,16</xref>
        ]. A second strand of research
focuses on navigational languages (e.g., NautiLOD [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]). A commonality of all these
proposals is that querying the aforementioned two graphs that comprise the Web of
Linked Data (i.e., the link graph and the RDF data graph) is inherently entangled in each
of the proposed query formalisms. That is, for any query expressed in such a formalism,
the definition of query-relevant regions of the link graph and the definition of
query-relevant data from the RDF graph within the specified regions depend on one another.
      </p>
      <p>In this paper, we study the alternative approach of avoiding such an inherent
dependency. To this end, we propose a novel query language for Linked Data which we
call LDQL. In contrast to the aforementioned query formalisms, LDQL separates query
components for selecting regions of the link graph from components for specifying the
query result that has to be constructed from the data in the selected regions. For the
former, LDQL introduces the notion of a link path expression, which is a form of nested
regular expression, and for the latter we use SPARQL. More precisely, the most basic
type of LDQL queries consists of a link path expression and a SPARQL graph pattern.
Additionally, such queries can be combined using conjunctions and disjunctions, where
some subqueries may provide the starting points of navigation for other subqueries.</p>
      <p>The downside of the expressiveness provided by LDQL are queries for which a
complete execution is not possible in practice (because of the lack of a complete,
upto-date data catalog that is inherent to the WWW). To capture this issue formally, we
define a notion of Web-safeness for LDQL queries. Then, the obvious question that
arises is how to identify those LDQL queries that are Web-safe. Our contribution to
answer this question is to show a sufficient syntactic condition for Web-safeness.</p>
      <p>
        The paper is structured as follows: Section 2 introduces a data model that provides
the basis for defining the semantics of LDQL formally. In Section 3 we define LDQL
and show simple algebraic properties. Thereafter, in Section 4 we focus on
Web-safeness. Section 5 concludes the paper and sketches future work. For proofs of the formal
results in this paper we refer to the extended version of this paper [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
2
      </p>
    </sec>
    <sec id="sec-3">
      <title>Data Model</title>
      <p>
        In this section we introduce a structural data model that captures the concept of a Web
of Linked Data formally. As usual [
        <xref ref-type="bibr" rid="ref10 ref11 ref14 ref16 ref4 ref8">4,8,10,11,14,16</xref>
        ], for the definitions and analysis in
this paper, we assume that the Web is fixed during the execution of any single query.
      </p>
      <p>
        We use the RDF data model [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] as a basis for our model of a Web of Linked Data.
That is, we assume three pairwise disjoint, infinite sets U (URIs), B (blank nodes), and
L (literals). Then, an RDF triple is a tuple hs; p; oi 2 (U [ B) U (U [ B [ L). For
any RDF triple t = hs; p; oi we write uris(t) to denote the set of all URIs in t.
      </p>
      <p>Additionally, we assume another infinite set D that is disjoint from U , B, and L,
respectively. We refer to elements in this set as Linked Data documents, or documents
for short, and use them to represent the concept of Web documents from which Linked
Data can be extracted. Hence, we assume a function, say data, that maps each document
d 2 D to a finite set of RDF triples data(d) (U [ B) U (U [ B [ L) such that
the data of each document uses a unique set of blank nodes; i.e., for any pair of distinct
documents d; d0 2 D, and any two RDF triples t = hs; p; oi and t0 = hs0; p0; o0i with
t 2 data(d) and t0 2 data(d0), it holds that fs; p; og \ fs0; p0; o0g \ B = ;.</p>
      <p>Given these preliminaries, we are ready to define a Web of Linked Data.
Definition 1. A Web of Linked Data is a tuple W = hD; adoci that consists of a set
of documents D D and a partial function adoc : U ! D that is surjective.</p>
      <p>Function adoc of a Web of Linked Data W = hD; adoci captures the relationship
between the URIs that can be looked up in this Web and the documents that can be
retrieved by such lookups. Since not every URI can be looked up, the function is
partial. For any URI u 2 U with u 2 dom(adoc) (i.e., any URI that can be looked up
in W ), document d = adoc(u) can be considered the authoritative source of data for u
in W (hence, the name adoc). To accommodate for documents that are authoritative
for multiple URIs, we do not require injectivity for function adoc. However, we require
surjectivity because we conceive documents as irrelevant for a Web of Linked Data if
they cannot be retrieved by any URI lookup in this Web.</p>
      <p>
        In this paper, we assume that the set of documents D in any Web of Linked Data
W = hD; adoci is finite, in which case we say W is finite [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Moreover, given a
Web of Linked Data W = hD; adoci, we say that a URI u 2 uris(t) in an RDF triple
t = hs; p; oi establishes a data link to a document d 2 D if adoc(u) = d.
      </p>
      <p>As a final concept, our data model formalizes the aforementioned notion of a link
graph in which directed edges represent data links between documents. In the following
formal definition of this graph, each edge is associated with a label that identifies both
the particular RDF triple and the URI in this triple that establishes the corresponding
data link. These labels shall provide the basis for defining the navigational component
of our query language (cf. Section 3.1).</p>
      <p>T</p>
      <p>U
Definition 2. The link graph of a Web of Linked Data W = hD; adoci is a directed,
edge-labeled multigraph hD; Ei whose vertices are all documents in W, and which has
a directed edge hdsrc; t; u; dtgti 2 E from document dsrc 2 D to document dtgt 2 D if
the data of dsrc contains an RDF triple t with a URI u 2 uris(t) that establishes a data
link to dtgt; the edge is labeled with both the triple t and the URI u. Hence, the set of
edges E D D is defined as follows:</p>
      <p>E =</p>
      <p>hdsrc; t; u; dtgti t 2 data(dsrc) such that u 2 uris(t) and adoc(u) = dtgt :
Example 1. As a running example for this paper assume a simple Web of Linked Data
Wex = hDex; adocexi with three documents, dA, dB and dC (i.e., Dex = fdA; dB; dCg).
The data in these documents are the following sets of RDF triples:
data(dA) = fhuA; p1; uBi;
huB; p2; uCig;
data(dB) = fhuB; p1; uCig;
data(dC) = fhuA; p2; uCig;
and function adocex is given as follows: adocex(uA) = dA, adocex(uB) = dB, and
adocex(uC) = dC (i.e., dom(adocex) = fuA; uB; uCg). This Web contains 8 data links.
For instance, URI uA in the RDF triple in data(dC) establishes a data link to document
dA. Hence, the corresponding edge in the link graph of Wex is dC; huA; p2; uCi; uA; dA .
Figure 1 illustrates this link graph with all 8 edges.
3</p>
    </sec>
    <sec id="sec-4">
      <title>Definition of LDQL</title>
      <p>
        This section defines our Linked Data query language, LDQL. LDQL queries are meant
to be evaluated over a Web of Linked Data and each such query is built from two types
of components: First, there are link path expressions for selecting query-relevant
documents of the queried Web of Linked Data; second, there are SPARQL graph patterns
for specifying the query result that has to be constructed from the data in the selected
documents. For this paper, we assume that the reader is familiar with the definition of
SPARQL [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], including the algebraic formalization introduced by Pe´rez et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>In the following, we first formalize our notion of link path expressions. Thereafter,
we define a syntax and semantics of LDQL queries and show simple algebraic
properties of such queries. For brevity, we introduce LDQL also based on an algebraic syntax.
Link Path Expressions (LPEs) are a form of nested regular expressions for navigation
over the link graph of a Web of Linked Data. The base case for LPEs is to select single
link graph edges in the context of a designated URI. To this end, we introduce link
patterns, that is, tuples hs; p; oi 2 U [ f ; +g U [ f ; +g U [ L [ f ; +g ,
where is a special symbol that denotes a wildcard and + is another special symbol
that denotes a placeholder for the context URI. Then, a link graph edge hdsrc; t; u; dtgti
with t = hx1; x2; x3i matches a link pattern lp = hy1; y2; y3i in the context of a
URI uctx if for each i 2 f1; 2; 3g, any of the following three properties holds:</p>
      <p>Example 2. Consider the link pattern lpex = h ; p2; i. The link graph of our example
Web Wex (cf. Example 1) contains two edges that match lpex in the context of URI uA,
namely, the edges dA; huB; p2; uCi; uB; dB and dA; huB; p2; uCi; uC; dC .</p>
      <p>The rationale for adopting the notion of a designated context URI in our definition
of link patterns is to support cases in which link graph navigation has to be focused
solely on data links that are authoritative, where we call a data link authoritative if it
is established by an RDF triple in a document dsrc such that dsrc is the authoritative
document for some URI in the triple. Formally, in terms of link graph edges, an edge
hdsrc; t; u; dtgti 2 E in the link graph hD; Ei of a Web of Linked Data W = hD; adoci
represents an authoritative data link if adoc(u0) = dsrc for some URI u0 2 uris(t).
Example 3. Consider the link pattern lp0ex = h ; p2; +i, which is a more restrictive
variation of the link pattern discussed in the previous example. In contrast to lpex, there
does not exist any edge in the link graph of Wex that matches lp0ex in the context of
URI uA. Any such edge would have to represent an authoritative data link; more
precisely, the RDF triple of such a data link must have the context URI (i.e., uA) in the
object position, which is not the case for the two edges that match the less restrictive link
pattern lpex. Notice also that, if we use URI uC as context instead, there exists an edge
that matches link pattern lp0ex in the context of uC, namely dC; huA; p2; uCi; uA; dA .</p>
      <p>Given the notion of a link pattern, we define LPEs as nested regular expressions over
the alphabet that consists of all possible link patterns. Hence, an LPE is an expression
defined by the following grammar (where lp is an arbitrary link pattern):
lpe = " j lp j lpe=lpe j lpejlpe j lpe j [lpe]</p>
      <p>Note that our notion of LPEs does not provide an operator for navigating paths
in their inverse direction. The reason for omitting such an operator is that traversing
arbitrary data links backwards is impossible on the WWW.</p>
      <p>The semantics of LPEs is based on a designated context URI (similar to our notion
of matching edges for link patterns). In particular, the semantics requires that each path
of link graph edges that satisfies an LPE starts from the document that is authoritative
for the given context URI. Then, the result of evaluating an LPE represents the end
nodes of all such paths. More precisely, the result is a set of URIs where, informally,
the lookup of each such URI returns the document that constitutes the end node of the
corresponding path. The following definition captures the semantics of LPEs formally.
Definition 3. Let lpe be an LPE, let W = hD; adoci be a Web of Linked Data with link
graph hD; Ei, and let uctx be a URI. The uctx-based evaluation of lpe over W, denoted
by [[lpe]]uctx, is a set of URIs that is empty if uctx 2= dom(adoc); if uctx 2 dom(adoc),</p>
      <p>W
then the set is defined recursively as follows (where dctx = adoc(uctx), lp is a link
pattern, and lpe, lpe1, lpe2 are arbitrary LPEs):
[[ " ]]uctx = fuctxg</p>
      <p>W
[[lp]]uctx = fu j hdctx; t; u; di 2 E that matches lp in the context of uctxg</p>
      <p>W
[[lpe1=lpe2]]uWctx = fu 2 [[lpe2]]uW0 j u0 2 [[lpe1]]uWctx g
[[lpe1jlpe2]]uWctx = [[lpe1]]uWctx [ [[lpe2]]uctx</p>
      <p>W
[[lpe ]]uctx = fuctxg [ [[lpe]]uctx [ [[lpe=lpe]]uctx [ [[lpe=lpe=lpe]]uctx [ :::</p>
      <p>W W W W
[[ [lpe] ]]uctx = fuctx j [[lpe]]uWctx 6= ;g:</p>
      <p>W
Example 4. Let lpeex be the LPE h ; p2; i (i.e., the link pattern discussed in
Example 2). For our example Web Wex and context URI uA, we know by Example 2 that
the link graph edges dA; huB; p2; uCi; uB; dB and dA; huB; p2; uCi; uC; dC match the
pattern in the context of URI uA. Hence, the LPE selects documents dB = adocex(uB)
and dC = adocex(uC). More precisely, we have [[lpeex]]uWAex = fuB; uCg.
Example 5. As another example consider the slightly more complex LPE lpe0ex which is
of the form h ; p1; i =[h ; p2; i]. This LPE selects documents that can be reached
via arbitrarily long paths of data links with predicate p1 and, additionally, have some
outgoing data link with predicate p2. For our example Web Wex and context URI uA,
all three documents, dA, dB and dC, can be reached via a p1-path from URI uA, but only
dA and dC pass the p2-related test. Hence, we have [[lpe0ex]]uWAex = fuA; uCg.</p>
      <p>
        While LPEs allows users to select documents from the queried Web of Linked Data,
in the context of LDQL these documents are used to form an RDF dataset for evaluating
a given SPARQL graph pattern. Formally, we specify this dataset as follows.
Definition 4. Let uctx be a URI, let lpe be an LPE, and let W = hD; adoci be a Web of
Linked Data. The uctx-lpe-selected dataset over W is an RDF dataset D = hG0; N i (as
per [
        <xref ref-type="bibr" rid="ref1 ref9">9,1</xref>
        ]) whose default graph G0 is the union of all Named Graphs of the dataset, and
that contains a Named Graph hu; Gi 2 N for every URI u 2 [[lpe]]uctx whose lookup (in
W
W ) results in retrieving a document; hence,
      </p>
      <p>G0 = Shu;Gi2N G; and
N = fhu; Gi j u 2 [[lpe]]uctx and u 2 dom(adoc) and G = data(adoc(u))g:</p>
      <p>W
Example 6. Consider context URI uA and the aforementioned example LPE lpe0ex for
which we have [[lpe0ex]]uA</p>
      <p>Wex = fuA; uCg (cf. Example 5). Then, the uA-lpe0ex-selected
dataset over Wex is Dex = hGex; Nexi with Nex = fhuA; data(dA)i; huC; data(dC)ig
and, thus, Gex = fhuA; p1; uBi; huB; p2; uCi; huA; p2; uCig (cf. Example 1).
3.2</p>
      <sec id="sec-4-1">
        <title>LDQL Queries</title>
        <sec id="sec-4-1-1">
          <title>We are now ready to define LDQL queries formally.</title>
        </sec>
        <sec id="sec-4-1-2">
          <title>Definition 5. An LDQL query is defined recursively as follows:</title>
          <p>1. Any tuple q = hu; lpe; P i consisting of a URI u, an LPE lpe, and a SPARQL graph
pattern P is an LDQL query—hereafter, referred to as a basic LDQL query;
2. Any tuple q = h?v; lpe; P i consisting of a variable ?v, an LPE lpe, and a SPARQL
graph pattern P is an LDQL query;
3. If q1 and q2 are LDQL queries, then (q1 AND q2) and (q1 UNION q2) are LDQL queries.</p>
          <p>Before introducing the formal semantics of LDQL queries, we provide some
intuition about it. As per Definition 5, the most basic type of an LDQL query consists
of a URI, an LPE, and a SPARQL graph pattern. Informally, the result of such a
query q = hu; lpe; P i is the set of SPARQL solution mappings that can be obtained
by evaluating the SPARQL pattern P over the u-lpe-selected dataset.</p>
          <p>
            Example 7. Consider a basic LDQL query huA; lpe0ex; Bexi whose LPE is the
aforementioned example LPE lpe0ex (cf. Example 5) and whose SPARQL graph pattern is a
basic graph pattern that contains two triple patterns, Bex = fh?x; p1; ?yi; h?x; p2; ?zig.
According to the semantics outlined above (and defined formally below), the result of
this query over our example Web Wex consists of a single solution mapping, namely
= f?x 7! uA; ?y 7! uB; ?z 7! uCg. To see why we obtain this result, recall
from Example 6 that the default graph of the uA-lpe0ex-selected dataset over Wex is
Gex = fhuA; p1; uBi; huB; p2; uCi; huA; p2; uCig. Then, by the standard (set) semantics
of SPARQL [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ], the result of evaluating basic graph pattern Bex over this graph Gex is
a singleton set that contains solution mapping .
          </p>
          <p>The other types of LDQL queries also have a set of solution mappings as their result:
Operators AND and UNION represent conjunctions and disjunctions, respectively. The
second base type of LDQL queries, tuples with a variable instead of a fixed (context) URI,
is similar to basic LDQL queries with the difference that the variable can range over
alternative context URIs; this type of query is meant to be used in conjunctions in which
the other conjunct is used to select context URIs at query execution time (e.g., see query
qe00x in Example 8 below). The following definition formalizes the query semantics.
Definition 6. The evaluation of an LDQL query q over a Web of Linked Data W,
denoted by [[q]]W , is defined recursively as follows (where u is a URI, ?v is a variable,
lpe is an LPE, P is a SPARQL graph pattern, [[P ]]GD0 denotes the evaluation of P over
an RDF graph G0 in an RDF dataset D = hG0; N i [1, Definition 13.3], and q1 and q2
are arbitrary LDQL queries):
[[hu; lpe; P i]]W = [[P ]]GD0</p>
          <p>( D = hG0; N i is the u-lpe-selected dataset over W );
[[h?v; lpe; P i]]W = Su2U [[hu; lpe; P i]]W on f ug
[[(q1 AND q2)]]W = [[q1]]W on [[q2]]W ;
[[(q1 UNION q2)]]W = [[q1]]W [ [[q2]]W :
( where u = f?v 7! ug );
Example 8. Consider an LDQL query qex = ?x; "; h?x; p1; ?zi , which is of the
second base type in Definition 5 (with the SPARQL graph pattern being a single triple
pattern). Additionally, let qe0x = uA; lpe0ex; fh?x; p1; ?yi; h?x; p2; ?zig be the basic
LDQL query introduced in Example 7, and let qe00x be the conjunction of these two
queries; i.e., qe00x = (qex AND qe0x). By Example 7 we know that [[qe0x]]Wex = f g (with
solution mapping as given in Example 7). Furthermore, based on the data given in
Example 1, it is easy to see that [[qex]]Wex = f 1; 2g with 1 = f?x 7! uA; ?z 7! uBg
and 2 = f?x 7! uB; ?z 7! uCg. For the evaluation of query qe00x over Wex, the result
sets [[qex]]Wex and [[qe0x]]Wex have to be joined. While for 1 2 [[qex]]Wex , there does not
exist a compatible join candidate in [[qe0x]]Wex , solution mapping 2 2 [[qex]]Wex is
compatible with 2 [[qe0x]]Wex . Consequently, we have [[qe00x]]Wex = f 0g with 0 = [ 2.
3.3</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>Algebraic Properties of LDQL Queries</title>
        <p>As a basis for the discussion in the next section, we show simple algebraic properties.
Lemma 1. The operators AND and UNION are associative and commutative,
respectively, and the operator AND distributes over UNION . That is, if q1, q2, and q3 are LDQL
queries, then the following semantic equivalences hold:
– (q1 AND q2)</p>
        <p>Proof. Since the definition of LDQL operators AND and UNION is equivalent to their
SPARQL counterparts, the semantic equivalences in Lemma 1 follow from
corresponding equivalences for SPARQL graph patterns as shown by Pe´rez et al. [15, Lemma 2.5].</p>
        <p>Lemma 1 allows us to write sequences of either AND or UNION without parentheses.
Hereafter, we assume that every UNION-free LDQL query is represented as an LDQL
query of the form (q1 AND q2 AND ::: AND qm) where each subquery qi (1 i m) is
either of the form hu; lpe; P i (i.e., a basic LDQL query) or of the form h?v; lpe; P i.
Moreover, we say that an LDQL query is in UNION normal form if it is of the form
(q1 UNION q2 UNION ::: UNION qn) such that each subquery qi (1 i n) is a UNION-free
LDQL query. The following result is an immediate consequence of Lemma 1.
Corollary 1. For every LDQL query, there exists a semantically equivalent LDQL query
that is in UNION normal form.
4</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Web-Safeness of LDQL Queries</title>
      <p>In this section we study the “Web-safeness” of LDQL queries, where, informally, we
call a query Web-safe if a complete execution of the query over the WWW is possible
in practice (which is not the case for all LDQL queries as we shall see).</p>
      <p>
        To provide a more formal definition of this notion of Web-safeness we make the
following observations. While the mathematical structures introduced by our data model
capture the notion of Linked Data on the WWW formally (and, thus, allow us to provide
a formal semantics for LDQL queries), in practice, these structures are not available
completely for the WWW. For instance, given that an infinite number of strings can be
used as HTTP URIs [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], we cannot assume complete information about which URIs are
in the domain of the partial function adoc (i.e., can be looked up to retrieve some
document) and which are not; in fact, disclosing this information would require a process
that systematically tries to look up every possible HTTP URI and, thus, would never
terminate because of the aforementioned infiniteness. Therefore, it is also impossible to
guarantee the discovery of every document in the set D (without looking up an infinite
number of URIs). Consequently, any query whose execution requires a complete
enumeration of this set is not feasible in practice. Based on these observations, we define
Web-safeness of LDQL queries as follows.
      </p>
      <p>Definition 7. An LDQL query q is Web-safe if there exists an algorithm that, for any
finite Web of Linked Data W = hD; adoci, computes [[q]]W by looking up only a finite
number of URIs without assuming an a priori availability of any information about the
sets D and dom(adoc).</p>
      <p>Example 9. Recall our example queries qex, qe0x, and qe00x (cf. Example 8). For query
qex = ?x; "; h?x; p1; ?zi , any URI u 2 U may be used to obtain a nonempty subset
of the query result as long as a lookup of u retrieves a document whose data includes
RDF triples that match hu; p1; ?zi. Therefore, without access to D or dom(adoc) of the
queried Web W = hD; adoci, the completeness of the computed query result can be
guaranteed only by checking each of the infinitely many possible HTTP URIs. Hence,
query qex is not Web-safe. In contrast, although it contains qex as a subquery, query
qe00x = (qex AND qe0x) is Web-safe, and so is qe0x = huA; lpe0ex; Bexi. A possible execution
algorithm for qe0x may first compute [[lpe0ex]]uWA by traversing the queried Web W based
on the given LPE lpe0ex. Thereafter, the algorithm retrieves documents by looking up all
URIs u 2 [[lpe0ex]]uA (or simply keeps these documents after the traversal); and, finally,</p>
      <p>W
the algorithm evaluates pattern Bex over the union of the RDF triples in the retrieved
documents. If W is finite (i.e., contains a finite number of documents) the traversal
process requires a finite number of URI lookups only, and so does the retrieval of
documents in the second step; the final step does not look up any URI. To see that qe00x is
also Web-safe we note that after executing subquery qe0x (e.g., by using the algorithm as
outlined before), the execution of the other (non-Web-safe) subquery qex can be reduced
to a finite number of URI lookups, namely the URIs bound to variable ?x in solution
mappings obtained for subquery qe0x. Although any other URI may also be used to
obtain solution mappings for qex, such solution mappings cannot be joined with any of the
solution mappings for qe0x and, thus, are irrelevant for the result of qe00x.</p>
      <p>The example illustrates that there exists an LDQL query that is not Web-safe. In
fact, it is not difficult to see that the argument for the non-Web-safeness of query qex as
made in the example can be applied to any LDQL query of the form h?v; lpe; P i; that is,
none of these queries is Web-safe. However, the example also shows that more complex
queries that contain such non-Web-safe subqueries may still be Web-safe. Therefore, we
now introduce properties to identify LDQL queries that are Web-safe (even if some of
their subqueries are not). As a basis for the discussion, we first observe that the
Websafeness of an LDQL query carries over to any semantically equivalent LDQL query.
Fact 1. An LDQL query q is Web-safe if there exists an LDQL query q0 such that q and
q0 are semantically equivalent and q0 is Web-safe.</p>
      <p>In conjunction with Corollary 1, Fact 1 allows us to focus our discussion on LDQL
queries in UNION normal form without losing generality. It is easy to show that such
queries are Web-safe if all of their (top-level) subqueries are Web-safe:
Proposition 1. An LDQL query of the form (q1 UNION q2 UNION ::: UNION qn) is
Websafe if each subquery qi (1 i n) is Web-safe.</p>
      <p>Proof. Assume each subquery qi is Web-safe. Hence, for each qi, there exists an
algorithm that satisfies the conditions in Definition 7. Then, the Web-safeness of LDQL
query (q1 UNION q2 UNION ::: UNION qn) is easily shown by specifying another algorithm
that calls the algorithms of the subqueries sequentially and unions their results.</p>
      <p>By Proposition 1, we can show that an LDQL query in UNION normal form is
Websafe by showing that each of its UNION-free subqueries is Web-safe. Therefore, in the
remainder of this section we focus the Web-safeness of UNION-free LDQL queries.</p>
      <p>Our discussion of the (UNION-free) query qe00x = (qex AND qe0x) in Example 9 suggests
that UNION -free LDQL queries can be shown to be Web-safe if it is possible to execute
any non-Web-safe subquery by using variable bindings obtained from other subqueries.
A necessary condition for such an execution strategy is that the variable in question (i.e.,
variable ?x in the case of non-Web-safe subquery qex = ?x; "; h?x; p1; ?zi ) is
guaranteed to be bound in every possible solution mapping obtained from the other subqueries.</p>
      <p>
        To allow for an automated verification of this condition we adopt Buil-Aranda et
al.’s notion of strongly bound variables [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].1 To this end, for any SPARQL graph
pattern P , let sbvars(P ) denote the set of strongly bound variables in P as defined by
Buil-Aranda et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. For the sake of space, we do not repeat the definition here.
However, we emphasize that sbvars(P ) can be constructed recursively, and each variable in
sbvars(P ) is guaranteed to be bound in every possible solution for P [5, Proposition 1].
To carry over these properties to LDQL queries, we use the notion of strongly bound
variables in SPARQL patterns to define the following notion of strongly bound variables
in LDQL queries; thereafter, in Lemma 2, we show the desired boundedness guarantee.
      </p>
      <sec id="sec-5-1">
        <title>Definition 8. The set of strongly bound variables in a LDQL query q, denoted by</title>
        <p>sbvars(q), is defined recursively as follows:
1. If q is of the form hu; lpe; P i, then sbvars(q) = sbvars(P ).
2. If q is of the form h?v; lpe; P i, then sbvars(q) = sbvars(P ) [ f?vg.
3. If q is of the form (q1 AND q2), then sbvars(q) = sbvars(q1) [ sbvars(q2).
4. If q is of the form (q1 UNION q2), then sbvars(q) = sbvars(q1) \ sbvars(q2).
Lemma 2. Let q be an LDQL query. For every Web of Linked Data W and every
solution mapping 2 [[q]]W , it holds that sbvars(q) dom( ).</p>
        <p>Proof. Lemma 2 follows trivially from Definition 8 and [5, Proposition 1].</p>
        <sec id="sec-5-1-1">
          <title>We are now ready to show the following result.</title>
          <p>Theorem 1. A UNION-free LDQL query (q1 AND q2 AND ::: AND qm) is Web-safe if there
exists a total order over the set of subqueries fq1; q2; ::: ; qmg such that for each
subquery qi (1 i m), it holds that either (i) qi is a basic LDQL query or (ii) qi is
of the form h?v; lpe; P i and ?v 2 Sqj qi sbvars(qj ).</p>
          <p>
            Proof (Sketch). We prove Theorem 1 based on an iterative algorithm that generalizes
the execution strategy outlined for query qe00x in Example 9. That is, the algorithm
executes the subqueries q1; q2; ::: ; qm sequentially in the order such that each iteration
step executes one of the subqueries by using the solution mappings computed during
the previous step. By the conditions in Theorem 1, the first subquery (according to )
1 While we may also adopt Buil-Aranda et al.’s notion of bound variables (not to be confused
with their notion of strongly bound variables), a definition of bound variables in LDQL queries
would be based directly on the boundedness of variables in SPARQL patterns. Then, it is not
difficult to see that the undecidability of verifying whether a given variable is bound in a given
SPARQL pattern [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] would also carry over to LDQL queries. Due to space limitations, we omit
discussing boundedness and use directly the decidable alternative (i.e., strong boundedness).
cannot be of the form h?v; lpe; P i; hence, the first step of the iteration is guaranteed to
execute a basic LDQL query. This execution resembles the execution strategy as
outlined for the (basic) query qe0x in Example 9. For any subsequent iteration step, if the
subquery executed by that step is of the form h?v; lpe; P i, then the corresponding
condition in Theorem 1 (in conjunction with Lemma 2) guarantees that ?v 2 dom( ) for
every solution mapping obtained from the previous step. These mappings are finitely
many (for a finite Web of Linked Data). Then, the algorithm uses the URIs bound to
variable ?v in these mappings as the only possible context URIs for the execution of
the subquery (instead of using all URIs), which is sufficient because solution mappings
computed for the subquery by using some other context URI would not be join
compatible with any of the solution mappings obtained from the previous step. For a full formal
proof and the complete algorithm we refer to the extended version of this paper [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ].
          </p>
          <p>To recapitulate, we summarize our proposed procedure to test a given LDQL query q
for Web-safeness based on our results in this paper: First, by using the algebraic
properties in Lemma 1, the query has to be rewritten into a semantically equivalent LDQL
query qnf = (q1 UNION q2 UNION ::: UNION qn) that is in UNION normal form, which is
possible for any LDQL query (cf. Corollary 1). Thereafter, the following Web-safeness test
has to repeated for every subquery qi (1 i n); recall that each of these subqueries
is a UNION-free LDQL query qi = (q1i AND q2i AND ::: AND qmii ). The test is to find an
order for their subqueries q1i; q2i; ::: ; qmii that satisfies the conditions in Theorem 1. Every
top-level subquery qi (1 i n) for which such an order exists, is Web-safe (cf.
Theorem 1). If all top-level subqueries are identified to be Web-safe by this test, then qnf is
Web-safe (cf. Proposition 1), and so is q (cf. Fact 1).</p>
          <p>We conclude the section by pointing out the following limitation of our results: Even
if the given conditions are sufficient to show that an LDQL query is Web-safe, they are
not sufficient for showing the opposite. It remains an open question whether there exists
a (decidable) property of all Web-safe LDQL queries that is sufficient and necessary.
5</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Concluding Remarks and Future Work</title>
      <p>
        LDQL, the query language that we introduce in this paper, allows users to express
queries over Linked Data on the WWW. We defined LDQL such that navigational
features for selecting the query-relevant documents on the Web are separate from patterns
that are meant to be evaluated over the data in the selected documents. This separation
distinguishes LDQL from other approaches to express queries over Linked Data. For
instance, neither any of the existing SPARQL-based approaches [
        <xref ref-type="bibr" rid="ref10 ref11 ref14 ref16 ref4">4,10,11,14,16</xref>
        ] nor
NautiLOD [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] can be used to express queries such as our example queries qe0x and qe00x.
      </p>
      <p>Given that our analysis of LDQL in this paper focuses primarily on Web-safeness,
in future work the language may be studied with respect to the complexity of query
evaluation and the problem of query containment. A formal study of the expressive
power of LDQL would also be interesting. A more practical direction for future research
on LDQL is the development of approaches to execute LDQL queries efficiently.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Pe</surname>
          </string-name>
          <article-title>´rez. On the Semantics of SPARQL</article-title>
          . In Semantic Web Information Management - A
          <string-name>
            <surname>Model-Based</surname>
            <given-names>Perspective</given-names>
          </string-name>
          , chapter
          <volume>13</volume>
          . Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>T.</surname>
          </string-name>
          Berners-Lee.
          <article-title>Linked Data</article-title>
          . Online at http://www.w3.org/DesignIssues/ LinkedData.html,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Heath</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          .
          <article-title>Linked Data - The Story So Far</article-title>
          .
          <source>Semantic Web and Information Systems</source>
          ,
          <volume>5</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>P.</given-names>
            <surname>Bouquet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ghidini</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Serafini</surname>
          </string-name>
          .
          <article-title>Querying The Web Of Data: A Formal Approach</article-title>
          .
          <source>In Proceedings of the 4th Asian Semantic Web Conference</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>C.</given-names>
            <surname>Buil-Aranda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          .
          <article-title>Semantics and Optimization of the SPARQL 1.1 Federation Extension</article-title>
          .
          <source>In Proceedings of the 8th Extended Semantic Web Conference</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lanthaler</surname>
          </string-name>
          , G. Klyne,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Carroll</surname>
          </string-name>
          , and B.
          <source>McBride. RDF 1.1 Concepts</source>
          and
          <string-name>
            <given-names>Abstract</given-names>
            <surname>Syntax</surname>
          </string-name>
          .
          <source>W3C Recommendation</source>
          , Online at http://www.w3.org/ TR/rdf11-concepts/, Feb.
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>R.</given-names>
            <surname>Fielding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gettys</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Mogul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Frystyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Masinter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Leach</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          .
          <source>Hypertext Transfer Protocol - HTTP/1.1. RFC 2616</source>
          , Online at http://tools.ietf. org/html/rfc2616,
          <year>June 1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>V.</given-names>
            <surname>Fionda</surname>
          </string-name>
          , G. Pirro`, and
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          .
          <article-title>NautiLOD: A Formal Language for the Web of Data Graph</article-title>
          .
          <source>ACM Transactions on the Web</source>
          ,
          <volume>9</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>43</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>S.</given-names>
            <surname>Harris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seaborne</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Prud</surname>
          </string-name>
          <article-title>'hommeaux. SPARQL 1.1 Query Language</article-title>
          .
          <source>W3C Recommendation</source>
          , Online at http://www.w3.org/TR/sparql11-query/, Mar.
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>A.</given-names>
            <surname>Harth</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Speiser</surname>
          </string-name>
          .
          <article-title>On Completeness Classes for Query Evaluation on Linked Data</article-title>
          .
          <source>In Proceedings of the 26th AAAI Conference</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          .
          <article-title>SPARQL for a Web of Linked Data: Semantics and Computability</article-title>
          .
          <source>In Proceedings of the 9th Extended Semantic Web Conference</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          .
          <article-title>An Overview on Execution Strategies for Linked Data Queries</article-title>
          .
          <source>DatenbankSpektrum</source>
          ,
          <volume>13</volume>
          (
          <issue>2</issue>
          ),
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>O. Hartig.</surname>
          </string-name>
          <article-title>LDQL: A Language for Linked Data Queries (Extended Version)</article-title>
          . Online at http://olafhartig.de/files/LDQL-ext.pdf,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          and
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Pirr o`. A Context-Based Semantics for SPARQL Property Paths over the Web</article-title>
          .
          <source>In Proceedings of the 12th Extended Semantic Web Conference</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15. J. Pe´rez, M. Arenas, and
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          .
          <source>Semantics and Complexity of SPARQL. ACM Transactions on Database Systems</source>
          ,
          <volume>34</volume>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>J. Umbrich</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Hogan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Polleres</surname>
            , and
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Decker</surname>
          </string-name>
          .
          <article-title>Link Traversal Querying for a Diverse Web of Data</article-title>
          .
          <source>Semantic Web Journal</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>