<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SQL Nested Queries in SPARQL</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Renzo Angles</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudio Gutierrez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Universidad de Chile</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, Universidad de Talca</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>SPARQL currently does not include any form of nested queries. In this paper we present a proposal to incorporate nested queries into SPARQL along the design philosophy of SQL nested queries. We present rewriting algorithms and show that all the proposed nested queries can be expressed in a natural and simple extension of SPARQL syntax.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>One of the most powerful features of a query language is the nesting of queries,
that is, the possibility of writing in a single expression a query which uses the
output of other queries. The current W3C recommendation of SPARQL [13] does
not include any form of nesting, although it has been considered as an issue by
the RDF Data Access Working Group, and has been gradually incorporated into
SPARQL engines.</p>
      <p>For SPARQL, the incorporation of nested queries has several motivations.
One of the most important is the reuse of queries. Once a query is executed, the
user may presumably direct the output to some storage medium, assign an IRI
to it and then run a query against that extract. Also, with the right interface,
the user might be able to just cut-and-paste a query that was debugged into
ad-hoc queries. Hence, this feature would allow to build queries incrementally
from separately debugged pieces. Another important motivation, as SPARQL
was thought of to work on a distributed environment like the Web, is the notion
of distributed queries. A SPARQL user will have access to vast and time-varying
input RDF graphs, containing huge volumes of data that is not of interest to the
user. Hence the query computation can be distributed and only relevant data
used to compute the nal query. For example, AllegroGraph supports queries
with distributed databases. A third and important motivation is query rewriting.
Complex queries can be structured in a way more intuitive and understandable
for the user. Examples of these uses are provided in the paper.</p>
      <p>Introducing nesting in a coherent form, that is, providing a clean syntax and
semantics, is a delicate task. There are several models and \philosophies", the
most widely known those of SQL and XQuery. Based on the similarities between
SQL and SPARQL [1,4], it seems natural to investigate how the SQL nesting
model can be introduced into SPARQL. In this paper, we address this challenge
by investigating systematically the behavior of SQL nesting operators in the
context of SPARQL.</p>
      <p>There have been several proposals for introducing nesting in SPARQL.
Indeed, it has been considered as an issue by the RDF Data Access Working Group.
It was raised on July 2004 and was denominated cascadedQueries1. Currently,
the W3C Working Group of SPARQL is working on new features for the
language [9]. Among them, the notion of Subqueries (to nest a query within another
query) is a required feature. As a possible type of subquery, the working draft of
SPARQL 1.1 [7] introduces the notion of subselect, that is to allow a SELECT
query to be a graph pattern.</p>
      <p>Regarding real-life practice, some implementations of SPARQL provide
extensions that include support for some types of nested queries. ARQ, the query
engine for Jena, supports a type of nested SELECT which uses aggregate
functions2. Virtuoso has also included some extensions3 related to nested queries.
Among them, an embedded select query in the place of a triple pattern, and
lter conditions of the form \exists (&lt;scalar subquery&gt;)". None of these
proposals and/or implementations present systematic covering and analysis of these
extensions, nor a formal semantics for them. It is important to note that all of
them introduce nested queries as a new pattern of the form (SELECT ....).
This approach introduces several design decisions that are either non-desirable
or not necessary at this level. Minor problems are the creation of values at
the pattern level by allowing patterns like (SELECT ?X ?Y AS 5 ...), and
the introduction of projection in patterns by allowing patterns of the form
(SELECT ?X ?Y WHERE P(?X)) where ?Y does not appear in the pattern P.
A more relevant problem is introduced once correlation of variables with
subqueries is accepted, a desirable and necessary functionality when dealing with
nesting. Either the original SPARQL unorder evaluation strategy of patterns or
the standard semantics of correlation would need to be reworked. For example,
consider the graph pattern P(?X) AND (SELECT ?Y WHERE R(?X,?Y)), where
?X is a correlated variable. The standard semantics would evaluate rst P and
then for each value of ?X the corresponding instance of the SELECT pattern. In the
proposals presented it is not clear how to deal with this issue. Additionally, we
do not consider the orthogonal functionality of composition which in SPARQL
would correspond to the discussion of how to nest queries in the FROM and
FROM NAMED clauses. Schenk [14] proposes the use of views as parts of a
dataset, that is, the inclusion of CONSTRUCT queries in FROM clauses. We
do not address this topic in this paper.</p>
      <p>An alternative approach to the introduction of SELECT as a new type of
pattern, is to incorporate nesting as a ltering device. An early attempt in this
direction is Polleres [12], where it is suggested that boolean SPARQL queries
(i.e., queries having ASK query form) can be safely allowed within lter
constraints, but the extension is not developed. In this paper we develop this design
philosophy to its end based on the design philosophy of SQL nested queries,
1 http://www.w3.org/2001/sw/DataAccess/issues#cascadedQueries
2 http://jena.sourceforge.net/ARQ/sub-select.html
3 http://www.w3.org/2009/sparql/wiki/Extensions_Proposed_By_OpenLink#
Nested_Queries
which restrict the nested queries to a form of ltering in the WHERE condition.
This approach allows to have a clean semantics for correlated variables, permits
to modularly extend the language, and naturally extend the original SPARQL
semantics. We develop this feature through the inclusion of SQL-like nested
queries in usual SPARQL lter constraints. In particular, we present the
following contributions. First, we present a proposal to incorporate nested queries to
SPARQL along the design philosophy of SQL, by presenting the syntax and a
formal semantics completely compatibles to the current one, and discuss design
features. Second, we show, via illustrative examples, that all SQL facilities are
also relevant in SPARQL, and present a set of equivalences and rewriting rules
among them. Third, we prove that all classical SQL nesting operators (i.e., IN,
SOME/ANY, ALL and EXISTS) can be reduced into one of them (i.e.,
EXISTS), hence proving that all standard nesting constructs can be expressed with
the standard lter part of a SPARQL query.</p>
      <p>The paper is organized as follows: Section 2 presents the syntax and the
semantics of the extension of SPARQL with nested queries. Section 3 presents
examples of nested queries. Section 4 presents the algorithms to rewrite among
nested queries. Finally, Section 5 presents some conclusions.
1.1</p>
      <sec id="sec-1-1">
        <title>Nested queries in SQL</title>
        <p>SQL is a paradigmatic example of the power of nested queries. In fact, this
feature plays and important role in SQL for several reasons [16]. SQL (as de ned
in the ANSI/ISO SQL-92) allows nesting of query blocks in FROM and WHERE
clauses, in any level of nesting. In the FROM clause, a subquery is imported and
used to conform the set of relations to be queried. A subquery in the WHERE
clause can be either aggregate (it returns a single value due to an aggregate
operator) or non-aggregate (it returns either a set of values or empty, i.e., a
SELECT query).</p>
        <p>
          Let QA be an aggregate query, QS and Q be non-aggregate queries where QS
returns one-column relation (i.e., it has a single projection predicate), and be
a scalar comparison operator (&lt;; ; &gt;; =; 6=). A selection predicate containing
nested queries can be of the form:
(
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) hvalue j attributei (QA) (value-set comparison predicate).
(
          <xref ref-type="bibr" rid="ref2">2</xref>
          ) hvalue j attributei IN j NOT-IN (QS ) (set-membership predicate).
(
          <xref ref-type="bibr" rid="ref3">3</xref>
          ) hvalue j attributei SOME j ALL(QS ) (quanti ed predicate).
(
          <xref ref-type="bibr" rid="ref4">4</xref>
          ) EXISTS j NOT-EXISTS(Q ) (existential predicate).
        </p>
        <p>For example, consider the relations EMPLOYEES(EMP#,NAME,SAL,DEPT) and
DEPARTMENTS(DEPT#,NAME,LOCATION). The following expression shows a
SQL nested query Q1 with aggregate (Q2) and non-aggregate (Q3) subqueries.</p>
        <p>SELECT E.NAME FROM EMPLOYEES E
WHERE E.SAL &gt;
(SELECT AVG(F.SAL) FROM EMPLOYEES F
WHERE F.DEPT IN
(SELECT D.DEPT# FROM DEPARTMENTS D
WHERE D.LOCATION = 'DENVER') );
(Q1)
(Q2)
(Q3)</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Syntax and Semantics of nested queries</title>
      <p>In this section we extend the syntax and semantics of SPARQL to support nested
queries. This extension, is based on the ideas and syntax of SQL nested queries
as presented above. Considering that aggregate operators for SPARQL have not
been de ned yet, we are only considering non-aggregate queries (i.e., SELECT
and ASK queries) as nested queries. Hence, we have not included the value-set
comparison predicates of SQL as de ned before. The de nition of this extension
follows the formalization presented in [11].
2.1</p>
      <sec id="sec-2-1">
        <title>Preliminaries: The RDF Model and RDF Datasets</title>
        <p>Assume there are pairwise disjoint in nite sets I, B, L (IRIs, blank nodes, and
RDF literals respectively). We denote by T the union I [ B [ L (RDF terms).
A tuple (v1; v2; v3) 2 (I [ B) I T is called an RDF triple, where v1 is the
subject, v2 the predicate, and v3 the object. An RDF Graph [10] (just graph from
now on) is a set of RDF triples. Given a graph G, we denote by term(G) the set
of elements of T appearing in G and by blank(G) the set of blank nodes in G.
If G is referred to by an IRI u, then graph(u) returns the graph available in u,
i.e, G = graph(u).</p>
        <p>We de ne two operations on two graphs G1 and G2. The union of graphs,
denoted G1 [ G2, is the set theoretical union of their sets of triples. The merge of
graphs, denoted G1 + G2, is the graph G1 [ G02 where G02 is the graph obtained
from G2 by renaming its blank nodes to avoid clashes with those in G1.</p>
        <p>An RDF dataset is a set D = fG0; hu1; G1i; : : : ; hun; Gnig where each Gi is a
graph and each uj is an IRI. G0 is called the default graph and each pair hui; Gii
is called a named graph. Every dataset satis es that: (i) it always contains one
default graph, (ii) there may be no named graphs, (iii) each uj is distinct, and
(iv) blank(Gi) \ blank(Gj ) = ; for i 6= j. Given D, we denote by term(D) the
set of terms occurring in the graphs of D. The default graph of D is denoted
dg(D). For a named graph hui; Gii de ne name(Gi)D = ui and graph(ui)D = Gi;
otherwise name(Gi)D = ; and graph(ui)D = ;. We denote by names(D) the set
of IRIs fu1; : : : ; ung. Although name(G0) = ;, we sometimes will use g0 when
referring to G0. Finally, the active graph of D is the graph Gi used for querying
the dataset.
2.2</p>
      </sec>
      <sec id="sec-2-2">
        <title>Syntax of nested queries</title>
        <p>Assume the existence of an in nite set V of variables disjoint from T . Let var( )
the function which returns the set of variables occurring in the structure .</p>
        <p>
          A triple pattern is a tuple in (T [ V ) (I [ V ) (T [ V ). A nested query is
a tuple (R; F; P )4 where R is a result query form, F is a set {possibly empty{
of dataset clauses, and P is a graph pattern. Next we de ne each component.
4 In this paper we do not consider the solution modi ers de ned in [13].
(
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) If W V is a set of variables and H is a set of triple patterns (called a graph
template) then the expressions SELECT W , CONSTRUCT H, and ASK are
result query forms.
(
          <xref ref-type="bibr" rid="ref2">2</xref>
          ) If u 2 I and QC is a query of the form (CONSTRUCT H; F; P ), then the
expressions FROM u and FROM NAMED u are dataset clauses.
(
          <xref ref-type="bibr" rid="ref3">3</xref>
          ) A lter constraint is de ned recursively as follows:
{ If ?X; ?Y 2 V and v 2 I [ L then ?X = v, ?X = ?Y , and bound(?X)
are (atomic) lter constraints5.
{ If u 2 T , is a scalar comparison operator (=; 6=; &lt;; &lt;=; &gt;; &gt;=), and
Q?X is a query of the form (SELECT ?X; F; P ), then the expressions
(u SOME(Q?X )), (u ALL(Q?X )) and (u IN (Q?X )) are lter
constraints.
{ If QA is a query of the form (ASK; F; P ), then the expression EXISTS(QA)
is a lter constraint.
{ If C1 and C2 are lter constraints, then (:C1), (C1 ^ C2), and (C1 _ C2)
are (complex ) lter constraints.
(
          <xref ref-type="bibr" rid="ref4">4</xref>
          ) A graph pattern is de ned recursively as follows:
{ A triple pattern is a graph pattern.
{ If P1 and P2 are graph patterns then the expressions (P1 AND P2),
(P1 OPT P2), (P1 UNION P2), and (P1 MINUS P2) are graph patterns.6
{ If P is a graph pattern and u 2 I [ V then the expression (u GRAPH P )
is a graph pattern.
{ If P is a graph pattern and C is a lter constraint then the expression
(P FILTER C) is a graph pattern.
        </p>
        <p>Let Q = (R; F; P ) be a query. A query Q0 is nested in Q if and only if Q0
occurs in the graph pattern P , i.e., when Q0 is nested in P . In such case, Q is
known as the outer query and Q0 is known as the inner query. If Q does not
contain nested queries then Q is called a at query.</p>
        <p>Note that, nested queries in SPARQL have been de ned by extending the
de nition of lter constraints with SQL-like predicates for nesting, speci cally
by including the IN, SOME, ALL and EXISTS operators. The corresponding
opposite operators of nesting can be represented by using the negation of lter
constraints, i.e., NOT-IN and NOT-EXISTS are expressed as (:(u IN(Q?X )))
and (: EXISTS(QA)) respectively.</p>
        <p>We have de ned two explicit restrictions about inner queries. On the one
hand, lter expression using IN, ALL and SOME are restricted to use SELECT
queries with a single projection-variable. On the other hand, lter expressions
using EXISTS are restricted to use ASK queries. The latter condition has been
included for simplicity. In practice, EXISTS lters could have queries having any
result query form (e.g. SELECT), because the EXISTS condition does not really
use the results of the inner query at all.
5 For a complete list of atomic lter constraints see the SPARQL speci cation [13]
6 The MINUS operator is not de ned in the SPARQL speci cation, however it can be
simulated by a combination of the OPT and FILTER operators [1].</p>
        <p>Similar to SQL, the extension presents two features inherent to query nesting.
First, the language allows queries with any level of nesting. However, it is
wellknown that queries with more than two levels of nesting are not recommended
in practice because makes the query more di cult to read, understand, maintain
and increases the execution time [16]. Second, variables from an outer query block
can be accessed inside a nested query block. Such variables, called correlated
variables, perform as outer references from the inner query to the outer query.
A subquery containing correlated variables is called a correlated subquery.
2.3</p>
      </sec>
      <sec id="sec-2-3">
        <title>Semantics of nested queries</title>
        <p>A mapping is a partial function : V ! T . The domain of , dom( ),
is the subset of V where is de ned. The empty mapping 0 is a mapping
such that dom( 0) = ;. Given a triple pattern t and a mapping such that
var(t) dom( ), (t) is the triple obtained by replacing the variables in t
according to . Abusing notation, for a query Q, we denote by (Q) the query
resulting from replacing variables in Q according to .</p>
        <p>
          Two mappings 1 and 2 are compatible when for all ?X 2 dom(
          <xref ref-type="bibr" rid="ref1">1</xref>
          )\dom(
          <xref ref-type="bibr" rid="ref2">2</xref>
          )
it satis es that 1(?X) = 2(?X), i.e., when 1 [ 2 is also a mapping. The
operations of join, union, di erence and left outer-join between two sets of mappings
1 and 2 are de ned as follows:
{
{
{
{
1 on 2 = f 1 [ 2 j 1 2 1; 2 2
1 [ 2 = f j 2 1 or 2 2g
1 n 2 = f 1 2 1 j for all 2 2
1qyon 2 = ( 1 on 2) [ ( 1 n 2)
        </p>
        <sec id="sec-2-3-1">
          <title>2; 1 and 2 are compatibleg</title>
        </sec>
        <sec id="sec-2-3-2">
          <title>2; 1 and 2 are not compatibleg</title>
          <p>
            The answer for a query Q = (R; F; P ), denoted ans(Q), is a function which
returns: (i) a set of mappings when R is a SELECT query; (ii) an RDF graph
when R is a CONSTRUCT query; and (iii) a boolean value (true = false) when
R is an ASK query. We will use this informal de nition of ans( ) to de ne the
semantics for the components of a query.
(
            <xref ref-type="bibr" rid="ref1">1</xref>
            ) Semantics of result query forms. Let be a mapping and R be a result query
form. The result of R given , denoted result(R; ), is de ned as follows:
{ If R is SELECT W then result(R; ) is the restriction of to W , that
is the mapping denoted jW such that dom( jW ) = dom( ) \ W and
jW (?X) = (?X) for every ?X 2 dom( jW ).
{ If R is CONSTRUCT H then result(R; ) is the set of RDF triples (i.e.
          </p>
          <p>an RDF graph) f (t) j t 2 H and (t) (I [ B) I T g.</p>
          <p>
            { If R is ASK then result(R; ) is false if = ; and true otherwise.
(
            <xref ref-type="bibr" rid="ref2">2</xref>
            ) Semantics of dataset clauses. Let F be a set of dataset clauses. The dataset
resulting from F , denoted dataset(F ), contains:
(i) a default graph consisting of the merge of the graphs referred in clauses
FROM u. If there is no FROM u, then the default graph is an empty
graph G0 = ;; and
(ii) a named graph hu; graph(u)i for each dataset clause \FROM NAMED u".
(
            <xref ref-type="bibr" rid="ref3">3</xref>
            ) Semantics of lter constraints. Let be a mapping and C be a lter
constraint. We say that satis es C, denoted j= C, if:
{ C is ?X = v, ?X 2 dom( ), and (?X) = v;
{ C is ?X = ?Y , ?X 2 dom( ), ?Y 2 dom( ), and (?X) = (?Y );
{ C is bound(?X) and ?X 2 dom( );
{ C is (:C1) and it is not the case that j= C1;
{ C is (C1 _ C2) and j= C1 or j= C2;
{ C is (C1 ^ C2), j= C1 and j= C2.
{ C is (u SOME(Q?X )) and there exists a mapping 0 2 ans( (Q?X ))
satisfying that either u 0(?X) when u 2 I [ L or (u) 0(?X) when
u 2 V .
{ C is (u ALL(Q?X )) and for every mapping 0 2 ans( (Q?X )) it holds
that either u 0(?X) when u 2 I [ L or (u) 0(?X) when u 2 V .
{ C is (u IN (Q?X )) and there exists a mapping 0 2 ans( (Q?X ))
satisfying that either u 0(?X) when u 2 I [ L or (u) 0(?X) when
u 2 V .
          </p>
          <p>
            { C is EXISTS(QA) and ans( (QA)) is true.
(
            <xref ref-type="bibr" rid="ref4">4</xref>
            ) Semantics of graph patterns. The evaluation of a graph pattern P over a
dataset D with active graph G, denoted J KGD, is de ned recursively as follows:
{ P is a triple pattern then JP KGD = f j dom( ) = var(P ) and (P ) Gg
{ J(P1 AND P2)KGD = JP1KGD on JP2KGD.
{ J(P1 OPT P2)KGD = JP1KGDqyon JP2KGD.
{ J(P1 UNION P2)KGD = JP1KGD [ JP2KGD.
{ J(P1 MINUS P2)KGD = JP1KGD n JP2KGD.
{ If u 2 I then J(u GRAPH P1)KGD = JP1KgDraph(u)D .
{ If ?X 2 V and ?X!v is a mapping such that dom( ) = f?Xg and
(?X) = v, then
          </p>
          <p>J(?X GRAPH P1)KGD = Sv 2 names(D)(JP1KgDraph(v)D on f ?X!vg).
{ J(P1 FILTER C)KGD = f j 2 JP1KGD and j= Cg
{ If P is a SELECT query QS then JP KGD = ans(QS).</p>
          <p>De nition 1 (Answer for a query). Let Q = (R; F; P ) be a query, D be the
dataset obtained from F , and G be the default graph of D. The answer to Q,
denoted ans(Q), is de ned as follows:
{ if R is SELECT W then ans(Q) = fresult(R; ) j 2 JP KGDg.
{ if R is CONSTRUCT H and blank(H) is the set of blank nodes appearing
in H, then ans(Q) = f i(result(R; i)) j i 2 JP KGDg where i : blank(H) !
(B n blank(H) is a blank renaming function satisfying that for each pair of
mappings j; k 2 JP KGD, range( j) \ range( k) = ;.
{ if R is ASK then ans(Q) = false when JP KGD = ; (i.e., there exists no
mapping 2 JP KGD) and ans(Q) = true otherwise.</p>
          <p>The semantics for correlated queries as de ned above follows the nested
iteration method [8], i.e., the inner query is performed once for each solution of the
outer query (it is because the results of the inner query are correlated with each
individual solution of the outer query). This procedure is attained by replacing
variables in the inner query with the corresponding values given by the current
mapping of the outer query (e.g., by applying (Q?X )). For example, consider
the graph pattern
(((?X name ?N ) OPT(?X knows ?Y )) FILTER EXISTS(ASK(?Y email ?E))):
The method establishes that the graph pattern (?Y email ?E)) (i.e., the
subquery) is evaluated over and over again, once for each result mapping of the
OPTIONAL graph pattern (i.e., the outer query).</p>
          <p>We have identi ed two issued related to the use of correlated variables. First,
loss of correlation due to unbounded variables. Consider that P is the the
OPTIONAL graph pattern in the above example. If is a mapping in JP K such
that (?X) = a, (?N ) = b and (?Y ) is unbounded (i.e., there was no solution
for the OPTIONAL part), then there is no value to replace the variable ?Y in
the inner query, and consequently there exists no correlation. This loss of
correlation results in an undesirable evaluation because, when the inner query has
at least one solution, the lter condition is true and the mapping is accepted
as a solution. Clearly, it is not what the query intuitively means such that the
evaluation of the inner graph pattern depends directly on the evaluation of the
outer graph pattern. This problem, produced by correlated variables that could
be evaluated to unbounded, is intrinsic to the language because the semantics
of the UNION and OPTIONAL operators (i.e., they can generate unbounded
variables). Hence, we restrict our study by avoiding graph patterns of this type.</p>
          <p>Another issue concerns the use of correlated variables in the projection part
of a nested SELECT query. Consider the graph pattern</p>
          <p>((?X p ?Y ) FILTER ?Y = SOME (SELECT ?Y WHERE (?Z q ?Y ))):
Note that, the use of variable ?Y as a projected-variable in the nested SELECT
query, generates ambiguity about its scope. In fact, it is not clear whether ?Y
must be considered local to the inner query or it occurs as correlated with the
outer query. To minimize the possibility of confusion, the scope of a variable
will be interpreted using the nearest result query form possible (i.e., the nearest
SELECT). Hence, variable ?Y is local in the inner query of the example.
3</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Examples of nested queries in SPARQL</title>
      <p>Let G1; G2 be two RDF graphs identi ed by IRIs foaf and bib respectively.
G1 contains personal information using the FOAF vocabulary 7. G2 contains
bibliographic information using the bibTex Vocabulary 8. Consider the following
examples of nested queries.
7 http://xmlns.com/foaf/spec/
8 http://zeitkunst.org/bibtex/0.1/
Example 1. The oldest people.</p>
      <p>SELECT ?Per1 FROM foaf
WHERE ((?Per1 foaf:age ?Age1)</p>
      <p>FILTER (:(?Age1 &lt; SOME ( SELECT ?Age2 FROM foaf</p>
      <p>WHERE (?Per2 foaf:age ?Age2)))))
Example 2. The youngest people.</p>
      <p>SELECT ?Per1 FROM foaf
WHERE ((?Per1 foaf:age ?Age1)</p>
      <p>FILTER ( ?Age1 ALL ( SELECT ?Age2 FROM foaf</p>
      <p>WHERE (?Per2 foaf:age ?Age2))))
Example 3. Mails of people being part of at least one group.</p>
      <p>SELECT ?Mail FROM foaf
WHERE ((?Per foaf:mbox ?Mail)</p>
      <p>FILTER (?Per IN ( SELECT ?Mem FROM foaf</p>
      <p>WHERE (?Mem foaf:member ?Group))))
Example 4. Mails of people having at least one publication.</p>
      <p>SELECT ?Mail FROM foaf
WHERE ((?Per foaf:mbox ?Mail)</p>
      <p>FILTER ( EXISTS (ASK FROM bib</p>
      <p>WHERE (?Art bib:has-author ?Per)))))</p>
      <p>The above examples deserve several comments. IN expressions are less
expressive than SOME expressions because the former are restricted to equality of
values, whereas the latter allows all scalar comparison operators. Nested queries
with SOME = ALL operators without correlated variables are better for query
composition, i.e., simple and direct copy/paste of queries. The use of EXISTS is
not adequate for distributed queries because it needs correlated variables to make
sense. This helps the user to express complex queries but makes the evaluation
harder (because the application of the nested iteration method).
4</p>
    </sec>
    <sec id="sec-4">
      <title>Equivalences among nested queries</title>
      <p>In this section we present transformations among nested queries. We will show
that all types of nested queries can be simulated by lter conditions with the
EXISTS operator. Several equivalences presented in this section are well know
in SQL [3].
4.1</p>
      <sec id="sec-4-1">
        <title>Normalization</title>
        <p>In order to simplify the transformations, we will avoid complex lter constraints,
i.e., expressions of the form C1 ^ C2 and C1 _ C2 where C1 and C2 are lter
constraints. This assumption is supported by the following lemma.
Lemma 1. Every graph pattern having complex lter constraints can be
transformed in a graph pattern without complex lter constraints 9.</p>
        <p>Proof. Let P be a graph pattern and C, C1, C2 be lter constraints. The lemma
is supported by the following equivalences:
(P FILTER(C1 ^ C2))</p>
        <p>((P FILTER C1) FILTER C2)
(P FILTER(C1 _ C2))</p>
        <p>((P FILTER C1) UNION(P FILTER C2))
(P FILTER(:C))</p>
        <p>(P MINUS(P FILTER C))
Is not hard to see that the equivalences holds.
4.2</p>
      </sec>
      <sec id="sec-4-2">
        <title>Transformations</title>
        <p>
          Consider the following de nition of query equivalence.
(
          <xref ref-type="bibr" rid="ref1">1</xref>
          )
(
          <xref ref-type="bibr" rid="ref2">2</xref>
          )
(
          <xref ref-type="bibr" rid="ref3">3</xref>
          )
(
          <xref ref-type="bibr" rid="ref4">4</xref>
          )
(
          <xref ref-type="bibr" rid="ref5">5</xref>
          )
(
          <xref ref-type="bibr" rid="ref6">6</xref>
          )
(
          <xref ref-type="bibr" rid="ref7">7</xref>
          )
(
          <xref ref-type="bibr" rid="ref8">8</xref>
          )
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>De nition 2 (Equivalence of queries). Two graph patterns P1 and P2 are</title>
        <p>equivalent, denoted P1 P2, if and only if JP1KGD = JP2KGD for every RDF
dataset D with active graph G. Additionally, given two queries Q = (R; F; P )
and Q0 = (R; F; P 0), we say that Q and Q0 are equivalent, denoted Q Q0, if
and only if P P 0.</p>
        <p>Next we will de ne transformations among several types of nested queries.
Based on transformations de ned in Section 4.1, we assume that queries do not
contain complex lter constraints.</p>
      </sec>
      <sec id="sec-4-4">
        <title>Proposition 1 (Transforming IN queries). Let P be a pattern of the form</title>
        <p>(P1 FILTER(u IN fQ2g)) where u 2 T . Then, P is equivalent to expression:
(P1 FILTER(u = SOMEfQ2g))
Proposition 2 (Transforming SOME queries). Let P be a pattern of the
form (P1 FILTER(u SOME(Q2))) where u 2 T , Q2 = (SELECT ?X2; F2; P2),
and ^ is the inverse operator to . Then, P is equivalent to the following
expressions:</p>
        <p>(P1 FILTER(:(u ^ ALL(Q2))))
(P1 FILTER EXISTS(ASK; F2; (P2 FILTER(u
?X2))))</p>
      </sec>
      <sec id="sec-4-5">
        <title>Proposition 3 (Transforming ALL queries). Let P be a pattern of the form</title>
        <p>(P1 FILTER(u ALL(Q2))) where u 2 T , Q2 = (SELECT ?X2; F2; P2), and ^
is the inverse operator to . Then, P is equivalent to the following expressions:
(P1 FILTER(:(u ^ SOME(Q2))))
(P1 FILTER(: EXISTS(ASK; F2; (P2 FILTER(u ^ ?X2)))))
9 Lemma 1 is true under set semantics. The inclusion of bag semantics, as de ned for
SPARQL, introduces complexity issues which are not discussed here.</p>
        <p>
          For example, the following queries show the application of transformations
(
          <xref ref-type="bibr" rid="ref7">7</xref>
          ) and (
          <xref ref-type="bibr" rid="ref8">8</xref>
          ) to the query of Example 2.
        </p>
        <p>Example 5. The youngest people (using the SOME operator).</p>
        <p>SELECT ?Per1 FROM foaf
WHERE ((?Per1 foaf:age ?Age1)</p>
        <p>FILTER ( :(?Age1 &gt; SOME (SELECT ?Age2 FROM foaf</p>
        <p>WHERE (?Per1 foaf:age ?Age2)))))
Example 6. The youngest people (using the EXISTS operator).</p>
        <p>SELECT ?Per1 FROM foaf
WHERE ((?Per1 foaf:age ?Age1)</p>
        <p>FILTER (: EXISTS (ASK FROM foaf</p>
        <p>WHERE ((?Per2 foaf:age ?Age2)</p>
        <p>FILTER (?Age1 &gt; ?Age2)))))))</p>
        <p>From the transformations de ned above we can present the following result.
Theorem 1. Nested queries using SOME, ALL and IN can be simulated by
using nested queries with the EXISTS operator.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>We have studied how to extend SPARQL to support nesting along the design
philosophy of SQL. We showed that there is a simple syntax and semantics for
such extensions in SPARQL. We have shown that incorporating ASK queries
in FILTERS (through the EXISTS operator) gives the full power and exibility
of SQL nesting, allowing additionally to extend the semantics of SPARQL in
a clean and modular form. The proposal presented here permits a simple and
direct implementation of nested queries as known in the relational world (and
hence by known translation results) in the SPARQL world.</p>
      <p>Future work. An interesting problem studied in the database literature is
the e cient implementation of nested queries, where a well known approach is
the development of algorithms which transform nested queries into equivalent
non-nested queries which can be processed more e ciently by query-processing
subsystems [8,5]. On this line, most results are concentrated on aggregate
subqueries; optimization of non-aggregate subqueries has some limitations, specially
for queries with multiple subqueries and null values [2].</p>
      <p>Although decorrelation often results in cheaper non-nested plans,
decorrelation is not always applicable, and even if applicable may not be the best choice
in all situations since decorrelation carries a materialization overhead [15,6]. In
this direction, the issue of e cient methods of processing nested queries is one
of the main problems to be addressed in future works on this topic.
Acknowledgments. C. Gutierrez was supported by FONDECYT projects No.
1070348 and No. 1090565. The authors wish to thank the reviewers for their
comments.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>R.</given-names>
            <surname>Angles</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          .
          <article-title>The Expressive Power of SPARQL</article-title>
          .
          <source>In Proceedings of the 7th International Semantic Web Conference (ISWC)</source>
          ,
          <source>number 5318 in LNCS</source>
          , pages
          <volume>114</volume>
          {
          <fpage>129</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>B.</given-names>
            <surname>Cao</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Badia</surname>
          </string-name>
          .
          <article-title>A nested relational approach to processing SQL subqueries</article-title>
          .
          <source>In Proc. of the 2005 ACM SIGMOD international conference on Management of data</source>
          , pages
          <volume>191</volume>
          {
          <fpage>202</fpage>
          , New York, NY, USA,
          <year>2005</year>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>S.</given-names>
            <surname>Ceri</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Gottlob</surname>
          </string-name>
          .
          <article-title>Translating SQL into relational algebra: optimization, semantics, and equivalence of SQL queries</article-title>
          .
          <source>IEEE Transactions on Software Engineering</source>
          ,
          <volume>11</volume>
          (
          <issue>4</issue>
          ):
          <volume>324</volume>
          {
          <fpage>345</fpage>
          ,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          .
          <article-title>A relational algebra for SPARQL</article-title>
          .
          <source>Technical Report HPL-2005-170</source>
          ,
          <string-name>
            <given-names>HP</given-names>
            <surname>Labs</surname>
          </string-name>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Ganski and H. K. T. Wong</surname>
          </string-name>
          .
          <article-title>Optimization of nested SQL queries revisited</article-title>
          .
          <source>In Proceedings of the 1987 ACM SIGMOD international conference on Management of data</source>
          , pages
          <volume>23</volume>
          {
          <fpage>33</fpage>
          , New York, NY, USA,
          <year>1987</year>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>R.</given-names>
            <surname>Guravannavar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Ramanujam</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Sudarshan</surname>
          </string-name>
          .
          <article-title>Optimizing nested queries with parameter sort orders</article-title>
          .
          <source>In Proc. of the 31st Int. Conf. on Very large Data Bases (VLDB)</source>
          , pages
          <fpage>481</fpage>
          {
          <fpage>492</fpage>
          . VLDB Endowment,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>S.</given-names>
            <surname>Harris</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Seaborne</surname>
          </string-name>
          .
          <source>SPARQL 1</source>
          .1 Query. W3C Working Draft. http://www.w3.org/TR/2009/WD-sparql11
          <string-name>
            <surname>-</surname>
          </string-name>
          query-20091022/, October 22
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>W.</given-names>
            <surname>Kim</surname>
          </string-name>
          .
          <article-title>On optimizing an SQL-like nested query</article-title>
          .
          <source>ACM Transactions on Database Systems (TODS)</source>
          ,
          <volume>7</volume>
          (
          <issue>3</issue>
          ):
          <volume>443</volume>
          {
          <fpage>469</fpage>
          ,
          <year>1982</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>K.</given-names>
            <surname>Kjernsmo</surname>
          </string-name>
          and
          <string-name>
            <surname>A. Passant.</surname>
          </string-name>
          <article-title>SPARQL New Features and Rationale</article-title>
          . W3C Working Draft. http://www.w3.org/TR/2009/WD-sparql-features-
          <volume>20090702</volume>
          /, July 2
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. G. Klyne and
          <string-name>
            <given-names>J.</given-names>
            <surname>Carroll</surname>
          </string-name>
          .
          <article-title>Resource Description Framework (RDF) Concepts and Abstract Syntax</article-title>
          . http://www.w3.org/TR/2004/REC-115
          <string-name>
            <surname>-</surname>
          </string-name>
          concepts-20040210/,
          <year>February 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>J. Perez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Arenas</surname>
            , and
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Gutierrez</surname>
          </string-name>
          .
          <article-title>Semantics and Complexity of SPARQL</article-title>
          .
          <source>In Proceedings of the 5th International Semantic Web Conference (ISWC)</source>
          ,
          <source>number 4273 in LNCS</source>
          , pages
          <volume>30</volume>
          {
          <fpage>43</fpage>
          . Springer-Verlag,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          .
          <article-title>From SPARQL to Rules (and back)</article-title>
          .
          <source>In Proceedings of the 16th International World Wide Web Conference (WWW)</source>
          , pages
          <fpage>787</fpage>
          {
          <fpage>796</fpage>
          . ACM,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. E.
          <string-name>
            <surname>Prud</surname>
          </string-name>
          <article-title>'hommeaux and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Seaborne. SPARQL Query</surname>
          </string-name>
          <article-title>Language for RDF. W3C Recommendation 15 January</article-title>
          . http://www.w3.org/TR/2008/REC-115
          <string-name>
            <surname>-</surname>
          </string-name>
          sparqlquery-20080115/,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>S.</given-names>
            <surname>Schenk. A SPARQL Semantics</surname>
          </string-name>
          <article-title>Based on Datalog</article-title>
          .
          <source>In 30th Annual German Conference on Advances in Arti cial Intelligence (KI)</source>
          , volume
          <volume>4667</volume>
          <source>of LNCS</source>
          , pages
          <volume>160</volume>
          {
          <fpage>174</fpage>
          . Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>P.</given-names>
            <surname>Seshadri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Pirahesh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T. Y. C.</given-names>
            <surname>Leung</surname>
          </string-name>
          .
          <article-title>Complex query decorrelation</article-title>
          .
          <source>In Proc. of the 12th Int. Conf. on Data Engineering (ICDE)</source>
          , pages
          <fpage>450</fpage>
          {
          <fpage>458</fpage>
          . IEEE Computer Society,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>P.</given-names>
            <surname>Weinberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gro</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <surname>A. Oppel. SQL</surname>
          </string-name>
          ,
          <article-title>The Complete Reference</article-title>
          .
          <string-name>
            <surname>McGraw-Hill</surname>
          </string-name>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>